Discovering and documenting brilliance

A novel multimodal annotation method

Authors

  • Alexander Murzaku, Saint Elizabeth University
  • Pontish Yeramyan, Genius Institute at Gap International
  • Curt Anderson, Genius Institute at Gap International
  • Steven Buxbaum, Genius Institute at Gap International
  • Ruben Diaz, Genius Institute at Gap International
  • Marielle Lerner, Genius Institute at Gap International
  • Armenui Minasyan, Genius Institute at Gap International
  • Hazel Mitchley, Genius Institute at Gap International
  • Jodie-Ann Pennant, Genius Institute at Gap International
  • Mia Shang, Genius Institute at Gap International
  • Brisa Speier-Brito, Genius Institute at Gap International

Keywords

Multimodal Annotation, Tacit Knowledge, Thematic Analysis, Corpus Analysis, Human Brilliance

Abstract

Human brilliance has long fascinated scholars, practitioners, and observers across many domains of performance. Capturing and understanding brilliance is an important endeavor, as it offers insight into the motivations behind extraordinary achievements. However, brilliance remains a complex phenomenon that eludes simple characterization. In this paper we propose a methodology for investigating brilliance. We use structured interviews with high-achieving individuals to document the language surrounding their accomplishments, and we code this language using seven distinct themes. Each interview is then annotated with a multimodal annotation schema that captures a range of linguistic and paralinguistic features, including phonetic information, hand gesture, and eye gaze. This system allows us to discover and document the tacit knowledge that underlies human brilliance and to make inroads into understanding the full communicative expression of brilliance.
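
The abstract describes the annotation approach only at a high level. As a purely illustrative aid, the sketch below shows, in Python, one way a tier-based record for a single annotated interview might be organized, loosely modeled on ELAN-style time-aligned tiers. The class names, tier names, and placeholder theme labels are assumptions made for illustration; they are not the authors' actual schema or their seven themes.

```python
from dataclasses import dataclass, field
from typing import List

# Placeholder labels: the paper codes language with seven themes, but they are
# not enumerated in the abstract, so generic names are used here.
THEMES = [f"theme_{i}" for i in range(1, 8)]

@dataclass
class Interval:
    """A time-aligned annotation on one tier (times in seconds)."""
    start: float
    end: float
    label: str

@dataclass
class AnnotatedInterview:
    """One interview with illustrative tiers (transcript, themes, gesture, gaze)."""
    speaker_id: str
    transcript: List[Interval] = field(default_factory=list)  # word/utterance spans
    themes: List[Interval] = field(default_factory=list)      # thematic codes
    gestures: List[Interval] = field(default_factory=list)    # hand-gesture spans
    gaze: List[Interval] = field(default_factory=list)        # eye-gaze spans

    def themes_overlapping(self, start: float, end: float) -> List[str]:
        """Theme labels whose spans overlap a time window, e.g. a gesture's span,
        so verbal and paralinguistic cues can be examined together."""
        return [t.label for t in self.themes if t.start < end and t.end > start]

# Minimal usage example with invented values.
interview = AnnotatedInterview(speaker_id="P01")
interview.themes.append(Interval(12.4, 18.9, THEMES[0]))
interview.gestures.append(Interval(14.1, 15.0, "beat"))
g = interview.gestures[0]
print(interview.themes_overlapping(g.start, g.end))  # -> ['theme_1']
```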

Published

2024-06-30

Section

Articles