PoliModalCorpus
Towards the construction of the first multimodal corpus of the political domain in Italian
Keywords:
Political communication, Corpus Linguistics, Multimodal corpora, XML-TEI annotation, Natural Language Processing
Abstract
This work introduces the PoliModalCorpus, the first multimodal corpus of the political domain in Italian. The corpus was built to address the lack of Italian linguistic resources for political-institutional communication. The data comprise the transcripts of 59 face-to-face interviews from the political talk show “In mezz’ora in più”. This paper describes the data collection methodology, the construction of the corpus, and the annotation scheme proposed to structure the data. A new level of analysis is proposed: in addition to a quantitative linguistic and terminological analysis through textual statistics based on morpho-syntactic analysis, the scheme adds a level of annotation focusing on non-verbal aspects (pauses, non-lexical backchannels) that occur during the political interviews. Non-verbal expressions do not simply accompany attempts to interrupt the speakers; they are also indicators of the success of their intentions, including persuasive strategies.