The textometric concept of active corpus

Authors: Pincemin, Bénédicte; Heiden, Serge; Mazuet, Franck
Contributors: Institut d’Histoire des Représentations et des Idées dans les Modernités (IHRIM), École normale supérieure de Lyon (ENS de Lyon)-Université Lumière - Lyon 2 (UL2)-Université Jean Moulin - Lyon 3 (UJML), Université de Lyon-Université de Lyon-Université Blaise Pascal - Clermont-Ferrand 2 (UBP)-Université Jean Monnet - Saint-Étienne (UJM)-Centre National de la Recherche Scientifique (CNRS)-Université Clermont Auvergne (UCA), Centre d'histoire sociale des mondes contemporains (CHS), Université Paris 1 Panthéon-Sorbonne (UP1)-Centre National de la Recherche Scientifique (CNRS), VADISTAT - Per Simona Balbi, Univ. of Naples Federico II, Misuraca, Michelangelo, Scepi, Germana, Spano, Maria, ANR-17-CE38-0010,ANTRACT,Analyse Transdisciplinaire des Actualités filmées (1945-1969)(2017)
Language: English
Year of publication: 2022
Subject:
Source: 16th International Conference on Statistical Analysis of Textual Data JADT 2022, VADISTAT-Per Simona Balbi, Univ. of Naples Federico II, Jul 2022, Naples, Italy. pp.691-698
Description: The active corpus makes it possible to apply searching and statistical computing as if the corpus were reduced to selected words, while the full text remains visible in context display. It is mainly implemented in paradigmatic processing, yet it may also concern syntagmatic processing or text display. Here we experiment with the active corpus in syntagmatic processing. A projection generates a new corpus, in which the words are semantic tags that were automatically assigned to the original data in a first step. This new corpus makes it easy to explore tag sequences with any generic textometric tool available, however sparse the original annotation may be. This methodological path was applied to film grammar analysis on 10,000 archival descriptions of news reports. 19 camera shot and angle types were identified through queries and tagged. This annotation became the lexicon of the projected corpus, which was used to study shot sequences. The annotation and projection tools we have run are available as utilities in the TXM open-source software and should usefully serve many research projects.
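The projection step described above can be illustrated with a minimal sketch. It is not the TXM utility itself: the function names `project` and `tag_bigrams`, the tag labels `WIDE` and `CLOSE_UP`, and the sample tokens are all hypothetical, chosen only to show how a sparsely annotated text reduces to a corpus of tags whose sequences can then be counted.

```python
from collections import Counter

# Hypothetical annotated text: (word, tag) pairs, where tag is None
# for words that received no annotation (sparse annotation).
annotated = [
    ("wide", "WIDE"), ("shot", None), ("of", None), ("harbour", None),
    ("close-up", "CLOSE_UP"), ("on", None), ("sailor", None),
    ("wide", "WIDE"), ("view", None),
    ("close-up", "CLOSE_UP"),
]

def project(pairs):
    """Build the projected corpus: keep annotated positions only,
    and let each tag stand in as the 'word' (illustrative helper)."""
    return [tag for _, tag in pairs if tag is not None]

def tag_bigrams(projected):
    """Count adjacent tag pairs in the projected corpus."""
    return Counter(zip(projected, projected[1:]))

projected = project(annotated)
bigrams = tag_bigrams(projected)

print(projected)
for (first, second), count in bigrams.most_common():
    print(first, "->", second, count)
```

Because untagged words are dropped, two tags that are far apart in the original text become adjacent in the projected corpus, which is what lets generic sequence tools work on sparse annotation.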
Database: OpenAIRE