Using the First Axis of a Correspondence Analysis as an Analytical Tool

Authors: Pincemin, Bénédicte, Guillot-Barbance, Céline, Lavrentiev, Alexei
Contributors: Institut d’Histoire des Représentations et des Idées dans les Modernités (IHRIM), École normale supérieure - Lyon (ENS Lyon)-Université Lumière - Lyon 2 (UL2)-Université Jean Moulin - Lyon 3 (UJML), Université de Lyon-Université de Lyon-Université Blaise Pascal - Clermont-Ferrand 2 (UBP)-Université Jean Monnet [Saint-Étienne] (UJM)-Université Clermont Auvergne [2017-2020] (UCA [2017-2020])-Centre National de la Recherche Scientifique (CNRS), PaLaFra ANR-DFG project (ANR-14-FRAL-0006), DII– Department of Enterprise Engineering 'Mario Lucertini' Tor Vergata University, DSS– Department of Statistical Sciences, Sapienza University, Rome, Domenica Fioredistella IEZZI, Livia CELARDO, Michelangelo MISURACA, ANR-14-FRAL-0006,PaLaFra,Le PAssage du LAtin au FRAnçais: constitution et analyse d'un corpus numérique latino-français(2014)
Language: English
Year of publication: 2018
Subject:
Source: Proceedings of 14th International Conference on the Statistical Analysis of Textual Data
14th International Conference on the Statistical Analysis of Textual Data / 14es Journées internationales d'Analyse statistique des Données Textuelles (JADT 2018), DII– Department of Enterprise Engineering “Mario Lucertini” Tor Vergata University; DSS– Department of Statistical Sciences, Sapienza University, Rome, Jun 2018, Roma, Italy. pp.594-601
Description: International audience; Our corpus of medieval French texts is divided into 59 discourse units (DUs) that cross text genres with spoken vs. non-spoken text chunks (as tagged with the TEI q and sp tags). A correspondence analysis (CA) performed on selected POS tags indicates orality as the main dimension of variation across DUs. We then design several methodological paths to investigate this gradient as computed by the first CA axis. Bootstrapping is used to check the stability of observations; gradient-ordered barplots provide both a synthetic and an analytic view of the correlation of any variable with the gradient; a way is also found to characterize the gradient poles (here, the more-oral and less-oral poles) not only with the POS tags used for the CA, but also with words, in order to get a more precise, lexical description. This methodology could be transposed to other data with a potential gradient structure.
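The general idea in the description (a first CA axis read as a gradient, then checked by bootstrap) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the record does not say which software or bootstrap scheme was used. The sketch assumes a standard SVD-based CA on a DU-by-POS count table, a multinomial resampling of each DU's counts, and a rank correlation as the stability measure. The function names `ca_first_axis` and `bootstrap_rank_stability` and the `toy` table are hypothetical, and the counts are invented for illustration only.

```python
import numpy as np

def ca_first_axis(table):
    """Correspondence analysis via SVD; returns the first-axis principal
    coordinates of rows and columns, and the axis's share of total inertia."""
    N = np.asarray(table, dtype=float)
    P = N / N.sum()                      # correspondence matrix
    r = P.sum(axis=1)                    # row masses
    c = P.sum(axis=0)                    # column masses
    expected = np.outer(r, c)
    S = (P - expected) / np.sqrt(expected)   # standardized residuals
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    row_coords = U[:, 0] * sv[0] / np.sqrt(r)
    col_coords = Vt[0, :] * sv[0] / np.sqrt(c)
    inertia_share = sv[0] ** 2 / np.sum(sv ** 2)
    return row_coords, col_coords, inertia_share

def bootstrap_rank_stability(table, n_boot=200, seed=0):
    """ASSUMED scheme: resample each row's counts multinomially, redo the CA,
    and correlate the new row ranking with the reference ranking.
    The absolute value is taken because the sign of a CA axis is arbitrary."""
    rng = np.random.default_rng(seed)
    N = np.asarray(table, dtype=int)
    ref_coords, _, _ = ca_first_axis(N)
    ref_rank = ref_coords.argsort().argsort()
    corrs = []
    for _ in range(n_boot):
        boot = np.vstack([rng.multinomial(row.sum(), row / row.sum())
                          for row in N])
        coords, _, _ = ca_first_axis(boot)
        rank = coords.argsort().argsort()
        corrs.append(abs(np.corrcoef(ref_rank, rank)[0, 1]))
    return np.array(corrs)

# Invented toy table: 6 hypothetical DUs (rows) x 4 hypothetical POS tags (columns).
toy = np.array([
    [120,  30, 40, 10],
    [100,  40, 45, 15],
    [ 80,  55, 50, 20],
    [ 60,  70, 50, 30],
    [ 40,  85, 55, 40],
    [ 20, 100, 60, 50],
])

row_coords, col_coords, share = ca_first_axis(toy)
gradient_order = np.argsort(row_coords)   # DUs ordered along the first axis
stability = bootstrap_rank_stability(toy, n_boot=100)
print("order of DUs along axis 1:", gradient_order)
print("share of inertia on axis 1:", round(float(share), 3))
print("mean rank correlation over bootstrap samples:", round(float(stability.mean()), 3))
```

Under these assumptions, `gradient_order` is the ordering one would use for the x-axis of a gradient-ordered barplot: plotting any other variable's per-DU values in that order shows how the variable behaves along the gradient.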
Database: OpenAIRE