Quantifying a register-quantification measure (SOLT) via human formality ratings: A comparison of an expert-based, a corpus-based, and an experiment-based measure of formality in German

Authors: Pescuma, Valentina N., Knoeferle, Pia, Rezaki, Ada Dilek, Sauerland, Uli, Tsiapou, Dimitra
Publication year: 2023
Subject:
DOI: 10.17605/osf.io/yzkxv
Description: For quantitative investigations of variation in formality, and of register phenomena more broadly, quantitative measures of formality need to be established. Our research contributes to the development of formality measures for German by comparing three different approaches to quantifying register: 1) an expert-based measure, namely the decisions of the authors of the German Duden dictionary; 2) a corpus-based measure, SOLT (Schriftsprache/Oralsprache logarithmisch transformiert, “written language/oral language logarithmically transformed”), a tool developed by Sauerland (2022) to quantify a word’s register information; and 3) an experiment collecting the naive intuitions of German native speakers. SOLT is based on the correlation between register and literality, that is, written as opposed to oral language use. While other factors also contribute to register differences, written language tends to be characterized by a more formal (elevated) register, whereas oral language is comparatively more characterized by colloquial register (see Sauerland, 2022). SOLT corresponds to the logarithm of the ratio between the written and spoken frequencies of a lemma. More formal words are thus expected to have higher SOLT scores than more colloquial words. SOLT has previously been validated via a corpus-based study (Sauerland, 2022). On the one hand, the SOLT value provides an intuitive understanding of the ratio between written and oral frequency (“A SOLT value of 1 means a lemma occurs twice as often in the written as in the oral corpus [...]”, Sauerland, 2022, p. 262). On the other hand, its further validation through human ratings will offer further insight into speakers’ language use and linguistic behavior (for similar arguments, see: Barking et al., 2022; Pescuma, Serova, et al., 2023).
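As a minimal sketch of the SOLT definition given above (the logarithm of the ratio between a lemma's written and spoken frequencies), the Python snippet below may help. It assumes a base-2 logarithm, which is inferred from the quoted statement that a SOLT value of 1 corresponds to twice the written frequency. The function name `solt` is hypothetical, the inputs are assumed to be comparable (for example, normalized) frequencies, and the handling of zero frequencies is an assumption that the description does not specify.

```python
import math


def solt(written_freq: float, spoken_freq: float) -> float:
    """Return the base-2 log of the written/spoken frequency ratio of a lemma.

    Assumption: both frequencies are on the same scale (e.g. per million
    tokens) and strictly positive; the source does not say how lemmas
    that are missing from one corpus are treated.
    """
    if written_freq <= 0 or spoken_freq <= 0:
        raise ValueError("frequencies must be strictly positive")
    return math.log2(written_freq / spoken_freq)


# Illustrative values only, not taken from any corpus.
print(solt(20.0, 10.0))  # twice as frequent in writing
print(solt(10.0, 10.0))  # equally frequent in both
print(solt(5.0, 10.0))   # twice as frequent in speech
```

Under this assumption, positive scores indicate a lemma that leans toward written use, negative scores one that leans toward spoken use, and a score of 0 marks equal frequency in both corpora.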
Regarding 1) and 2): in the earlier corpus-based study (Sauerland, 2022), 60 words were randomly selected from the German Duden dictionary; they were classified according to three levels of formality marking (elevated, unmarked, colloquial), and their SOLT values were calculated from corpus information. Statistical models predicted a word’s SOLT score best when formality marking was included as a fixed effect. Additionally, the SOLT scores for the three formality marking levels patterned in the expected direction. While that study suggests that SOLT can be employed as a measure of language formality, SOLT has not yet been validated through behavioral data (see 3)). The aim of the present study is thus to establish whether SOLT can reliably predict a word’s register information (based on formality ratings). To this end, we collected participants’ formality ratings for 60 German words with elevated, unmarked, or colloquial formality marking (stimuli from Sauerland, 2022) and will investigate their correlations with SOLT scores.
Database: OpenAIRE