Revised annotation conventions in Hungarian speech corpora.

Authors: Mády, Katalin; Kohári, Anna; Gráczi, Tekla Etelka; Mihajlik, Péter
Subject:
Source: Beszedtudomany - Speech Science; 2024, Vol. 4 Issue 1, p185-202, 18p
Abstract: This technical report presents the revised annotation conventions for one large and two smaller Hungarian speech corpora: the BEA Spoken Language Database, the Akaka Maptask Corpus, and the Budapest Games Corpus. Annotations relying on standard Hungarian orthography rather than actual and partly reduced phonetic realisations make it possible to run both linguistic and phonetic queries on a large amount of data. Since the vast majority of the recordings contain (semi-)spontaneous speech, non-lexical phenomena such as hesitations (filled pauses) and non-verbal events such as laughter are labelled. The frequency of these phenomena is demonstrated on Release 1, a subset of the BEA database containing speech samples from 115 speakers. Unsurprisingly, laughter and communicative grunts were more frequent in spontaneous speech when expressed in relative numbers. Hesitations occurred more often in semi-spontaneous speech than in read and spontaneous speech, showing that the task demanded a higher cognitive effort from speakers. The majority of questions were found in spontaneous speech since the reading tasks did not include interrogatives. [ABSTRACT FROM AUTHOR]
Database: Complementary Index