Bengali Basic Travel Expression Corpus: A statistical analysis

Authors: Madhab Pal, Rajib Roy, Soma Khan, T. K. Basu, Milton Samirakshma Bepari, Joyanta Basu
Year of publication: 2014
Subject:
Source: O-COCOSDA
DOI: 10.1109/icsda.2014.7051420
Description: Prior studies have used the Japanese-English aligned Basic Travel Expression Corpus (BTEC) as a basic dataset for developing real-world Speech-to-Speech Translation (S2ST) systems. This paper presents a detailed statistical analysis of the Bengali translation of the BTEC text and its phonetic transcriptions, with the aim of developing English-Bengali speech translation applications in the travel domain. At different levels of the analysis hierarchy, the study examines the lexical and phonetic characteristics of the corpus using frequency spectra, estimated population size, coverage ratio, the goodness of fit of Large Number of Rare Events (LNRE) models, and transition patterns. The experimental observations provide insight into whether the analyzed corpus is sufficient for the travel domain and for building the basic components of an English-Bengali S2ST system.
Database: OpenAIRE
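The description names several corpus statistics without defining them. The sketch below illustrates three of them on a token sequence (words or phones). It is an assumption-laden illustration, not the authors' method. It computes the frequency spectrum V(m, N), the number of types that occur exactly m times in N tokens. It estimates coverage with the Good-Turing estimate 1 - V(1, N) / N, one common coverage measure that may differ from the coverage ratio used in the paper. It also counts adjacent-symbol (bigram) transitions as a simple stand-in for "transition patterns". The function names and the toy data are hypothetical.

```python
from collections import Counter


def frequency_spectrum(tokens):
    """Return {m: V(m, N)}: how many types occur exactly m times."""
    type_freqs = Counter(tokens)  # type -> frequency
    return dict(sorted(Counter(type_freqs.values()).items()))


def good_turing_coverage(tokens):
    """Estimate sample coverage as 1 - V(1, N) / N (Good-Turing).

    Assumption: this is one standard coverage estimate, not necessarily
    the coverage ratio defined in the paper.
    """
    n = len(tokens)
    if n == 0:
        raise ValueError("empty sample")
    hapax = frequency_spectrum(tokens).get(1, 0)  # types seen only once
    return 1 - hapax / n


def transition_counts(symbols):
    """Count adjacent pairs (a, b), e.g. phone-to-phone transitions.

    Assumption: bigram counts are used here as a simple proxy for the
    paper's "transition patterns".
    """
    return Counter(zip(symbols, symbols[1:]))


if __name__ == "__main__":
    # Toy token sample (hypothetical, not BTEC data).
    tokens = "a b a c a b d".split()
    print(frequency_spectrum(tokens))    # {1: 2, 2: 1, 3: 1}
    print(good_turing_coverage(tokens))  # roughly 0.714 (= 5/7)
    print(transition_counts(tokens).most_common(2))
```

In this toy sample, "a" occurs three times, "b" twice, and "c" and "d" once each. The spectrum is therefore {1: 2, 2: 1, 3: 1}. A high share of once-seen types (hapax legomena) lowers estimated coverage. This is the kind of signal that LNRE-based sufficiency analyses build on.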