Sentence boundary detection without speech recognition: A case of an underresourced language

Authors: Nursuriati Jamil, Muhammad Izzad Ramli, Noraini Seman
Language: English
Year of publication: 2015
Subject:
Source: Journal of Electrical Systems, Vol 11, Iss 3, Pp 308-318 (2015)
ISSN: 1112-5209
Description: Sentence boundary detection (SBD), also known as sentence segmentation, decides where a sentence begins and ends. Previous SBD methods use a linguistic approach, an acoustic approach, or a combination of both. Although the linguistic approach generally performs better than the acoustic approach, it requires a speech recognition component. This is a constraint for under-resourced languages such as Malay. This paper describes SBD for spontaneous spoken Malay audio. Experiments are conducted on a forty-two-minute question-answer (Q/A) Malaysian parliamentary session comprising 12 adult male speakers and 4 female speakers. The speech data are first classified into speech and non-speech segments, and only the non-speech segments are further tested as sentence boundary candidates. Seven prosodic features, rate of speech and volume are then extracted from the boundary candidates for classification. Our proposed SBD method, using a supervised AdaBoost classifier, achieved a promising 100% accuracy rate with a 19.44% error rate. For future work, we intend to reduce the error rate by implementing end-point detection on the boundary candidates.
Database: OpenAIRE