Combining acoustic name spotting and continuous context models to improve spoken person name recognition in speech
Author: | Richard Dufour, Corinne Fredouille, Georges Linarès, Benjamin Bigot, Gregory Senay |
---|---|
Contributors: | Laboratoire Informatique d'Avignon (LIA), Avignon Université (AU)-Centre d'Enseignement et de Recherche en Informatique - CERI, Fredouille, Corinne |
Year of publication: | 2013 |
Subject: | spoken name spotting; spoken name recognition; linguistic context modelling; phoneme confusion network; speech recognition; search engine indexing; natural language processing; [INFO.INFO-CL] Computer Science [cs]/Computation and Language [cs.CL] |
Source: | Interspeech 2013, Aug 2013, Lyon, France |
DOI: | 10.21437/interspeech.2013-572 |
Description: | International audience; Retrieving pronounced person names in spoken documents is a critical problem in the context of audiovisual content indexing. In this paper, we present a cascading strategy that combines two methods dedicated to spoken name recognition in speech. The first method is an acoustic name spotter operating on phoneme confusion networks; it relies on a phonetic edit distance criterion that exploits the phoneme probabilities held in the confusion networks. The second method is a continuous context modelling approach applied to the 1-best transcription output; it relies on a probabilistic modelling of name-to-context dependencies. We assume that combining these methods, which draw on different types of information, may improve spoken name recognition performance. This assumption is studied through experiments on a set of audiovisual documents from the development set of the REPERE challenge. Results show that combining the acoustic and linguistic methods produces an absolute gain of 3% in F-measure over the best system taken alone. |
Database: | OpenAIRE |
External link: |
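The acoustic name-spotting idea in the description — a phonetic edit distance that weights substitution costs by the phoneme posteriors stored in a confusion network — can be sketched as follows. This is a minimal illustration, not the authors' implementation: the cost scheme (substitution cost `1 - P(phoneme | slot)`, unit insertion/deletion costs) and the confusion-network encoding as a list of phoneme-to-posterior dictionaries are assumptions for the sake of the example.

```python
def cn_edit_distance(name, cn):
    """Probability-weighted edit distance between a name's phoneme
    sequence and a phoneme confusion network.

    name: list of phoneme symbols for the target name.
    cn:   list of slots, each a dict {phoneme: posterior probability}.
          (Hypothetical encoding; real confusion networks may also
          carry epsilon arcs and timing information.)

    Substituting name phoneme p into slot s costs 1 - P(p | s), so a
    confidently matching slot costs almost nothing; insertions and
    deletions cost 1. A low total distance signals a likely occurrence
    of the name in the network.
    """
    n, m = len(name), len(cn)
    # D[i][j] = cost of aligning the first i name phonemes
    # with the first j confusion-network slots.
    D = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = float(i)          # all name phonemes deleted
    for j in range(1, m + 1):
        D[0][j] = float(j)          # all slots skipped
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 1.0 - cn[j - 1].get(name[i - 1], 0.0)
            D[i][j] = min(D[i - 1][j] + 1.0,       # delete name phoneme
                          D[i][j - 1] + 1.0,       # skip CN slot
                          D[i - 1][j - 1] + sub)   # substitute / match
    return D[n][m]


# Toy example: a two-phoneme name against a two-slot network.
cn = [{"a": 0.9, "o": 0.1}, {"b": 0.8, "p": 0.2}]
score = cn_edit_distance(["a", "b"], cn)  # (1-0.9) + (1-0.8) = 0.3
```

In a spotting setting, this distance would be computed over a sliding window of slots and compared against a decision threshold; the cascading strategy of the paper then combines such acoustic scores with the linguistic context model's decisions.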