Combining acoustic name spotting and continuous context models to improve spoken person name recognition in speech
Author: | Richard Dufour, Corinne Fredouille, Georges Linarès, Benjamin Bigot, Gregory Senay |
---|---|
Contributors: | Laboratoire Informatique d'Avignon (LIA), Avignon Université (AU)-Centre d'Enseignement et de Recherche en Informatique - CERI, Fredouille, Corinne |
Year of publication: | 2013 |
Subject: | spoken name spotting; spoken name recognition; linguistic context modelling; phoneme confusion network; speech recognition; search engine indexing; natural language processing; [INFO.INFO-CL] Computer Science [cs]/Computation and Language [cs.CL] |
Source: | Interspeech 2013, Aug 2013, Lyon, France |
DOI: | 10.21437/interspeech.2013-572 |
Description: | International audience; Retrieving pronounced person names in spoken documents is a critical problem in the context of audiovisual content indexing. In this paper, we present a cascading strategy that combines two methods dedicated to spoken name recognition in speech. The first method is an acoustic name spotter operating on phoneme confusion networks; it relies on a phonetic edit distance criterion that exploits the phoneme probabilities held in the confusion networks. The second method is a continuous context modelling approach applied to the 1-best transcription output; it relies on a probabilistic modelling of name-to-context dependencies. We assume that combining these methods, which draw on different types of information, may improve spoken name recognition performance. This assumption is studied through experiments on a set of audiovisual documents from the development set of the REPERE challenge. Results show that combining the acoustic and linguistic methods produces an absolute gain of 3% in F-measure over the best system taken alone. |
Database: | OpenAIRE |
External link: |
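The acoustic name-spotting idea in the description — a phonetic edit distance that weights substitution costs by the phoneme posteriors stored in a confusion network — can be sketched as follows. This is a minimal illustration, not the authors' implementation: the cost scheme (substitution cost `1 - P(phoneme | slot)`, unit insertion/deletion costs) and the confusion-network encoding as a list of phoneme-to-posterior dictionaries are assumptions for the sake of the example.

```python
def cn_edit_distance(name, cn):
    """Probability-weighted edit distance between a name's phoneme
    sequence and a phoneme confusion network.

    name: list of phoneme symbols for the target name.
    cn:   list of slots, each a dict {phoneme: posterior probability}.
          (Hypothetical encoding; real confusion networks may also
          carry epsilon arcs and timing information.)

    Substituting name phoneme p into slot s costs 1 - P(p | s), so a
    confidently matching slot costs almost nothing; insertions and
    deletions cost 1. A low total distance signals a likely occurrence
    of the name in the network.
    """
    n, m = len(name), len(cn)
    # D[i][j] = cost of aligning the first i name phonemes
    # with the first j confusion-network slots.
    D = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = float(i)          # all name phonemes deleted
    for j in range(1, m + 1):
        D[0][j] = float(j)          # all slots skipped
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 1.0 - cn[j - 1].get(name[i - 1], 0.0)
            D[i][j] = min(D[i - 1][j] + 1.0,       # delete name phoneme
                          D[i][j - 1] + 1.0,       # skip CN slot
                          D[i - 1][j - 1] + sub)   # substitute / match
    return D[n][m]


# Toy example: a two-phoneme name against a two-slot network.
cn = [{"a": 0.9, "o": 0.1}, {"b": 0.8, "p": 0.2}]
score = cn_edit_distance(["a", "b"], cn)  # (1-0.9) + (1-0.8) = 0.3
```

In a spotting setting, this distance would be computed over a sliding window of slots and compared against a decision threshold; the cascading strategy of the paper then combines such acoustic scores with the linguistic context model's decisions.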