Denoised Bottleneck Features From Deep Autoencoders for Telephone Conversation Analysis
| Author: | Richard Dufour, Mohamed Morchid, Killian Janod, Georges Linarès, Renato De Mori |
|---|---|
| Contributors: | Laboratoire Informatique d'Avignon (LIA), Avignon Université (AU), Centre d'Enseignement et de Recherche en Informatique (CERI) |
| Year of publication: | 2017 |
| Subject: | Acoustics and Ultrasonics; Computer science; Speech recognition; Feature extraction; Engineering and technology; Bottleneck; [INFO.INFO-AI] Computer Science [cs]/Artificial Intelligence [cs.AI]; Speech-language pathology & audiology; Medical and health sciences; Transcription (linguistics); Robustness (computer science); Electrical engineering, electronic engineering, information engineering; Computer Science (miscellaneous); Conversation; Electrical and Electronic Engineering; Speech processing; Autoencoder; [INFO.INFO-TT] Computer Science [cs]/Document and Text Processing; Computational Mathematics; Conversation analysis; Artificial intelligence & image processing; Artificial intelligence; Natural language processing |
| Source: | IEEE/ACM Transactions on Audio, Speech and Language Processing, Institute of Electrical and Electronics Engineers, 2017, 25 (9), pp. 1809-1820. ⟨10.1109/TASLP.2017.2718843⟩ |
| ISSN: | 2329-9290, 2329-9304 |
| DOI: | 10.1109/TASLP.2017.2718843 |
| Description: | Automatic transcription of spoken documents is affected by recognition errors that are especially frequent when speech is acquired in severely noisy conditions. Automatic speech recognition errors induce errors in the linguistic features used for a variety of natural language processing tasks. Recently, denoising autoencoders (DAE) and stacked autoencoders (SAE) have been proposed with interesting results for acoustic feature denoising tasks. This paper deals with the recovery of corrupted linguistic features in spoken documents. Solutions based on DAEs and SAEs are considered and evaluated in a spoken conversation analysis task. In order to improve conversation theme classification accuracy, the possibility of combining abstractions obtained from manual and automatic transcription features is considered. As a result, two original representations of highly imperfect spoken documents are introduced. They are based on bottleneck features of a supervised autoencoder that takes advantage of both noisy and clean transcriptions to improve the robustness of error-prone representations. Experimental results on a spoken conversation theme identification task show substantial accuracy improvements obtained with the proposed recovery of corrupted features. A minimal code sketch of the autoencoder-based denoising idea is given after the record below. |
| Database: | OpenAIRE |
| External link: | |
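
The description above outlines the general mechanism the paper builds on: a denoising autoencoder is trained to map error-prone ASR-based document features onto their clean, manual-transcription counterparts, and the activations of its narrow bottleneck layer are then used as the denoised document representation. The sketch below is a minimal, illustrative NumPy implementation of that generic idea, not the authors' code; the toy data, feature dimensions, noise model, and hyperparameters are all assumptions made for the example.

```python
# Minimal sketch (assumptions, not the paper's implementation): a denoising
# autoencoder that reads noisy "ASR" features and is trained to reconstruct
# the corresponding clean "manual transcription" features; the bottleneck
# activations serve as the denoised document representation.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy data: each row stands for one spoken document described by term features.
n_docs, n_terms, n_bottleneck = 200, 50, 10
clean = rng.random((n_docs, n_terms))                                       # "manual transcription" features
noisy = np.clip(clean + 0.2 * rng.standard_normal(clean.shape), 0.0, 1.0)   # "ASR-corrupted" features

# Single hidden layer with a narrow bottleneck.
W_enc = 0.1 * rng.standard_normal((n_terms, n_bottleneck))
b_enc = np.zeros(n_bottleneck)
W_dec = 0.1 * rng.standard_normal((n_bottleneck, n_terms))
b_dec = np.zeros(n_terms)

lr = 0.5
for epoch in range(500):
    # Forward pass: encode the noisy input, reconstruct the clean target.
    h = sigmoid(noisy @ W_enc + b_enc)            # bottleneck features
    recon = sigmoid(h @ W_dec + b_dec)

    # Mean squared reconstruction error against the clean features.
    err = recon - clean
    loss = 0.5 * np.mean(np.sum(err ** 2, axis=1))

    # Backpropagation through the two sigmoid layers.
    d_recon = err * recon * (1.0 - recon) / n_docs
    d_W_dec = h.T @ d_recon
    d_b_dec = d_recon.sum(axis=0)
    d_h = (d_recon @ W_dec.T) * h * (1.0 - h)
    d_W_enc = noisy.T @ d_h
    d_b_enc = d_h.sum(axis=0)

    # Plain gradient-descent update.
    W_dec -= lr * d_W_dec
    b_dec -= lr * d_b_dec
    W_enc -= lr * d_W_enc
    b_enc -= lr * d_b_enc

# The bottleneck activations are the denoised representation of the documents,
# e.g. the input to a downstream theme classifier.
denoised_features = sigmoid(noisy @ W_enc + b_enc)
print(f"final loss: {loss:.4f}, feature shape: {denoised_features.shape}")
```

In the paper's setting, the rows would be linguistic features of telephone conversations, the clean targets would come from manual transcriptions, the noisy inputs from automatic transcriptions, and the bottleneck features would feed the conversation theme classifier; the stacked and supervised variants discussed in the abstract extend this basic scheme.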