Recurrent Neural Network Based Speaker Change Detection from Text Transcription Applied in Telephone Speaker Diarization System
Authors: | Luděk Müller, Vlasta Radová, Daniel Soutner, Zbyněk Zajíc, Marek Hrúz |
---|---|
Publication year: | 2018 |
Subject: |
Computer science
Speech recognition; Convolutional neural network; Speaker diarisation; Recurrent neural network; Spectrogram; Conversation; Segmentation; Transcription (software); Change detection; 01 natural sciences; 0103 physical sciences; 010301 acoustics; 03 medical and health sciences; 0305 other medical science; 030507 speech-language pathology & audiology |
Source: | Text, Speech, and Dialogue (TSD), ISBN: 9783030007935 |
DOI: | 10.1007/978-3-030-00794-2_37 |
Description: | In this paper, we propose a speaker change detection system based on lexical information from the transcribed speech. For this purpose, we apply a recurrent neural network to decide whether an utterance ends at the end of a spoken word. Our motivation is to use the transcription of the conversation as an additional feature for a speaker diarization system, refining the segmentation step to improve the accuracy of the whole diarization system. We compare the proposed speaker change detection system based on the transcription (text) with our previous system based on information from the spectrogram (audio), and we combine these two modalities to improve the diarization results. We cut the conversation into segments according to the detected changes and represent each segment by an i-vector. We conducted experiments on the English part of the CallHome corpus. The results indicate a relative improvement of 0.5% in speaker change detection and of 1% in speaker diarization when both modalities are used. |
Database: | OpenAIRE |