Speaker role clustering using turn features and maximum inter-cluster distances

Authors: Zhuoming Chen, Xue Zhang, Aiwu Chen, Qian Huang, Xianku Li, Xiaohui Feng, Jichen Yang, Yanxiong Li
Year of publication: 2016
Source: 2016 International Conference on Audio, Language and Image Processing (ICALIP).
DOI: 10.1109/icalip.2016.7846538
Description: Speaker role clustering aims to determine, in an unsupervised way, the number of distinct roles and to merge the utterances of each role into a single cluster; it is important for the rich transcription of multi-speaker spoken documents. This paper presents a role clustering approach based on turn features and maximum inter-cluster distances. Turn features are extracted for each speaker from the audio output of speaker diarization and serve as the initial clusters. In each clustering iteration, the cluster pair with the minimum distance (e.g. $C_A$ and $C_B$) is merged, reducing the number of clusters by one, if the distance of the $N_c - 1$ clusters obtained after merging $C_A$ and $C_B$ is larger than that of the $N_c$ clusters without the merge; otherwise, the iteration stops. Evaluated on four types of multi-speaker spoken documents, the proposed approach outperforms the previous clustering approach and comes close to the supervised approach in terms of K score.
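The merge-and-stop rule described above can be sketched as a form of agglomerative clustering. This is a minimal Python sketch under stated assumptions, not the authors' implementation: the description does not say how the distance of a whole set of clusters is computed, so the sketch assumes it is the mean pairwise Euclidean distance between cluster centroids, and it assumes each speaker's turn features form one fixed-length vector. The function names `role_cluster` and `set_distance` are hypothetical.

```python
import itertools

import numpy as np


def centroid(cluster):
    # Mean turn-feature vector of a cluster (a list of 1-D arrays).
    return np.mean(cluster, axis=0)


def set_distance(clusters):
    # ASSUMPTION (hypothetical): the "distance" of a set of clusters is the
    # mean pairwise Euclidean distance between cluster centroids.
    # The paper's exact definition is not given in the abstract.
    cents = [centroid(c) for c in clusters]
    dists = [np.linalg.norm(a - b) for a, b in itertools.combinations(cents, 2)]
    return float(np.mean(dists)) if dists else 0.0


def role_cluster(turn_features):
    # Each speaker's turn-feature vector (from diarization) starts as its own cluster.
    clusters = [[np.asarray(f, dtype=float)] for f in turn_features]
    while len(clusters) > 1:
        # Find the cluster pair (C_A, C_B) whose centroids are closest.
        i, j = min(
            itertools.combinations(range(len(clusters)), 2),
            key=lambda p: np.linalg.norm(
                centroid(clusters[p[0]]) - centroid(clusters[p[1]])
            ),
        )
        # Candidate partition with N_c - 1 clusters: C_A and C_B merged.
        merged = clusters[i] + clusters[j]
        candidate = [c for k, c in enumerate(clusters) if k not in (i, j)]
        candidate.append(merged)
        # Accept the merge only if the N_c - 1 clusters are farther apart
        # (under the assumed set distance) than the current N_c clusters.
        if set_distance(candidate) > set_distance(clusters):
            clusters = candidate
        else:
            break
    return clusters
```

For example, `role_cluster([[0, 0], [0.2, 0], [10, 0], [0, 10]])` merges only the two nearby speakers and returns three role clusters. With this assumed criterion, two clusters are never merged into one, because a single cluster has no pairwise distances; a different set-distance definition would change where the iteration stops.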
Database: OpenAIRE