Acoustic event diarization in TV/movie audios using deep embedding and integer linear programming

Authors: Mingle Liu, Jichen Yang, Yanxiong Li, Yuhan Zhang, Xianku Li, Wang Wucheng
Publication year: 2019
Subject:
Source: Multimedia Tools and Applications. 78:33999-34025
ISSN: 1573-7721, 1380-7501
DOI: 10.1007/s11042-019-07991-6
Description: In this study, we propose a method for acoustic event diarization that combines a deep embedding feature with a clustering algorithm based on integer linear programming. The deep embedding, learned by a deep auto-encoder network, represents the properties of different classes of acoustic events; integer linear programming is then used to merge audio segments that belong to the same class of acoustic events. Four kinds of TV/movie audio (21.5 h in total) are used as experimental data: Sport, Situation comedy, Award ceremony, and Action movie. We compare the deep embedding with state-of-the-art features, and we compare the integer linear programming clustering algorithm with clustering algorithms adopted in previous work. Finally, the proposed method is compared with both supervised and unsupervised methods on the four kinds of TV/movie audio. The results show that, in terms of acoustic event error, the proposed method is superior to other unsupervised methods based on the agglomerative information bottleneck, the Bayesian information criterion, and spectral clustering, and is only slightly inferior to the supervised method based on a deep neural network.
Database: OpenAIRE