Person instance graphs for mono-, cross- and multi-modal person recognition in multimedia data: application to speaker identification in TV broadcast
Author: | Viet Bac Le, Anindya Roy, Hervé Bredin, Claude Barras |
---|---|
Contributors: | Laboratoire d'Informatique pour la Mécanique et les Sciences de l'Ingénieur (LIMSI), Université Paris Saclay (COmUE)-Centre National de la Recherche Scientifique (CNRS)-Sorbonne Université - UFR d'Ingénierie (UFR 919), Sorbonne Université (SU)-Sorbonne Université (SU)-Université Paris-Saclay-Université Paris-Sud - Paris 11 (UP11), Vocapia Research [Orsay], Vocapia |
Language: | English |
Publication year: | 2014 |
Subject: | Optimization problem; Computer science; Theoretical definition; 02 engineering and technology; Library and Information Sciences; computer.software_genre; graph mining; integer linear programming; 0202 electrical engineering, electronic engineering, information engineering; Media Technology; Person recognition; [INFO]Computer Science [cs]; Cluster analysis; Integer programming; multimedia; cross-modal processing; speaker identification; Multimedia; [INFO.INFO-MM]Computer Science [cs]/Multimedia [cs.MM]; 020207 software engineering; Speaker recognition; Speaker diarisation; Modal person recognition; 020201 artificial intelligence & image processing; computer; Information Systems |
Source: | International Journal of Multimedia Information Retrieval, Springer, 2014, 3 (3), pp.161-175. ⟨10.1007/s13735-014-0055-y⟩ |
ISSN: | 2192-6611 2192-662X |
DOI: | 10.1007/s13735-014-0055-y |
Description: | The final publication is available at https://link.springer.com/article/10.1007/s13735-014-0055-y; International audience; This work introduces a unified framework for mono-, cross- and multi-modal person recognition in multimedia data. Dubbed Person Instance Graph, it models the person recognition task as a graph mining problem, i.e. finding the best mapping between person instance vertices and identity vertices. Practically, we describe how the approach can be applied to speaker identification in TV broadcast. Then, a solution to the above-mentioned mapping problem is proposed. It relies on Integer Linear Programming to model the problem of clustering person instances based on their identity. We provide an in-depth theoretical definition of the optimization problem. Moreover, we improve two fundamental aspects of our previous related work: the problem constraints and the optimized objective function. Finally, a thorough experimental evaluation of the proposed framework is performed on a publicly available benchmark database. Depending on the graph configuration (i.e. the choice of its vertices and edges), we show that multiple tasks can be addressed interchangeably (e.g. speaker diarization, supervised or unsupervised speaker identification), significantly outperforming state-of-the-art mono-modal approaches. |
Database: | OpenAIRE |
External link: |