Person instance graphs for mono-, cross- and multi-modal person recognition in multimedia data: application to speaker identification in TV broadcast

Authors:	Viet Bac Le, Anindya Roy, Hervé Bredin, Claude Barras
Contributors:	Laboratoire d'Informatique pour la Mécanique et les Sciences de l'Ingénieur (LIMSI), Université Paris Saclay (COmUE)-Centre National de la Recherche Scientifique (CNRS)-Sorbonne Université - UFR d'Ingénierie (UFR 919), Sorbonne Université (SU)-Sorbonne Université (SU)-Université Paris-Saclay-Université Paris-Sud - Paris 11 (UP11), Vocapia Research [Orsay], Vocapia
Language:	English
Year of publication:	2014
Subjects:	Optimization problem Computer science Theoretical definition 02 engineering and technology Library and Information Sciences computer.software_genre graph mining integer linear programming 0202 electrical engineering electronic engineering information engineering Media Technology Person recognition [INFO]Computer Science [cs] Cluster analysis Integer programming multimedia cross-modal processing speaker identification Multimedia [INFO.INFO-MM]Computer Science [cs]/Multimedia [cs.MM] 020207 software engineering Speaker recognition Speaker diarisation Modal person recognition 020201 artificial intelligence & image processing computer Information Systems
Source:	International Journal of Multimedia Information Retrieval, Springer, 2014, 3 (3), pp. 161-175. doi:10.1007/s13735-014-0055-y
ISSN:	2192-6611, 2192-662X
DOI:	10.1007/s13735-014-0055-y
Abstract:	The final publication is available at https://link.springer.com/article/10.1007/s13735-014-0055-y. This work introduces a unified framework for mono-, cross- and multi-modal person recognition in multimedia data. Dubbed Person Instance Graph, it models the person recognition task as a graph mining problem: i.e. finding the best mapping between person instance vertices and identity vertices. Practically, we describe how the approach can be applied to speaker identification in TV broadcast. Then, a solution to the above-mentioned mapping problem is proposed. It relies on Integer Linear Programming to model the problem of clustering person instances based on their identity. We provide an in-depth theoretical definition of the optimization problem. Moreover, we improve two fundamental aspects of our previous related work: the problem constraints and the optimized objective function. Finally, a thorough experimental evaluation of the proposed framework is performed on a publicly available benchmark database. Depending on the graph configuration (i.e. the choice of its vertices and edges), we show that multiple tasks can be addressed interchangeably (e.g. speaker diarization, supervised or unsupervised speaker identification), with results significantly outperforming state-of-the-art mono-modal approaches.
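The abstract describes the core step only in words: person instances (e.g. speaker turns) and identities are vertices, and recognition amounts to grouping vertices so that each group maps to at most one identity. The Python sketch below makes that idea concrete on a toy graph. It is not the authors' formulation. It assumes an objective that adds p for every weighted pair kept in the same cluster and 1 - p for every pair split apart, and a constraint of at most one identity vertex per cluster. It also replaces the integer linear program with exhaustive search over all partitions, which finds the same exact optimum for this objective but only scales to a handful of vertices. All names (`set_partitions`, `score`, `best_clustering`, the "@Name" identity vertices) and all edge weights are hypothetical.

```python
"""Toy sketch: exact person-instance clustering by exhaustive search.

Stand-in for an ILP solver; every name and weight here is hypothetical.
"""

# Hypothetical edge weights: probability that two vertices share an identity.
# Vertices "t1".."t3" stand for speaker turns, "@Alice"/"@Bob" for identities.
EXAMPLE_VERTICES = ["t1", "t2", "t3", "@Alice", "@Bob"]
EXAMPLE_IDENTITIES = {"@Alice", "@Bob"}
EXAMPLE_PROB = {
    ("t1", "t2"): 0.9,
    ("t2", "t3"): 0.2,
    ("t1", "t3"): 0.3,
    ("t1", "@Alice"): 0.8,
    ("t2", "@Alice"): 0.6,
    ("t3", "@Bob"): 0.7,
}


def set_partitions(items):
    """Yield every partition of `items` as a list of blocks (lists)."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Add `first` to each existing block in turn...
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        # ...or give it a block of its own.
        yield [[first]] + partition


def score(partition, prob):
    """Add p for each weighted pair kept together and 1 - p for each pair split."""
    cluster_of = {v: k for k, block in enumerate(partition) for v in block}
    total = 0.0
    for (u, v), p in prob.items():
        total += p if cluster_of[u] == cluster_of[v] else 1.0 - p
    return total


def best_clustering(vertices, prob, identities):
    """Return the highest-scoring partition with at most one identity per cluster."""
    feasible = (
        part
        for part in set_partitions(list(vertices))
        # Hypothetical constraint: no cluster may merge two distinct identities.
        if all(sum(v in identities for v in block) <= 1 for block in part)
    )
    return max(feasible, key=lambda part: score(part, prob))


if __name__ == "__main__":
    best = best_clustering(EXAMPLE_VERTICES, EXAMPLE_PROB, EXAMPLE_IDENTITIES)
    print(sorted(sorted(block) for block in best))
    # prints [['@Alice', 't1', 't2'], ['@Bob', 't3']]
```

On this toy graph the search labels turns t1 and t2 as @Alice and turn t3 as @Bob. A practical version would typically express the same kind of objective over binary pair variables with transitivity constraints and hand it to an ILP solver, so that it scales beyond toy sizes; the brute-force version only serves to make the objective and the identity constraint explicit.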
Database:	OpenAIRE
External links:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::87aeecccf19eec8f4ed6f27093de96ad https://hal.archives-ouvertes.fr/hal-01690350/document