The Benefit Of Temporally-Strong Labels In Audio Event Classification

Author:	Hershey, Shawn; Ellis, Daniel P. W.; Fonseca, Eduardo; Jansen, Aren; Liu, Caroline; Moore, R. Channing; Plakal, Manoj
Year of publication:	2021
Subject:	Computer Science - Sound; Electrical Engineering and Systems Science - Audio and Speech Processing
Document type:	Working Paper
Description:	To reveal the importance of temporal precision in ground truth audio event labels, we collected precise (~0.1 sec resolution) "strong" labels for a portion of the AudioSet dataset. We devised a temporally strong evaluation set (including explicit negatives of varying difficulty) and a small strong-labeled training subset of 67k clips (compared to the original dataset's 1.8M clips labeled at 10 sec resolution). We show that fine-tuning with a mix of weakly and strongly labeled data can substantially improve classifier performance, even when evaluated using only the original weak labels. For a ResNet50 architecture, d' on the strong evaluation data including explicit negatives improves from 1.13 to 1.41. The new labels are available as an update to AudioSet. Comment: Accepted for publication at ICASSP 2021
Database:	arXiv
External link:	http://arxiv.org/abs/2105.07031
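
The abstract above reports results as d' (d-prime) without defining it. As a minimal sketch, and assuming the convention commonly used with AudioSet classifiers (d' derived from the ROC AUC under an equal-variance Gaussian model, d' = sqrt(2) * Phi^-1(AUC)), the conversion can be written as follows. The function names are hypothetical, and the record itself does not confirm that this exact formula is the one used in the paper.

```python
from math import sqrt
from statistics import NormalDist

# Standard normal distribution (mean 0, standard deviation 1).
_STD_NORMAL = NormalDist()


def d_prime_from_auc(auc: float) -> float:
    """Convert a ROC AUC to d' under an equal-variance Gaussian assumption.

    Assumed formula (not stated in the record): d' = sqrt(2) * Phi^-1(AUC).
    """
    if not 0.0 < auc < 1.0:
        raise ValueError("AUC must lie strictly between 0 and 1")
    return sqrt(2.0) * _STD_NORMAL.inv_cdf(auc)


def auc_from_d_prime(d_prime: float) -> float:
    """Inverse of the conversion above: AUC = Phi(d' / sqrt(2))."""
    return _STD_NORMAL.cdf(d_prime / sqrt(2.0))


if __name__ == "__main__":
    # The two d' values quoted in the abstract, mapped back to AUC.
    for value in (1.13, 1.41):
        print(f"d' = {value:.2f} -> AUC = {auc_from_d_prime(value):.3f}")
```

Under that assumed formula, the quoted d' values of 1.13 and 1.41 would correspond to AUCs of roughly 0.79 and 0.84; a chance-level classifier (AUC 0.5) has d' equal to 0.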