Leveraging structural information in music-speech detection

Authors: Jinyu Han, Bob Coover
Publication year: 2013
Subject:
Source: ICME Workshops
DOI: 10.1109/icmew.2013.6618387
Description: Detecting music or speech signals in an audio mixture is an important but challenging problem. Even more challenging is detecting when both are present in a signal at the same time. This requires not only discriminating speech and music from each other but also detecting the presence of each in a mixture with interfering signals. In this paper, we address the problem of detecting speech and music signals in the presence of each other. We focus on leveraging features that capture the structural properties of audio to improve the performance of concurrent music-speech detection. Continuous Frequency Activation (CFA) is used to account for sustained pitch/harmonic activity, and a new feature called Transient Activation (TAC) is proposed for transient/percussive activity in an audio signal. The effectiveness of these features, along with other acoustic features, is evaluated in different statistical classification schemes. Feature selection is conducted to find the feature set that maximizes detection performance. Experimental results on real-world broadcast recordings show significant improvement from using these techniques to incorporate the structural information of audio.
Database: OpenAIRE
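
Illustrative sketch: the description contrasts sustained pitch/harmonic activity with transient/percussive activity. The Python below is a rough illustration of that contrast only, not the CFA or TAC definitions from the paper, which this record does not give. It assumes a simple stand-in: sustained activity is measured as how long a single frequency bin stays above a threshold, and transient activity as how many bins jump upward between two consecutive frames. The function names `spectrogram`, `sustained_activation`, and `transient_activation`, the frame and hop sizes, and the threshold `ratio` are all hypothetical choices made for this example.

```python
import cmath
import math


def spectrogram(signal, frame=64, hop=32):
    """Magnitude spectrogram via a naive DFT; rows are frames, columns are bins."""
    # Periodic Hann window (assumed choice for this sketch).
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame) for n in range(frame)]
    bins = frame // 2 + 1
    spec = []
    for start in range(0, len(signal) - frame + 1, hop):
        seg = [signal[start + n] * window[n] for n in range(frame)]
        row = [
            abs(sum(seg[n] * cmath.exp(-2j * math.pi * k * n / frame)
                    for n in range(frame)))
            for k in range(bins)
        ]
        spec.append(row)
    return spec


def _threshold(spec, ratio):
    """Activity threshold: `ratio` times the mean magnitude of the spectrogram."""
    total = sum(sum(row) for row in spec)
    return ratio * total / (len(spec) * len(spec[0]))


def sustained_activation(spec, ratio=2.0):
    """Largest fraction of frames in which one frequency bin stays active."""
    thr = _threshold(spec, ratio)
    n_frames, n_bins = len(spec), len(spec[0])
    return max(
        sum(1 for row in spec if row[k] > thr) / n_frames for k in range(n_bins)
    )


def transient_activation(spec, ratio=2.0):
    """Largest fraction of bins whose magnitude jumps between adjacent frames."""
    thr = _threshold(spec, ratio)
    n_bins = len(spec[0])
    best = 0.0
    for prev, cur in zip(spec, spec[1:]):
        jumps = sum(1 for p, c in zip(prev, cur) if c - p > thr)
        best = max(best, jumps / n_bins)
    return best


# Synthetic test signals (hypothetical): a steady tone and two isolated clicks.
tone = [math.sin(2 * math.pi * 8 * n / 64) for n in range(1024)]
clicks = [0.0] * 1024
clicks[200] = 1.0
clicks[600] = 1.0

tone_spec = spectrogram(tone)
click_spec = spectrogram(clicks)
print("tone  :", sustained_activation(tone_spec), transient_activation(tone_spec))
print("clicks:", sustained_activation(click_spec), transient_activation(click_spec))
```

On these synthetic inputs the steady tone should score high on the sustained measure and low on the transient one, and the clicks the reverse. Real broadcast audio would need a proper STFT, a more careful threshold, and a classifier on top, as the description indicates.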