Hierarchical Residual-pyramidal Model for Large Context Based Media Presence Detection
Author: | Qingming Tang, Chao Wang, Chieh-Chi Kao, Viktor Rozgic, Ming Sun |
---|---|
Year of publication: | 2019 |
Subject: | Normalization (statistics); geography; geography.geographical_feature_category; Computer science; Speech recognition; 020206 networking & telecommunications; 02 engineering and technology; 010501 environmental sciences; Context based; Residual; 01 natural sciences; Variation (linguistics); 0202 electrical engineering, electronic engineering, information engineering; Voice; Representation (mathematics); Sound (geography); 0105 earth and related environmental sciences |
Source: | ICASSP |
DOI: | 10.1109/icassp.2019.8683430 |
Description: | We study media presence detection, that is, learning to recognize whether a sound segment (typically a few seconds long) of a long recorded stream contains media (TV) sound. The problem is difficult because non-media sound sources can be quite diverse (e.g. human voicing, non-vocal sounds and non-human sounds), and the recorded sound can be a mixture of media and non-media sound. Unlike speech recognition, where the recognizer needs to detect local phonetic variation, the key features that distinguish media from non-media sound are non-local. Motivated by this, we propose a hierarchical model that jointly learns a representation of each pre-chunked segment within a long recorded stream and encourages every local representation to be insensitive to variations within its segment. We also explore the effects of stream-based normalization and of iteratively imputing missing labels in the training dataset. Experimental results indicate that the proposed context-based methods are effective for media presence detection. |
Database: | OpenAIRE |
External link: |
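
The abstract mentions stream-based normalization without giving details. One plausible reading, offered only as a minimal sketch and not as the authors' method, is to standardize every frame with the mean and standard deviation pooled over the whole recorded stream, so that all segments of one stream share the same shift and scale. The function name `normalize_stream`, the `eps` parameter, and the scalar per-frame features are assumptions made for illustration.

```python
from statistics import fmean, pstdev


def normalize_stream(segments, eps=1e-8):
    """Standardize frames using statistics of the whole stream.

    `segments` is a list of pre-chunked segments; each segment is a list of
    scalar feature values (hypothetical, e.g. one log-energy value per frame).
    This is an assumed interpretation, not the procedure from the paper.
    """
    # Pool every frame of the stream to estimate stream-level statistics.
    frames = [x for seg in segments for x in seg]
    mean = fmean(frames)
    std = pstdev(frames)
    # Apply the same shift and scale to each segment of this stream.
    return [[(x - mean) / (std + eps) for x in seg] for seg in segments]


# Toy stream with three segments of different lengths.
stream = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0, 7.0, 8.0, 9.0]]
normalized = normalize_stream(stream)
```

Pooling over the stream rather than over each segment keeps relative level differences between segments of the same recording, which may matter when the distinguishing cues are non-local, as the abstract argues.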