Spatial Bias in Vision-Based Voice Activity Detection
Author: Giampiero Salvi, Kalin Stefanov, Mohammad Adiban
Language: English
Year of publication: 2021
Subject: masking, voice activity detection, artificial neural network, vision-based, computer science, generalization, speech recognition, data acquisition, artificial intelligence, image processing, sensory cue, spatial bias
Source: IEEE Computer Society, ICPR (International Conference on Pattern Recognition)
Description: We develop and evaluate models for automatic vision-based voice activity detection (VAD) in multiparty human-human interactions that are aimed at complementing acoustic VAD methods. We provide evidence that this type of vision-based VAD model is susceptible to spatial bias in the dataset used for its development; the physical setting of the interaction, usually constant throughout data acquisition, determines the distribution of head poses of the participants. Our results show that when the head pose distributions differ substantially between the train and test sets, the performance of the vision-based VAD models drops significantly. This suggests that previously reported results on datasets with a fixed physical configuration may overestimate the generalization capabilities of this type of model. We also propose a number of possible remedies to the spatial bias, including data augmentation, input masking, and dynamic features, and provide an in-depth analysis of the visual cues used by the developed vision-based VAD models.
Database: OpenAIRE
External link:
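The remedies listed in the description (data augmentation, input masking, dynamic features) can be pictured with a minimal sketch. The code below is illustrative only: it assumes face-crop sequences of shape (T, H, W, C) and made-up function names and parameters; it is not the authors' implementation, just one plausible way such preprocessing could look.

```python
# Hypothetical illustration of the kinds of remedies named in the abstract:
# random input masking, horizontal-flip augmentation, and simple dynamic
# features computed on face-crop sequences before a vision-based VAD model.
# All names, shapes, and parameter values are assumptions for this sketch.
import numpy as np

rng = np.random.default_rng(0)

def random_region_mask(frames, mask_prob=0.5, patch=16):
    """Zero out a random square patch in each frame with probability mask_prob.

    frames: array of shape (T, H, W, C) holding a sequence of face crops.
    The intent is to discourage reliance on appearance cues tied to one
    fixed camera/seating configuration.
    """
    T, H, W, _ = frames.shape
    out = frames.copy()
    for t in range(T):
        if rng.random() < mask_prob:
            y = rng.integers(0, H - patch)
            x = rng.integers(0, W - patch)
            out[t, y:y + patch, x:x + patch, :] = 0
    return out

def horizontal_flip(frames, flip_prob=0.5):
    """Mirror the whole sequence, roughly doubling head-pose coverage."""
    if rng.random() < flip_prob:
        return frames[:, :, ::-1, :].copy()
    return frames

def dynamic_features(frames):
    """Frame-to-frame differences, a simple stand-in for dynamic features
    that emphasize lip/face motion rather than static head pose."""
    return np.diff(frames.astype(np.float32), axis=0)

# Usage on a dummy 25-frame sequence of 64x64 RGB face crops.
seq = rng.random((25, 64, 64, 3), dtype=np.float32)
augmented = horizontal_flip(random_region_mask(seq))
motion = dynamic_features(augmented)
print(augmented.shape, motion.shape)  # (25, 64, 64, 3) (24, 64, 64, 3)
```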