AVAS: Speech database for multimodal recognition applications

Authors:	Mohamed F. Tolba, Saleh Aly, Alaa Sagheer, Samar Antar
Year of publication:	2013
Subject:	Audio mining; Voice activity detection; Computer science; business.industry; Speech recognition; ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION; Acoustic model; Audio-visual speech recognition; Speech processing; Facial recognition system; Domain (software engineering); Computer vision; Speech analytics; Artificial intelligence; business
Source:	HIS
DOI:	10.1109/his.2013.6920467
Description:	Audio-visual speech recognition (AVSR) systems represent an important branch of the human-computer interaction (HCI) domain, since speech is the simplest way to interact with a computer. However, visual variations in a video sequence can significantly degrade the recognition performance of AVSR systems. Although several corpora have been created in this area, most of them do not include realistic visual variations in the video sequence. This paper presents the first audio-visual speech recognition corpus for the Arabic language, denoted AVAS. All AVAS samples contain two of the most important visual variations, illumination variations and head pose variations, in the same video recording. Hence, AVAS is useful in the development of robust AVSR systems, automatic speech recognition “audio-only” systems, lip-reading “visual-only” systems, and face recognition across pose and illumination variations.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_________::7b05d4fdc293332847f19467b9f89151 https://doi.org/10.1109/his.2013.6920467