Showing 1 - 10 of 23 results for search: '"Gurban, M."'
Author:
Gurban, M., Jean-Philippe Thiran
Published in:
Scopus-Elsevier
A multimodal probabilistic framework is proposed for the problem of finding the active speaker in a video sequence. We localize the current speaker's mouth in the image by using the video and the audio channels together. We propose a novel visual fea
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::ac7decee33dab96c523ad418a538b271
Author:
Gurban, M., Jean-Philippe Thiran
Published in:
Scopus-Elsevier
Traditional speech recognition systems use Gaussian mixture models to obtain the likelihoods of individual phonemes, which are then used as state emission probabilities in hidden Markov models representing the words. In hybrid systems, the Gaussian m
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::fa920825f703f2ece007bbcb61e99f0a
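The abstract above contrasts hybrid systems with the traditional setup, where Gaussian mixture models supply per-phoneme likelihoods that serve as state emission probabilities in word HMMs. The following is a minimal pure-Python sketch of that traditional setup. It assumes diagonal-covariance GMMs and a toy 2-state left-to-right word model, and it scores a feature sequence with the standard forward algorithm. All function names and parameter values are hypothetical and illustrative. This is not the authors' code.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def logsumexp(values):
    """Numerically stable log(sum(exp(v))); handles all -inf inputs."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))

def gmm_log_likelihood(x, weights, means, variances):
    """log p(x | state) = log sum_k w_k N(x; mu_k, var_k): the GMM emission score."""
    return logsumexp([
        math.log(w) + log_gauss_diag(x, mu, var)
        for w, mu, var in zip(weights, means, variances)
    ])

def forward_log_prob(frames, log_init, log_trans, state_gmms):
    """Forward algorithm: log p(frames | word HMM), with GMMs as emission densities."""
    n = len(state_gmms)
    alpha = [log_init[j] + gmm_log_likelihood(frames[0], *state_gmms[j]) for j in range(n)]
    for x in frames[1:]:
        alpha = [
            logsumexp([alpha[i] + log_trans[i][j] for i in range(n)])
            + gmm_log_likelihood(x, *state_gmms[j])
            for j in range(n)
        ]
    return logsumexp(alpha)

# Hypothetical 2-state left-to-right word model over 1-D features.
# Each state is (mixture weights, component means, component variances).
gmms = [
    ([0.5, 0.5], [[0.0], [1.0]], [[1.0], [1.0]]),  # state 0: two components
    ([1.0], [[5.0]], [[1.0]]),                      # state 1: one component
]
log_init = [0.0, -math.inf]                         # must start in state 0
log_trans = [[math.log(0.6), math.log(0.4)],        # stay in 0 or advance to 1
             [-math.inf, 0.0]]                      # state 1 is final (self-loop)
print(forward_log_prob([[0.2], [0.8], [5.1]], log_init, log_trans, gmms))
```

In a hybrid system, only `gmm_log_likelihood` would be replaced, for example by scaled neural-network state posteriors. The HMM forward recursion stays the same.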
Published in:
2009 16th IEEE International Conference on Image Processing (ICIP); 2009, p1433-1436, 4p
Published in:
2007 IEEE 9th Workshop on Multimedia Signal Processing; 2007, p179-182, 4p
Author:
Gurban, M., Jean-Philippe Thiran
Published in:
Scopus-Elsevier
We present a method for dynamically integrating audio-visual information for speech recognition, based on the estimated reliability of the audio and visual streams. Our method uses an information theoretic measure, the entropy derived from the state
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::919bc362a2b589bae7f6a7d87cc0f65d
http://www.scopus.com/inward/record.url?eid=2-s2.0-77956479500&partnerID=MN8TOARS
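The abstract above is cut off at "the entropy derived from the state", so it does not say exactly how entropy is turned into stream weights. The following is a minimal sketch under stated assumptions. It assumes the entropy is computed over each stream's per-frame distribution over HMM states. It also assumes that a stream's weight is proportional to the inverse of that entropy, normalised so the audio and visual weights sum to one, so a peaked, confident stream counts for more. Finally, it assumes the weights enter the usual multi-stream rule, log b = w_A log b^A + w_V log b^V. The inverse-entropy mapping and every function name here are hypothetical illustrations, not necessarily the paper's method.

```python
import math

def entropy(posteriors):
    """Shannon entropy (nats) of a discrete distribution over HMM states."""
    return -sum(p * math.log(p) for p in posteriors if p > 0.0)

def inverse_entropy_weights(audio_post, video_post, eps=1e-6):
    """ASSUMED rule: weight proportional to 1/entropy, normalised to sum to 1.
    A flat (high-entropy, unreliable) stream receives a smaller weight."""
    inv_a = 1.0 / (entropy(audio_post) + eps)
    inv_v = 1.0 / (entropy(video_post) + eps)
    w_a = inv_a / (inv_a + inv_v)
    return w_a, 1.0 - w_a

def combined_log_likelihood(log_lik_audio, log_lik_video, w_a, w_v):
    """Per-state multi-stream score: w_a * log b^A + w_v * log b^V."""
    return [w_a * la + w_v * lv for la, lv in zip(log_lik_audio, log_lik_video)]

# Hypothetical frame: noisy audio gives a flat state distribution,
# while the visual stream is confident about state 0.
audio_post = [0.25, 0.25, 0.25, 0.25]
video_post = [0.85, 0.05, 0.05, 0.05]
w_a, w_v = inverse_entropy_weights(audio_post, video_post)
print(f"audio weight {w_a:.3f}, video weight {w_v:.3f}")
```

Because the weights are recomputed per frame, the combination adapts dynamically as either stream becomes more or less reliable. This is the behaviour the abstract describes, although the specific mapping above is an assumption.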
Author:
Gurban, M., Thiran, J.
Multimodal signals can be defined in general as signals originating from the same physical source, but acquired through different devices, techniques or protocols. This applies for example to audio-visual signals, medical or satellite images. Underst
External link:
https://explore.openaire.eu/search/publication?articleId=od_______185::be1974146aa8eb9853d5a3626e9d64e4
https://infoscience.epfl.ch/record/87194
Published in:
Scopus-Elsevier
The use of omnidirectional cameras for videoconferencing promises to simplify the hardware setup necessary for large groups of participants. We investigate the use of a multimodal speaker detection algorithm on audio-visual sequences captured with su
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::bcf0f307a2956a88a2c33f261869b7ac
https://infoscience.epfl.ch/record/140633
Published in:
Scopus-Elsevier
Audio-visual speech recognition promises to improve the performance of speech recognizers, especially when the audio is corrupted, by adding information from the visual modality, more specifically, from the video of the speaker. However, the number o
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::906d1d98301d95db977193a46320f03c
https://infoscience.epfl.ch/record/109488