Showing 1 - 10 of 681 for search: '"Dixon, Paul"'
Published in:
ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Using a vision-inspired keyword spotting framework, we propose an architecture with input-dependent dynamic depth capable of processing streaming audio. Specifically, we extend a conformer encoder with trainable binary gates that allow us to dynamica
External link:
http://arxiv.org/abs/2309.00140
Spotting user-defined/flexible keywords represented in text frequently uses an expensive text encoder for joint analysis with an audio encoder in an embedding space, which can suffer from heterogeneous modality representation (i.e., large mismatch) a
External link:
http://arxiv.org/abs/2308.06472
Author:
Dixon, Paul A.
Published in:
Connect to a 24 p. preview or request complete full text in PDF format. Access restricted to UC campuses
Thesis (Ph. D.)--University of California, San Diego, 1999.
Vita. Includes bibliographical references (leaves 232-243).
External link:
http://wwwlib.umi.com/cr/ucsd/fullcit?p3035408
Author:
Abdelaziz, Ahmed Hussen, Theobald, Barry-John, Dixon, Paul, Knothe, Reinhard, Apostoloff, Nicholas, Kajareker, Sachin
We describe our novel deep learning approach for driving animated faces using both acoustic and visual information. In particular, speech-related facial movements are generated using audiovisual information, and non-speech facial movements are genera
External link:
http://arxiv.org/abs/2005.13616
Author:
Abdelaziz, Ahmed Hussen, Theobald, Barry-John, Binder, Justin, Fanelli, Gabriele, Dixon, Paul, Apostoloff, Nicholas, Weise, Thibaut, Kajareker, Sachin
Speech-driven visual speech synthesis involves mapping features extracted from acoustic speech to the corresponding lip animation controls for a face model. This mapping can take many forms, but a powerful approach is to use deep neural networks (DNN
External link:
http://arxiv.org/abs/1905.06860