Showing 1 - 10
of 17
for search: '"Alexander H. Liu"'
Author:
Joy S. Zeng, Vineet Padia, Grace Y. Chen, Joseph H. Maalouf, Aditya M. Limaye, Alexander H. Liu, Michael A. Yusov, Ian W. Hunter, Karthish Manthiram
Published in:
ACS Central Science, Vol 10, Iss 7, Pp 1348-1356 (2024)
External link:
https://doaj.org/article/f627ba0028a34dc28d8d1c1f30381ff2
Published in:
IEEE Transactions on Circuits and Systems II: Express Briefs. 69:4178-4182
Published in:
IEEE Signal Processing Letters. 29:2437-2441
Conventional audio-visual models have independent audio and video branches. In this work, we unify the audio and visual branches by designing a Unified Audio-Visual Model (UAVM). The UAVM achieves a new state-of-the-art audio-visual event classification …
We introduce the first unsupervised speech synthesis system, based on a simple yet effective recipe. The framework leverages recent work in unsupervised speech recognition as well as existing neural-based speech synthesis. Using only unlabeled speech …
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::3a387b749a1137ac0e30e3b4ee0ff97a
http://arxiv.org/abs/2204.02524
Unsupervised speech recognition has shown great potential for making Automatic Speech Recognition (ASR) systems accessible to every language. However, existing methods still rely heavily on hand-crafted pre-processing. Similar to the trend of making supervised …
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::4e27919c1d41c028cae4979000f5dfa6
Author:
SouYoung Jin, James Glass, Alexander H. Liu, Mathew Monfort, Aude Oliva, David Harwath, Rogerio Feris
Published in:
CVPR
When people observe events, they are able to abstract key information and build concise summaries of what is happening. These summaries include contextual and semantic information describing the important high-level details (what, where, who, and how) …
Published in:
INTERSPEECH
Recently, end-to-end multi-speaker text-to-speech (TTS) systems have achieved success in settings where large amounts of high-quality speech and corresponding transcriptions are available. However, laborious paired-data collection processes prevent many …
Self-supervised speech representations have been shown to be effective in a variety of speech applications. However, existing representation learning methods generally rely on the autoregressive model and/or observed global dependencies while generating …
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::8cbe0a08ddcdc5f15b8eab1a8b557fe8
Published in:
SLT
Whispering is an important mode of human speech, but no end-to-end recognition results for it have been reported yet, probably due to the scarcity of available whispered-speech data. In this paper, we present several approaches for end-to-end (E2E) recognition …
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::58d371776f5fe35bb718432073a2b793
Published in:
ACL
Speech translation (ST) aims to learn transformations from speech in the source language to text in the target language. Previous works show that multitask learning improves ST performance, in which the recognition decoder generates the text …