Showing 1 - 10 of 33 results for search: '"David Harwath"'
Published in:
2023 11th International IEEE/EMBS Conference on Neural Engineering (NER).
Data-driven speech processing models usually perform well with a large amount of text supervision, but collecting transcribed speech data is costly. Therefore, we propose SpeechCLIP, a novel framework bridging speech and text through images to enhance …
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::85e3f6e7297423e0345649c252d5bac1
http://arxiv.org/abs/2210.00705
Author:
Tyler Miller, David Harwath
Published in:
Interspeech 2022.
Author:
David Xu, David Harwath
Published in:
ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
Author:
Andrew Rouditchenko, Yung-Sung Chuang, Nina Shvetsova, Samuel Thomas, Rogerio Feris, Brian Kingsbury, Leonid Karlinsky, David Harwath, Hilde Kuehne, James Glass
Multilingual text-video retrieval methods have improved significantly in recent years, but the performance for other languages lags behind English. We propose a Cross-Lingual Cross-Modal Knowledge Distillation method to improve multilingual text-video retrieval …
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::14e9221132f9b7c09a4f9a64bf4baa19
This work investigates the use of large-scale, English-only pre-trained models (CLIP and HuBERT) for multilingual image-speech retrieval. For non-English image-speech retrieval, we outperform the current state-of-the-art performance by a wide margin …
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::633116477493a86a9340efedaeaa26e3
Author:
Reem Gody, David Harwath
Self-supervised learning (SSL) has been able to leverage unlabeled data to boost the performance of automatic speech recognition (ASR) models when we have access to only a small amount of transcribed speech data. However, this raises the question of …
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::f2a47a66228edbe8884ddd7b7c1d24d5
Author:
SouYoung Jin, James Glass, Alexander H. Liu, Mathew Monfort, Aude Oliva, David Harwath, Rogerio Feris
Published in:
CVPR
When people observe events, they are able to abstract key information and build concise summaries of what is happening. These summaries include contextual and semantic information describing the important high-level details (what, where, who, and how) …
Author:
Rameswar Panda, Andrew Rouditchenko, Hilde Kuehne, Angie Boggust, James Glass, Rogerio Feris, David Harwath, Brian Chen, Brian Kingsbury, Samuel Thomas, Michael Picheny
In this paper, we explore self-supervised audio-visual models that learn from instructional videos. Prior work has shown that these models can relate spoken words and sounds to visual content after training on a large-scale dataset of videos, but the …
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::4e8574a412328c0178369baec3688670
Author:
Nina Shvetsova, Brian Chen, Andrew Rouditchenko, Samuel Thomas, Brian Kingsbury, Rogerio Feris, David Harwath, James Glass, Hilde Kuehne
Multi-modal learning from video data has seen increased attention recently, as it allows training semantically meaningful embeddings without human annotation, enabling tasks like zero-shot retrieval and classification. In this work, we present a multi- …
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::fba61791fda0c6282401fe03b281bcb5