Showing 1 - 10 of 190 for search: '"WU, ANNE"'
Multi-turn interactions between large language models (LLMs) and users naturally include implicit feedback signals. If an LLM responds in an unexpected way to an instruction, the user is likely to signal it by rephrasing the request, expressing frust
External link:
http://arxiv.org/abs/2410.13852
This study evaluates three state-of-the-art MLLMs -- GPT-4V, Gemini Pro, and the open-source model IDEFICS -- on the compositional natural language vision reasoning task NLVR. Given a human-written sentence paired with a synthetic image, this task re
External link:
http://arxiv.org/abs/2402.17793
We present lilGym, a new benchmark for language-conditioned reinforcement learning in visual environments. lilGym is based on 2,661 highly-compositional human-written natural language statements grounded in an interactive visual environment. We intro
External link:
http://arxiv.org/abs/2211.01994
Published in:
Contemporary Accounting Research; Sep 2024, Vol. 41, Issue 3, pp. 1672-1694, 23 p.
In this paper, we improve speech translation (ST) through effectively leveraging large quantities of unlabeled speech and text data in different and complementary ways. We explore both pretraining and self-training by using the large Libri-Light spee
External link:
http://arxiv.org/abs/2104.06678
Author:
Wang, Changhan; Rivière, Morgane; Lee, Ann; Wu, Anne; Talnikar, Chaitanya; Haziza, Daniel; Williamson, Mary; Pino, Juan; Dupoux, Emmanuel
We introduce VoxPopuli, a large-scale multilingual corpus providing 100K hours of unlabelled speech data in 23 languages. It is the largest open data to date for unsupervised representation learning as well as semi-supervised learning. VoxPopuli also
External link:
http://arxiv.org/abs/2101.00390
We introduce fairseq S2T, a fairseq extension for speech-to-text (S2T) modeling tasks such as end-to-end speech recognition and speech-to-text translation. It follows fairseq's careful design for scalability and extensibility. We provide end-to-end w
External link:
http://arxiv.org/abs/2010.05171
Speech translation has recently become an increasingly popular topic of research, partly due to the development of benchmark datasets. Nevertheless, current datasets cover a limited number of languages. With the aim to foster research in massive mult
External link:
http://arxiv.org/abs/2007.10310
End-to-end speech-to-text translation can provide a simpler and smaller system but is facing the challenge of data scarcity. Pre-training methods can leverage unlabeled data and have been shown to be effective on data-scarce settings. In this work, w
External link:
http://arxiv.org/abs/2006.12124
Spoken language translation has recently witnessed a resurgence in popularity, thanks to the development of end-to-end models and the creation of new corpora, such as Augmented LibriSpeech and MuST-C. Existing datasets involve language pairs with Eng
External link:
http://arxiv.org/abs/2002.01320