Showing 1 - 10 of 721 results for search: '"Liu, DANNI"'
Multimodal foundation models aim to create a unified representation space that abstracts away from surface features like language syntax or modality differences. To investigate this, we study the internal representations of three recent models…
External link:
http://arxiv.org/abs/2411.17666
Direct speech translation (ST) models often struggle with rare words. Incorrect translation of these words can have severe consequences, impacting translation quality and user trust. While rare word translation is inherently challenging for neural models…
External link:
http://arxiv.org/abs/2409.09009
Participatory budgeting (PB) is a democratic approach to allocating municipal spending that has been adopted in many places in recent years, including in Chicago. Current PB voting resembles a ballot where residents are asked which municipal projects…
External link:
http://arxiv.org/abs/2407.20103
With the rise of video production and social media, speech editing has become crucial for creators to address issues like mispronunciations, missing words, or stuttering in audio recordings. This paper explores text-based speech editing methods that…
External link:
http://arxiv.org/abs/2407.17172
Authors:
Koneru, Sai, Nguyen, Thai-Binh, Pham, Ngoc-Quan, Liu, Danni, Li, Zhaolin, Waibel, Alexander, Niehues, Jan
Large Language Models (LLMs) are currently under exploration for various tasks, including Automatic Speech Recognition (ASR), Machine Translation (MT), and even End-to-End Speech Translation (ST). In this paper, we present KIT's offline submission…
External link:
http://arxiv.org/abs/2406.16777
Authors:
Dinh, Tu Anh, Mullov, Carlos, Bärmann, Leonard, Li, Zhaolin, Liu, Danni, Reiß, Simon, Lee, Jueun, Lerzer, Nathan, Ternava, Fabian, Gao, Jianfeng, Röddiger, Tobias, Waibel, Alexander, Asfour, Tamim, Beigl, Michael, Stiefelhagen, Rainer, Dachsbacher, Carsten, Böhm, Klemens, Niehues, Jan
With the rapid development of Large Language Models (LLMs), it is crucial to have benchmarks which can evaluate the ability of LLMs on different domains. One common use of LLMs is performing tasks on scientific topics, such as writing algorithms…
External link:
http://arxiv.org/abs/2406.10421
Finetuning pretrained models on downstream generation tasks often leads to catastrophic forgetting in zero-shot conditions. In this work, we focus on summarization and tackle the problem through the lens of language-independent representations…
External link:
http://arxiv.org/abs/2404.05720
Authors:
Liu, Danni, Niehues, Jan
Customizing machine translation models to comply with desired attributes (e.g., formality or grammatical gender) is a well-studied topic. However, most current approaches rely on (semi-)supervised data with attribute annotations. This data scarcity…
External link:
http://arxiv.org/abs/2309.08565
Authors:
Huber, Christian, Dinh, Tu Anh, Mullov, Carlos, Pham, Ngoc Quan, Nguyen, Thai Binh, Retkowski, Fabian, Constantin, Stefan, Ugan, Enes Yavuz, Liu, Danni, Li, Zhaolin, Koneru, Sai, Niehues, Jan, Waibel, Alexander
The challenge of low-latency speech translation has recently drawn significant interest in the research community, as shown by several publications and shared tasks. Therefore, it is essential to evaluate these different approaches in realistic scenarios…
External link:
http://arxiv.org/abs/2308.03415
Authors:
Liu, Danni, Nguyen, Thai Binh, Koneru, Sai, Ugan, Enes Yavuz, Pham, Ngoc-Quan, Nguyen, Tuan-Nam, Dinh, Tu Anh, Mullov, Carlos, Waibel, Alexander, Niehues, Jan
Many existing speech translation benchmarks focus on native-English speech in high-quality recording conditions, which often do not match the conditions in real-life use cases. In this paper, we describe our speech translation system for the multilingual…
External link:
http://arxiv.org/abs/2306.05320