Showing 1 - 10 of 2,151 for search: '"Waibel P"'
Authors:
Nguyen, Thai-Binh, Waibel, Alexander
Speaker-attributed automatic speech recognition (SA-ASR) aims to transcribe speech while assigning transcripts to the corresponding speakers accurately. Existing methods often rely on complex modular systems or require extensive fine-tuning of joint
External link:
http://arxiv.org/abs/2411.18152
Authors:
Ahmad, Ibrahim Said, Anastasopoulos, Antonios, Bojar, Ondřej, Borg, Claudia, Carpuat, Marine, Cattoni, Roldano, Cettolo, Mauro, Chen, William, Dong, Qianqian, Federico, Marcello, Haddow, Barry, Javorský, Dávid, Krubiński, Mateusz, Lam, Tsz Kin, Ma, Xutai, Mathur, Prashant, Matusov, Evgeny, Maurya, Chandresh, McCrae, John, Murray, Kenton, Nakamura, Satoshi, Negri, Matteo, Niehues, Jan, Niu, Xing, Ojha, Atul Kr., Ortega, John, Papi, Sara, Polák, Peter, Pospíšil, Adam, Pecina, Pavel, Salesky, Elizabeth, Sethiya, Nivedita, Sarkar, Balaram, Shi, Jiatong, Sikasote, Claytone, Sperber, Matthias, Stüker, Sebastian, Sudoh, Katsuhito, Thompson, Brian, Turchi, Marco, Waibel, Alex, Watanabe, Shinji, Wilken, Patrick, Zemánek, Petr, Zevallos, Rodolfo
This paper reports on the shared tasks organized by the 21st IWSLT Conference. The shared tasks address 7 scientific challenges in spoken language translation: simultaneous and offline translation, automatic subtitling and dubbing, speech-to-speech t
External link:
http://arxiv.org/abs/2411.05088
Previous approaches on accent conversion (AC) mainly aimed at making non-native speech sound more native while maintaining the original content and speaker identity. However, non-native speakers sometimes have pronunciation issues, which can make it
External link:
http://arxiv.org/abs/2410.14997
Authors:
Eyiokur, Fevziye Irem, Huber, Christian, Nguyen, Thai-Binh, Nguyen, Tuan-Nam, Retkowski, Fabian, Ugan, Enes Yavuz, Yaman, Dogucan, Waibel, Alexander
In this paper, we report on communication experiments conducted in the summer of 2022 during a deep dive to the wreck of the Titanic. Radio transmission is not possible in deep sea water, and communication links rely on sonar signals. Due to the low
External link:
http://arxiv.org/abs/2410.11434
Accent conversion using discrete units with parallel data synthesized from controllable accented TTS
The goal of accent conversion (AC) is to convert speech accents while preserving content and speaker identity. Previous methods either required reference utterances during inference, did not preserve speaker identity well, or used one-to-one systems
External link:
http://arxiv.org/abs/2410.03734
Effective spoken dialog systems should facilitate natural interactions with quick and rhythmic timing, mirroring human communication patterns. To reduce response times, previous efforts have focused on minimizing the latency in automatic speech recog
External link:
http://arxiv.org/abs/2409.19990
Authors:
Bärmann, Leonard, DeChant, Chad, Plewnia, Joana, Peller-Konrad, Fabian, Bauer, Daniel, Asfour, Tamim, Waibel, Alex
Verbalization of robot experience, i.e., summarization of and question answering about a robot's past, is a crucial ability for improving human-robot interaction. Previous works applied rule-based systems or fine-tuned deep models to verbalize short
External link:
http://arxiv.org/abs/2409.17702
Multilingual neural machine translation systems learn to map sentences of different languages into a common representation space. Intuitively, with a growing number of seen languages the encoder sentence representation grows more flexible and easily
External link:
http://arxiv.org/abs/2408.02290
Authors:
Huber, Christian, Waibel, Alexander
This paper addresses the problem of correctly formatting numeric expressions in automatic speech recognition (ASR) transcripts. This is challenging since the expected transcript format depends on the context, e.g., 1945 (year) vs. 19:45 (timestamp).
External link:
http://arxiv.org/abs/2408.00004
Authors:
Koneru, Sai, Nguyen, Thai-Binh, Pham, Ngoc-Quan, Liu, Danni, Li, Zhaolin, Waibel, Alexander, Niehues, Jan
Large Language Models (LLMs) are currently under exploration for various tasks, including Automatic Speech Recognition (ASR), Machine Translation (MT), and even End-to-End Speech Translation (ST). In this paper, we present KIT's offline submission in
External link:
http://arxiv.org/abs/2406.16777