Search results

Report

MSA-ASR: Efficient Multilingual Speaker Attribution with frozen ASR Models

Authors: Nguyen, Thai-Binh; Waibel, Alexander

Speaker-attributed automatic speech recognition (SA-ASR) aims to transcribe speech while assigning transcripts to the corresponding speakers accurately. Existing methods often rely on complex modular systems or require extensive fine-tuning of joint

External link: http://arxiv.org/abs/2411.18152

Report

Findings of the IWSLT 2024 Evaluation Campaign

This paper reports on the shared tasks organized by the 21st IWSLT Conference. The shared tasks address 7 scientific challenges in spoken language translation: simultaneous and offline translation, automatic subtitling and dubbing, speech-to-speech t

External link: http://arxiv.org/abs/2411.05088

Report

Improving Pronunciation and Accent Conversion through Knowledge Distillation And Synthetic Ground-Truth from Native TTS

Authors: Nguyen, Tuan Nam; Akti, Seymanur; Pham, Ngoc Quan; Waibel, Alexander

Previous approaches on accent conversion (AC) mainly aimed at making non-native speech sound more native while maintaining the original content and speaker identity. However, non-native speakers sometimes have pronunciation issues, which can make it

External link: http://arxiv.org/abs/2410.14997

Report

Titanic Calling: Low Bandwidth Video Conference from the Titanic Wreck

Authors: Eyiokur, Fevziye Irem; Huber, Christian; Nguyen, Thai-Binh; Nguyen, Tuan-Nam; Retkowski, Fabian; Ugan, Enes Yavuz; Yaman, Dogucan; Waibel, Alexander

In this paper, we report on communication experiments conducted in the summer of 2022 during a deep dive to the wreck of the Titanic. Radio transmission is not possible in deep sea water, and communication links rely on sonar signals. Due to the low

External link: http://arxiv.org/abs/2410.11434

Report

Accent conversion using discrete units with parallel data synthesized from controllable accented TTS

Authors: Nguyen, Tuan Nam; Pham, Ngoc Quan; Waibel, Alexander

The goal of accent conversion (AC) is to convert speech accents while preserving content and speaker identity. Previous methods either required reference utterances during inference, did not preserve speaker identity well, or used one-to-one systems

External link: http://arxiv.org/abs/2410.03734

Report

Predictive Speech Recognition and End-of-Utterance Detection Towards Spoken Dialog Systems

Authors: Zink, Oswald; Higuchi, Yosuke; Mullov, Carlos; Waibel, Alexander; Kobayashi, Tetsunori

Effective spoken dialog systems should facilitate natural interactions with quick and rhythmic timing, mirroring human communication patterns. To reduce response times, previous efforts have focused on minimizing the latency in automatic speech recog

External link: http://arxiv.org/abs/2409.19990

Report

Episodic Memory Verbalization using Hierarchical Representations of Life-Long Robot Experience

Authors: Bärmann, Leonard; DeChant, Chad; Plewnia, Joana; Peller-Konrad, Fabian; Bauer, Daniel; Asfour, Tamim; Waibel, Alex

Verbalization of robot experience, i.e., summarization of and question answering about a robot's past, is a crucial ability for improving human-robot interaction. Previous works applied rule-based systems or fine-tuned deep models to verbalize short

External link: http://arxiv.org/abs/2409.17702

Report

Decoupled Vocabulary Learning Enables Zero-Shot Translation from Unseen Languages

Authors: Mullov, Carlos; Pham, Ngoc-Quan; Waibel, Alexander

Multilingual neural machine translation systems learn to map sentences of different languages into a common representation space. Intuitively, with a growing number of seen languages the encoder sentence representation grows more flexible and easily

External link: http://arxiv.org/abs/2408.02290

Report

Handling Numeric Expressions in Automatic Speech Recognition

Authors: Huber, Christian; Waibel, Alexander

This paper addresses the problem of correctly formatting numeric expressions in automatic speech recognition (ASR) transcripts. This is challenging since the expected transcript format depends on the context, e.g., 1945 (year) vs. 19:45 (timestamp).

External link: http://arxiv.org/abs/2408.00004

Report

Blending LLMs into Cascaded Speech Translation: KIT's Offline Speech Translation System for IWSLT 2024

Authors: Koneru, Sai; Nguyen, Thai-Binh; Pham, Ngoc-Quan; Liu, Danni; Li, Zhaolin; Waibel, Alexander; Niehues, Jan

Large Language Models (LLMs) are currently under exploration for various tasks, including Automatic Speech Recognition (ASR), Machine Translation (MT), and even End-to-End Speech Translation (ST). In this paper, we present KIT's offline submission in

External link: http://arxiv.org/abs/2406.16777
