Showing 1 - 10 of 189
for search: '"Vu Ngoc Thang"'
Author:
Väth, Dirk, Vu, Ngoc Thang
In sensitive domains, such as legal or medical domains, the correctness of information given to users is critical. To address this, the recently introduced task Conversational Tree Search (CTS) provides a graph-based framework for controllable task-or
External link:
http://arxiv.org/abs/2410.05821
Traditional speech enhancement methods often oversimplify the task of restoration by focusing on a single type of distortion. Generative models that handle multiple distortions frequently struggle with phone reconstruction and high-frequency harmonic
External link:
http://arxiv.org/abs/2409.11145
Mental models play an important role in whether user interaction with intelligent systems, such as dialog systems, is successful or not. Adaptive dialog systems present the opportunity to align a dialog agent's behavior with heterogeneous user expecta
External link:
http://arxiv.org/abs/2408.14154
Dual encoder architectures like CLIP models map two types of inputs into a shared embedding space and learn similarities between them. However, it is not understood how such models compare two inputs. Here, we address this research gap with two contr
External link:
http://arxiv.org/abs/2408.14153
Author:
Li, Chia-Yu, Vu, Ngoc Thang
Training a semi-supervised end-to-end speech recognition system using noisy student training has significantly improved performance. However, this approach requires a substantial amount of paired speech-text and unlabeled speech, which is costly for
External link:
http://arxiv.org/abs/2407.21061
Published in:
Proc. Interspeech 2024, pp. 4448-4452
In speaker anonymization, speech recordings are modified in a way that the identity of the speaker remains hidden. While this technology could help to protect the privacy of individuals around the globe, current research restricts this by focusing al
External link:
http://arxiv.org/abs/2407.02937
In recent years, prompting has quickly become one of the standard ways of steering the outputs of generative machine learning models, due to its intuitive use of natural language. In this work, we propose a system conditioned on embeddings derived fr
External link:
http://arxiv.org/abs/2406.06406
Author:
Lux, Florian, Meyer, Sarina, Behringer, Lyonel, Zalkow, Frank, Do, Phat, Coler, Matt, Habets, Emanuël A. P., Vu, Ngoc Thang
In this work, we take on the challenging task of building a single text-to-speech synthesis system that is capable of generating speech in over 7000 languages, many of which lack sufficient data for traditional TTS development. By leveraging a novel
External link:
http://arxiv.org/abs/2406.06403
Although language models (LMs) have boosted the performance of Question Answering, they still need plenty of data. Data annotation, in contrast, is a time-consuming process. This especially applies to Question Answering, where possibly large document
External link:
http://arxiv.org/abs/2405.09335
Author:
Denisov, Pavel, Vu, Ngoc Thang
Recent advancements in language modeling have led to the emergence of Large Language Models (LLMs) capable of various natural language processing tasks. Despite their success in text-based tasks, applying LLMs to the speech domain remains limited and
External link:
http://arxiv.org/abs/2404.10922