Showing 1 - 10 of 397 for search: '"García, Paola"'
Author:
Cornell, Samuele, Park, Taejin, Huang, Steve, Boeddeker, Christoph, Chang, Xuankai, Maciejewski, Matthew, Wiesner, Matthew, Garcia, Paola, Watanabe, Shinji
This paper presents the CHiME-8 DASR challenge, which carries on from the previous edition, CHiME-7 DASR (C7DASR), and the past CHiME-6 challenge. It focuses on joint multi-channel distant speech recognition (DASR) and diarization with one or more, possibly heterogeneous, recording devices …
External link:
http://arxiv.org/abs/2407.16447
The CHiME-7 DASR Challenge: Distant Meeting Transcription with Multiple Devices in Diverse Scenarios
Author:
Cornell, Samuele, Wiesner, Matthew, Watanabe, Shinji, Raj, Desh, Chang, Xuankai, Garcia, Paola, Maciejewski, Matthew, Masuyama, Yoshiki, Wang, Zhong-Qiu, Squartini, Stefano, Khudanpur, Sanjeev
The CHiME challenges have played a significant role in the development and evaluation of robust automatic speech recognition (ASR) systems. We introduce the CHiME-7 distant ASR (DASR) task within the 7th CHiME challenge. This task comprises joint ASR and diarization …
External link:
http://arxiv.org/abs/2306.13734
Author:
Shi, Jiatong, Hsu, Chan-Jan, Chung, Holam, Gao, Dongji, Garcia, Paola, Watanabe, Shinji, Lee, Ann, Lee, Hung-yi
Spoken language understanding (SLU) is a task aiming to extract high-level semantics from spoken utterances. Previous works have investigated the use of speech self-supervised models and textual pre-trained models, which have shown reasonable improvements …
External link:
http://arxiv.org/abs/2211.03025
Self-supervised learning (SSL) methods, which learn representations of data without explicit supervision, have gained popularity in speech-processing tasks, particularly for single-talker applications. However, these models often have degraded performance …
External link:
http://arxiv.org/abs/2211.00482
Author:
Meng, Yen, Chen, Hsuan-Jui, Shi, Jiatong, Watanabe, Shinji, Garcia, Paola, Lee, Hung-yi, Tang, Hao
Compressing self-supervised models has become increasingly necessary as these models grow larger. While previous approaches have primarily focused on compressing the model size, shortening sequences is also effective in reducing the computational cost …
External link:
http://arxiv.org/abs/2210.07189
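The entry above argues that shortening the frame sequence, not just the network, cuts compute. As a rough sketch of one fixed-rate variant of that idea (average-pooling consecutive frames; the pooling choice and names here are illustrative assumptions, not taken from the paper):

```python
import torch

def shorten(frames: torch.Tensor, rate: int = 2) -> torch.Tensor:
    """Average-pool `rate` consecutive frames: (batch, T, dim) -> (batch, T//rate, dim).
    Halving T roughly quarters the cost of any subsequent self-attention layer."""
    b, t, d = frames.shape
    t = (t // rate) * rate  # drop a ragged tail frame, if any
    return frames[:, :t].reshape(b, t // rate, rate, d).mean(dim=2)

x = torch.randn(1, 100, 768)   # 100 frames of 768-dim SSL features
print(shorten(x).shape)        # torch.Size([1, 50, 768])
```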
Due to the high performance of multi-channel speech processing, we can use the outputs from a multi-channel model as teacher labels when training a single-channel model with knowledge distillation. Conversely, it is also known that single-channel …
External link:
http://arxiv.org/abs/2210.03459
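The entry above uses a multi-channel model's outputs as teacher labels for a single-channel student. A minimal sketch of a generic temperature-scaled distillation loss, assuming the standard soft-label KL objective rather than the paper's exact recipe; `kd_loss`, `teacher`, and `student` are hypothetical names:

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between temperature-softened teacher and student
    output distributions (soft-label knowledge distillation)."""
    t = temperature
    soft_teacher = F.softmax(teacher_logits / t, dim=-1)
    log_student = F.log_softmax(student_logits / t, dim=-1)
    # The t^2 factor keeps gradient magnitudes comparable across temperatures.
    return F.kl_div(log_student, soft_teacher, reduction="batchmean") * t * t

# Hypothetical usage: the teacher sees all microphone channels of an
# utterance, the student only one of them.
# loss = kd_loss(student(batch[:, 0]), teacher(batch).detach())
```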
In this paper, we present an incremental domain adaptation technique to prevent catastrophic forgetting for an end-to-end automatic speech recognition (ASR) model. Conventional approaches require extra parameters of the same size as the model …
External link:
http://arxiv.org/abs/2207.00216
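The "extra parameters of the same size as the model" above points at regularization-based continual learning in the style of elastic weight consolidation (EWC), which stores a per-parameter anchor value and importance weight. A hedged sketch of that conventional penalty, i.e. the overhead the paper sets out to avoid, not its proposed method:

```python
import torch

def ewc_penalty(model, anchors, importance, lam=1.0):
    """Quadratic pull toward the parameters learned on the previous domain.
    `anchors` and `importance` each hold one tensor per model parameter,
    so the regularizer costs extra storage on the order of the model size."""
    loss = torch.zeros(())
    for name, p in model.named_parameters():
        loss = loss + (importance[name] * (p - anchors[name]) ** 2).sum()
    return lam / 2 * loss
```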
Published in:
IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 31, pp. 706-720, 2023
A method to perform offline and online speaker diarization for an unlimited number of speakers is described in this paper. End-to-end neural diarization (EEND) has achieved overlap-aware speaker diarization by formulating it as a multi-label classification problem …
External link:
http://arxiv.org/abs/2206.02432
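The entry above describes EEND's key move: framing diarization as frame-wise multi-label classification, so two speakers can both be active at the same frame. A minimal sketch of that formulation for a fixed number of speakers (actual EEND also minimizes the loss over speaker permutations and, in this paper, extends to an unlimited speaker count; both are omitted here):

```python
import torch
import torch.nn.functional as F

def eend_multilabel_loss(logits, labels):
    """logits, labels: (batch, frames, speakers). labels[b, t, s] = 1 when
    speaker s is active at frame t; overlap means several 1s per frame."""
    return F.binary_cross_entropy_with_logits(logits, labels)

# Two speakers overlapping in the middle of a 6-frame toy example:
labels = torch.tensor([[[1, 0], [1, 0], [1, 1], [1, 1], [0, 1], [0, 1]]],
                      dtype=torch.float)
print(eend_multilabel_loss(torch.randn(1, 6, 2), labels))
```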
Speech enhancement and separation are two fundamental tasks for robust speech processing. Speech enhancement suppresses background noise, while speech separation extracts target speech from interfering speakers. Despite a great number of supervised learning …
External link:
http://arxiv.org/abs/2203.07960
Author:
Benlloch-Tinoco, Maria, Nuñez Ramírez, Jose Manuel, García, Paola, Gentile, Piergiorgio, Girón-Hernández, Joel
Published in:
Food Bioscience, vol. 61, October 2024