Showing 1 - 10 of 29 for search: '"Subramanian, Aswin Shanmugam"'
Language-agnostic many-to-one end-to-end speech translation models can convert audio signals from different source languages into text in a target language. These models do not need source language identification, which improves user experience. …
External link:
http://arxiv.org/abs/2406.10276
Authors:
Boeddeker, Christoph, Subramanian, Aswin Shanmugam, Wichern, Gordon, Haeb-Umbach, Reinhold, Roux, Jonathan Le
Since diarization and source separation of meeting data are closely related tasks, we here propose an approach to perform the two objectives jointly. It builds upon the target-speaker voice activity detection (TS-VAD) diarization approach …
External link:
http://arxiv.org/abs/2303.03849
Authors:
Petermann, Darius, Wichern, Gordon, Subramanian, Aswin Shanmugam, Wang, Zhong-Qiu, Roux, Jonathan Le
Emulating the human ability to solve the cocktail party problem, i.e., focus on a source of interest in a complex acoustic scene, is a long-standing goal of audio source separation research. Much of this research investigates separating speech from …
External link:
http://arxiv.org/abs/2212.07327
Authors:
Aralikatti, Rohith, Boeddeker, Christoph, Wichern, Gordon, Subramanian, Aswin Shanmugam, Roux, Jonathan Le
This paper proposes reverberation as supervision (RAS), a novel unsupervised loss function for single-channel reverberant speech separation. Prior methods for unsupervised separation required the synthesis of mixtures of mixtures or assumed the …
External link:
http://arxiv.org/abs/2211.08303
Authors:
Chang, Xuankai, Maekaku, Takashi, Guo, Pengcheng, Shi, Jing, Lu, Yen-Ju, Subramanian, Aswin Shanmugam, Wang, Tianzi, Yang, Shu-wen, Tsao, Yu, Lee, Hung-yi, Watanabe, Shinji
Self-supervised pretraining on speech data has made substantial progress. High-fidelity representations of the speech signal are learned from large amounts of untranscribed data and show promising performance. Recently, several works have focused on …
External link:
http://arxiv.org/abs/2110.04590
Multi-source localization is an important and challenging technique for multi-talker conversation analysis. This paper proposes a novel supervised learning method using deep neural networks to estimate the direction of arrival (DOA) of all the speakers …
External link:
http://arxiv.org/abs/2102.07955
Authors:
Watanabe, Shinji, Boyer, Florian, Chang, Xuankai, Guo, Pengcheng, Hayashi, Tomoki, Higuchi, Yosuke, Hori, Takaaki, Huang, Wen-Chin, Inaguma, Hirofumi, Kamo, Naoyuki, Karita, Shigeki, Li, Chenda, Shi, Jing, Subramanian, Aswin Shanmugam, Zhang, Wangyou
This paper describes the recent development of ESPnet (https://github.com/espnet/espnet), an end-to-end speech processing toolkit. This project was initiated in December 2017 mainly to deal with end-to-end speech recognition experiments based on …
External link:
http://arxiv.org/abs/2012.13006
Authors:
Li, Chenda, Shi, Jing, Zhang, Wangyou, Subramanian, Aswin Shanmugam, Chang, Xuankai, Kamo, Naoyuki, Hira, Moto, Hayashi, Tomoki, Boeddeker, Christoph, Chen, Zhuo, Watanabe, Shinji
We present ESPnet-SE, which is designed for the quick development of speech enhancement and speech separation systems in a single framework, along with the optional downstream speech recognition module. ESPnet-SE is a new project which integrates …
External link:
http://arxiv.org/abs/2011.03706
Authors:
Subramanian, Aswin Shanmugam, Weng, Chao, Watanabe, Shinji, Yu, Meng, Xu, Yong, Zhang, Shi-Xiong, Yu, Dong
This paper proposes a new paradigm for handling far-field multi-speaker data in an end-to-end neural network manner, called directional automatic speech recognition (D-ASR), which explicitly models source speaker locations. In D-ASR, the azimuth angle …
External link:
http://arxiv.org/abs/2011.00091
Authors:
Arora, Ashish, Raj, Desh, Subramanian, Aswin Shanmugam, Li, Ke, Ben-Yair, Bar, Maciejewski, Matthew, Żelasko, Piotr, García, Paola, Watanabe, Shinji, Khudanpur, Sanjeev
This paper summarizes the JHU team's efforts in tracks 1 and 2 of the CHiME-6 challenge for distant multi-microphone conversational speech diarization and recognition in everyday home environments. We explore multi-array processing techniques at each …
External link:
http://arxiv.org/abs/2006.07898