Showing 1 - 10 of 29 for search: '"Subramanian, Aswin Shanmugam"'
Language-agnostic many-to-one end-to-end speech translation models can convert audio signals from different source languages into text in a target language. These models do not need source language identification, which improves user experience. …
External link:
http://arxiv.org/abs/2406.10276
Authors:
Boeddeker, Christoph, Subramanian, Aswin Shanmugam, Wichern, Gordon, Haeb-Umbach, Reinhold, Roux, Jonathan Le
Since diarization and source separation of meeting data are closely related tasks, we here propose an approach to perform the two objectives jointly. It builds upon the target-speaker voice activity detection (TS-VAD) diarization approach …
External link:
http://arxiv.org/abs/2303.03849
Authors:
Petermann, Darius, Wichern, Gordon, Subramanian, Aswin Shanmugam, Wang, Zhong-Qiu, Roux, Jonathan Le
Emulating the human ability to solve the cocktail party problem, i.e., focus on a source of interest in a complex acoustic scene, is a long-standing goal of audio source separation research. Much of this research investigates separating speech from …
External link:
http://arxiv.org/abs/2212.07327
Authors:
Aralikatti, Rohith, Boeddeker, Christoph, Wichern, Gordon, Subramanian, Aswin Shanmugam, Roux, Jonathan Le
This paper proposes reverberation as supervision (RAS), a novel unsupervised loss function for single-channel reverberant speech separation. Prior methods for unsupervised separation required the synthesis of mixtures of mixtures or assumed the …
External link:
http://arxiv.org/abs/2211.08303
Authors:
Chang, Xuankai, Maekaku, Takashi, Guo, Pengcheng, Shi, Jing, Lu, Yen-Ju, Subramanian, Aswin Shanmugam, Wang, Tianzi, Yang, Shu-wen, Tsao, Yu, Lee, Hung-yi, Watanabe, Shinji
Self-supervised pretraining on speech data has made substantial progress. High-fidelity representations of the speech signal are learned from large amounts of untranscribed data and show promising performance. Recently, several works have focused on …
External link:
http://arxiv.org/abs/2110.04590
Multi-source localization is an important and challenging technique for multi-talker conversation analysis. This paper proposes a novel supervised learning method using deep neural networks to estimate the direction of arrival (DOA) of all the speakers …
External link:
http://arxiv.org/abs/2102.07955
Authors:
Watanabe, Shinji, Boyer, Florian, Chang, Xuankai, Guo, Pengcheng, Hayashi, Tomoki, Higuchi, Yosuke, Hori, Takaaki, Huang, Wen-Chin, Inaguma, Hirofumi, Kamo, Naoyuki, Karita, Shigeki, Li, Chenda, Shi, Jing, Subramanian, Aswin Shanmugam, Zhang, Wangyou
This paper describes the recent development of ESPnet (https://github.com/espnet/espnet), an end-to-end speech processing toolkit. This project was initiated in December 2017 mainly to deal with end-to-end speech recognition experiments based on …
External link:
http://arxiv.org/abs/2012.13006
Authors:
Li, Chenda, Shi, Jing, Zhang, Wangyou, Subramanian, Aswin Shanmugam, Chang, Xuankai, Kamo, Naoyuki, Hira, Moto, Hayashi, Tomoki, Boeddeker, Christoph, Chen, Zhuo, Watanabe, Shinji
We present ESPnet-SE, which is designed for the quick development of speech enhancement and speech separation systems in a single framework, along with the optional downstream speech recognition module. ESPnet-SE is a new project which integrates …
External link:
http://arxiv.org/abs/2011.03706
Authors:
Subramanian, Aswin Shanmugam, Weng, Chao, Watanabe, Shinji, Yu, Meng, Xu, Yong, Zhang, Shi-Xiong, Yu, Dong
This paper proposes a new paradigm for handling far-field multi-speaker data in an end-to-end neural network manner, called directional automatic speech recognition (D-ASR), which explicitly models source speaker locations. In D-ASR, the azimuth angle …
External link:
http://arxiv.org/abs/2011.00091
Authors:
Arora, Ashish, Raj, Desh, Subramanian, Aswin Shanmugam, Li, Ke, Ben-Yair, Bar, Maciejewski, Matthew, Żelasko, Piotr, García, Paola, Watanabe, Shinji, Khudanpur, Sanjeev
This paper summarizes the JHU team's efforts in tracks 1 and 2 of the CHiME-6 challenge for distant multi-microphone conversational speech diarization and recognition in everyday home environments. We explore multi-array processing techniques at each …
External link:
http://arxiv.org/abs/2006.07898