Showing 1 - 10 of 21
for search: '"Sivasankaran, Sunit"'
Author:
Hu, Shujie, Zhou, Long, Liu, Shujie, Chen, Sanyuan, Hao, Hongkun, Pan, Jing, Liu, Xunying, Li, Jinyu, Sivasankaran, Sunit, Liu, Linquan, Wei, Furu
The recent advancements in large language models (LLMs) have revolutionized the field of natural language processing, progressively broadening their scope to multimodal perception and generation. However, effectively integrating listening capabilities…
External link:
http://arxiv.org/abs/2404.00656
We present a cost-effective method to integrate speech into a large language model (LLM), resulting in a Contextual Speech Model with Instruction-following/in-context-learning Capabilities (COSMIC) multi-modal LLM. Using GPT-3.5, we generate Speech C…
External link:
http://arxiv.org/abs/2311.02248
Author:
Chen, Zhuo, Kanda, Naoyuki, Wu, Jian, Wu, Yu, Wang, Xiaofei, Yoshioka, Takuya, Li, Jinyu, Sivasankaran, Sunit, Eskimez, Sefik Emre
Self-supervised learning (SSL) methods such as WavLM have shown promising speech separation (SS) results in small-scale simulation-based experiments. In this work, we extend the exploration of the SSL-based SS by massively scaling up both the pre-training…
External link:
http://arxiv.org/abs/2211.05172
Author:
Yang, Muqiao, Kanda, Naoyuki, Wang, Xiaofei, Wu, Jian, Sivasankaran, Sunit, Chen, Zhuo, Li, Jinyu, Yoshioka, Takuya
Multi-talker automatic speech recognition (ASR) has been studied to generate transcriptions of natural conversation including overlapping speech of multiple speakers. Due to the difficulty in acquiring real conversation data with high-quality human transcriptions…
External link:
http://arxiv.org/abs/2210.15715
Author:
Pariente, Manuel, Cornell, Samuele, Cosentino, Joris, Sivasankaran, Sunit, Tzinis, Efthymios, Heitkaemper, Jens, Olvera, Michel, Stöter, Fabian-Robert, Hu, Mathieu, Martín-Doñas, Juan M., Ditter, David, Frank, Ariel, Deleforge, Antoine, Vincent, Emmanuel
This paper describes Asteroid, the PyTorch-based audio source separation toolkit for researchers. Inspired by the most successful neural source separation systems, it provides all neural building blocks required to build such a system. To improve reproducibility…
External link:
http://arxiv.org/abs/2005.04132
Author:
Sahidullah, Md, Patino, Jose, Cornell, Samuele, Yin, Ruiqing, Sivasankaran, Sunit, Bredin, Hervé, Korshunov, Pavel, Brutti, Alessio, Serizel, Romain, Vincent, Emmanuel, Evans, Nicholas, Marcel, Sébastien, Squartini, Stefano, Barras, Claude
This paper describes the speaker diarization systems developed for the Second DIHARD Speech Diarization Challenge (DIHARD II) by the Speed team. Besides describing the system, which considerably outperformed the challenge baselines, we also focus on…
External link:
http://arxiv.org/abs/1911.02388
Speech separation is the process of separating multiple speakers from an audio recording. In this work we propose to separate the sources using a Speaker LOcalization Guided Deflation (SLOGD) approach wherein we estimate the sources iteratively. In each…
External link:
http://arxiv.org/abs/1910.11131
We investigate the effect of speaker localization on the performance of speech recognition systems in a multispeaker, multichannel environment. Given the speaker location information, speech separation is performed in three stages. In the first stage…
External link:
http://arxiv.org/abs/1910.11114
Author:
Bertin, Nancy, Camberlein, Ewen, Lebarbenchon, Romain, Vincent, Emmanuel, Sivasankaran, Sunit, Illina, Irina, Bimbot, Frédéric
Published in:
In Speech Communication January 2019 106:68-78
Author:
Yang, Muqiao, Kanda, Naoyuki, Wang, Xiaofei, Wu, Jian, Sivasankaran, Sunit, Chen, Zhuo, Li, Jinyu, Yoshioka, Takuya
Published in:
ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
Multi-talker automatic speech recognition (ASR) has been studied to generate transcriptions of natural conversation including overlapping speech of multiple speakers. Due to the difficulty in acquiring real conversation data with high-quality human transcriptions…