Showing 1 - 10 of 21
for search: '"Sivasankaran, Sunit"'
Author:
Hu, Shujie, Zhou, Long, Liu, Shujie, Chen, Sanyuan, Hao, Hongkun, Pan, Jing, Liu, Xunying, Li, Jinyu, Sivasankaran, Sunit, Liu, Linquan, Wei, Furu
The recent advancements in large language models (LLMs) have revolutionized the field of natural language processing, progressively broadening their scope to multimodal perception and generation. However, effectively integrating listening capabilities…
External link:
http://arxiv.org/abs/2404.00656
We present a cost-effective method to integrate speech into a large language model (LLM), resulting in a Contextual Speech Model with Instruction-following/in-context-learning Capabilities (COSMIC) multi-modal LLM. Using GPT-3.5, we generate Speech C…
External link:
http://arxiv.org/abs/2311.02248
Author:
Chen, Zhuo, Kanda, Naoyuki, Wu, Jian, Wu, Yu, Wang, Xiaofei, Yoshioka, Takuya, Li, Jinyu, Sivasankaran, Sunit, Eskimez, Sefik Emre
Self-supervised learning (SSL) methods such as WavLM have shown promising speech separation (SS) results in small-scale simulation-based experiments. In this work, we extend the exploration of the SSL-based SS by massively scaling up both the pre-training…
External link:
http://arxiv.org/abs/2211.05172
Author:
Yang, Muqiao, Kanda, Naoyuki, Wang, Xiaofei, Wu, Jian, Sivasankaran, Sunit, Chen, Zhuo, Li, Jinyu, Yoshioka, Takuya
Multi-talker automatic speech recognition (ASR) has been studied to generate transcriptions of natural conversation including overlapping speech of multiple speakers. Due to the difficulty in acquiring real conversation data with high-quality human transcriptions…
External link:
http://arxiv.org/abs/2210.15715
Author:
Pariente, Manuel, Cornell, Samuele, Cosentino, Joris, Sivasankaran, Sunit, Tzinis, Efthymios, Heitkaemper, Jens, Olvera, Michel, Stöter, Fabian-Robert, Hu, Mathieu, Martín-Doñas, Juan M., Ditter, David, Frank, Ariel, Deleforge, Antoine, Vincent, Emmanuel
This paper describes Asteroid, the PyTorch-based audio source separation toolkit for researchers. Inspired by the most successful neural source separation systems, it provides all neural building blocks required to build such a system. To improve reproducibility…
External link:
http://arxiv.org/abs/2005.04132
Author:
Sahidullah, Md, Patino, Jose, Cornell, Samuele, Yin, Ruiqing, Sivasankaran, Sunit, Bredin, Hervé, Korshunov, Pavel, Brutti, Alessio, Serizel, Romain, Vincent, Emmanuel, Evans, Nicholas, Marcel, Sébastien, Squartini, Stefano, Barras, Claude
This paper describes the speaker diarization systems developed for the Second DIHARD Speech Diarization Challenge (DIHARD II) by the Speed team. Besides describing the system, which considerably outperformed the challenge baselines, we also focus on…
External link:
http://arxiv.org/abs/1911.02388
Speech separation is the process of separating multiple speakers from an audio recording. In this work we propose to separate the sources using a Speaker LOcalization Guided Deflation (SLOGD) approach wherein we estimate the sources iteratively. In each…
External link:
http://arxiv.org/abs/1910.11131
We investigate the effect of speaker localization on the performance of speech recognition systems in a multispeaker, multichannel environment. Given the speaker location information, speech separation is performed in three stages. In the first stage…
External link:
http://arxiv.org/abs/1910.11114
Author:
Bertin, Nancy, Camberlein, Ewen, Lebarbenchon, Romain, Vincent, Emmanuel, Sivasankaran, Sunit, Illina, Irina, Bimbot, Frédéric
Published in:
In Speech Communication January 2019 106:68-78
Author:
Yang, Muqiao, Kanda, Naoyuki, Wang, Xiaofei, Wu, Jian, Sivasankaran, Sunit, Chen, Zhuo, Li, Jinyu, Yoshioka, Takuya
Published in:
ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
Multi-talker automatic speech recognition (ASR) has been studied to generate transcriptions of natural conversation including overlapping speech of multiple speakers. Due to the difficulty in acquiring real conversation data with high-quality human transcriptions…