Showing 1 - 10 of 219 for search: '"Zhou, Tianyan"'
In this report, we describe our submitted system for track 2 of the VoxCeleb Speaker Recognition Challenge 2022 (VoxSRC-22). We fuse a variety of well-performing models, ranging from supervised models to self-supervised learning (SSL) pre-trained models…
External link:
http://arxiv.org/abs/2209.11266
Author:
Kanda, Naoyuki, Xiao, Xiong, Wu, Jian, Zhou, Tianyan, Gaur, Yashesh, Wang, Xiaofei, Meng, Zhong, Chen, Zhuo, Yoshioka, Takuya
Speaker-attributed automatic speech recognition (SA-ASR) is the task of recognizing "who spoke what" from multi-talker recordings. An SA-ASR system usually consists of multiple modules such as speech separation, speaker diarization, and ASR. On the other…
External link:
http://arxiv.org/abs/2107.02852
Speech separation has been shown to be effective for multi-talker speech recognition. Under the ad hoc microphone array setup, where the array consists of spatially distributed asynchronous microphones, additional challenges must be overcome as the geometry…
External link:
http://arxiv.org/abs/2103.02378
Author:
Li, Chenda, Chen, Zhuo, Luo, Yi, Han, Cong, Zhou, Tianyan, Kinoshita, Keisuke, Delcroix, Marc, Watanabe, Shinji, Qian, Yanmin
Continuous speech separation (CSS) is the task of separating the speech sources from a long, partially overlapped recording involving a varying number of speakers. A straightforward extension of conventional utterance-level speech separation to…
External link:
http://arxiv.org/abs/2102.11634
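The window-wise processing that CSS builds on can be sketched as follows. This is a minimal illustration, not the paper's system; the window and hop sizes are illustrative assumptions (half-overlapping 1-second windows at 16 kHz).

```python
def sliding_windows(signal, win=16000, hop=8000):
    """Chunk a long recording into overlapping windows so a fixed-output
    separator can be applied window by window (assumed CSS front end)."""
    return [signal[i:i + win] for i in range(0, max(len(signal) - win, 0) + 1, hop)]

# A 4-second signal at 16 kHz yields 7 half-overlapping 1-second windows.
chunks = sliding_windows(list(range(64000)))
print(len(chunks))  # 7
```

Within each window the number of active speakers is small and roughly constant, which is what lets a separator with a fixed number of outputs handle a recording with a varying number of speakers.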
Author:
Han, Cong, Luo, Yi, Li, Chenda, Zhou, Tianyan, Kinoshita, Keisuke, Watanabe, Shinji, Delcroix, Marc, Erdogan, Hakan, Hershey, John R., Mesgarani, Nima, Chen, Zhuo
Leveraging additional speaker information to facilitate speech separation has received increasing attention in recent years. Recent research includes extracting target speech by using the target speaker's voice snippet and jointly separating all part…
External link:
http://arxiv.org/abs/2012.09727
Modules in all existing speech separation networks can be categorized into single-input-multi-output (SIMO) modules and single-input-single-output (SISO) modules. SIMO modules generate more outputs than inputs, and SISO modules keep the number of inp…
External link:
http://arxiv.org/abs/2011.08400
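The SIMO/SISO distinction can be illustrated with a minimal sketch; the uniform masks and identity post-filter below are placeholders for learned networks, not the paper's method.

```python
import numpy as np

def simo_separate(mixture, num_speakers=2):
    """SIMO module sketch: one mixture in, one stream per speaker out.
    Uniform splitting stands in for a learned separation network."""
    return [mixture / num_speakers for _ in range(num_speakers)]

def siso_enhance(stream):
    """SISO module sketch: one stream in, one refined stream out.
    Identity stands in for a learned enhancement network."""
    return stream

mixture = np.ones(8)
streams = [siso_enhance(s) for s in simo_separate(mixture)]
print(len(streams))  # 2: the SIMO stage produced more outputs than inputs
```

The point of the categorization is architectural: where in the pipeline the one-to-many branching happens determines which stages can share computation across speakers.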
Author:
Xiao, Xiong, Kanda, Naoyuki, Chen, Zhuo, Zhou, Tianyan, Yoshioka, Takuya, Chen, Sanyuan, Zhao, Yong, Liu, Gang, Wu, Yu, Wu, Jian, Liu, Shujie, Li, Jinyu, Gong, Yifan
This paper describes the Microsoft speaker diarization system for monaural multi-talker recordings in the wild, evaluated at the diarization track of the VoxCeleb Speaker Recognition Challenge (VoxSRC) 2020. We will first explain our system design to…
External link:
http://arxiv.org/abs/2010.11458
The ResNet-based architecture has been widely adopted to extract speaker embeddings for text-independent speaker verification systems. By introducing the residual connections to the CNN and standardizing the residual blocks, the ResNet structure is c…
External link:
http://arxiv.org/abs/2007.02480
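The residual connection the snippet refers to amounts to y = x + F(x). A minimal numerical sketch, with dense layers standing in for the block's convolutions (an assumption for brevity):

```python
import numpy as np

def residual_block(x, w1, w2):
    """Residual block sketch: y = x + F(x), where F is a two-layer
    transform with a ReLU. The identity shortcut keeps gradients flowing
    through deep speaker-embedding extractors."""
    h = np.maximum(x @ w1, 0.0)  # first "conv" + ReLU
    return x + h @ w2            # second "conv" + skip connection

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 16))
w1 = rng.standard_normal((16, 16)) * 0.01
w2 = rng.standard_normal((16, 16)) * 0.01
y = residual_block(x, w1, w2)
print(y.shape)  # (4, 16): output shape matches input, as the skip requires
```

Note that if F collapses to zero, the block reduces to the identity, which is exactly why stacking many such blocks stays trainable.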
Author:
Kanda, Naoyuki, Gaur, Yashesh, Wang, Xiaofei, Meng, Zhong, Chen, Zhuo, Zhou, Tianyan, Yoshioka, Takuya
We propose an end-to-end speaker-attributed automatic speech recognition model that unifies speaker counting, speech recognition, and speaker identification on monaural overlapped speech. Our model is built on serialized output training (SOT) with at…
External link:
http://arxiv.org/abs/2006.10930
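Serialized output training turns a multi-speaker reference into a single token stream joined by a speaker-change token. A minimal sketch of the target construction; the helper below is hypothetical, and the `<sc>` token name is an assumption borrowed from the SOT literature.

```python
def serialize_transcripts(utterances, sc_token="<sc>"):
    """SOT-style target sketch: concatenate each speaker's transcript in
    first-in-first-out order, joined by a speaker-change token, so a
    single decoder can emit all speakers' words from overlapped audio."""
    return f" {sc_token} ".join(utterances)

target = serialize_transcripts(["hello there", "hi how are you"])
print(target)  # hello there <sc> hi how are you
```

Because the number of `<sc>` tokens in the output implicitly encodes the number of speakers, the same decoder performs speaker counting as a by-product of recognition.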
Author:
Chen, Zhuo, Yoshioka, Takuya, Lu, Liang, Zhou, Tianyan, Meng, Zhong, Luo, Yi, Wu, Jian, Xiao, Xiong, Li, Jinyu
This paper describes a dataset and protocols for evaluating continuous speech separation algorithms. Most prior studies on speech separation use pre-segmented signals of artificially mixed speech utterances, which are mostly fully overlapped, a…
External link:
http://arxiv.org/abs/2001.11482