Showing 1 - 10 of 181 for search: '"Chou, Ju"'
Police departments around the world use two-way radio for coordination. These broadcast police communications (BPC) are a unique source of information about everyday police activity and emergency response. Yet BPC are not transcribed, and their natur
External link:
http://arxiv.org/abs/2409.10858
Authors:
Chou, Ju-Chieh, Chien, Chung-Ming, Hsu, Wei-Ning, Livescu, Karen, Babu, Arun, Conneau, Alexis, Baevski, Alexei, Auli, Michael
Speech and text are two major forms of human language. The research community has been focusing on mapping speech to text or vice versa for many years. However, in the field of language modeling, very little effort has been made to model them jointly
External link:
http://arxiv.org/abs/2310.08715
Recent work on speech representation models jointly pre-trained with text has demonstrated the potential of improving speech representations by encoding speech and text in a shared space. In this paper, we leverage such shared representations to addr
External link:
http://arxiv.org/abs/2310.05919
Speech enhancement systems are typically trained using pairs of clean and noisy speech. In audio-visual speech enhancement (AVSE), there is not as much ground-truth clean data available; most audio-visual datasets are collected in real-world environm
External link:
http://arxiv.org/abs/2309.08030
Recently proposed self-supervised learning approaches have been successful for pre-training speech representation models. The utility of these learned representations has been observed empirically, but not much has been studied about the type or exte
External link:
http://arxiv.org/abs/2107.04734
Authors:
Ravanelli, Mirco, Parcollet, Titouan, Plantinga, Peter, Rouhe, Aku, Cornell, Samuele, Lugosch, Loren, Subakan, Cem, Dawalatabad, Nauman, Heba, Abdelwahab, Zhong, Jianyuan, Chou, Ju-Chieh, Yeh, Sung-Lin, Fu, Szu-Wei, Liao, Chien-Feng, Rastorgueva, Elena, Grondin, François, Aris, William, Na, Hwidong, Gao, Yan, De Mori, Renato, Bengio, Yoshua
SpeechBrain is an open-source and all-in-one speech toolkit. It is designed to facilitate the research and development of neural speech processing technologies by being simple, flexible, user-friendly, and well-documented. This paper describes the co
External link:
http://arxiv.org/abs/2106.04624
Author:
Chou, Ju-Chi, 周如琪
107
The first quantum revolution triggered the booming development of the optoelectronics industry and the semiconductor industry. However, according to Moore's Law, the function of traditional computers will soon reach the physical limit, and t
External link:
http://ndltd.ncl.edu.tw/handle/3w4vwd
Recently, voice conversion (VC) without parallel data has been successfully adapted to multi-target scenario in which a single model is trained to convert the input voice to many different speakers. However, such model suffers from the limitation tha
External link:
http://arxiv.org/abs/1904.05742
Speaking rate refers to the average number of phonemes within some unit time, while the rhythmic patterns refer to duration distributions for realizations of different phonemes within different phonetic structures. Both are key components of prosody
External link:
http://arxiv.org/abs/1808.03113
Recently, cycle-consistent adversarial network (Cycle-GAN) has been successfully applied to voice conversion to a different speaker without parallel data, although in those approaches an individual model is needed for each target speaker. In this pap
External link:
http://arxiv.org/abs/1804.02812