Search results

Report

Speech Recognition for Analysis of Police Radio Communication

Authors: Srivastava, Tejes; Chou, Ju-Chieh; Shroff, Priyank; Livescu, Karen; Graziul, Christopher

Police departments around the world use two-way radio for coordination. These broadcast police communications (BPC) are a unique source of information about everyday police activity and emergency response. Yet BPC are not transcribed, and their natur

External link: http://arxiv.org/abs/2409.10858

Report

Toward Joint Language Modeling for Speech Units and Text

Authors: Chou, Ju-Chieh; Chien, Chung-Ming; Hsu, Wei-Ning; Livescu, Karen; Babu, Arun; Conneau, Alexis; Baevski, Alexei; Auli, Michael

Speech and text are two major forms of human language. The research community has been focusing on mapping speech to text or vice versa for many years. However, in the field of language modeling, very little effort has been made to model them jointly

External link: http://arxiv.org/abs/2310.08715

Report

Few-Shot Spoken Language Understanding via Joint Speech-Text Models

Authors: Chien, Chung-Ming; Zhang, Mingjiamei; Chou, Ju-Chieh; Livescu, Karen

Recent work on speech representation models jointly pre-trained with text has demonstrated the potential of improving speech representations by encoding speech and text in a shared space. In this paper, we leverage such shared representations to addr

External link: http://arxiv.org/abs/2310.05919

Report

AV2Wav: Diffusion-Based Re-synthesis from Continuous Self-supervised Features for Audio-Visual Speech Enhancement

Authors: Chou, Ju-Chieh; Chien, Chung-Ming; Livescu, Karen

Speech enhancement systems are typically trained using pairs of clean and noisy speech. In audio-visual speech enhancement (AVSE), there is not as much ground-truth clean data available; most audio-visual datasets are collected in real-world environm

External link: http://arxiv.org/abs/2309.08030

Report

Layer-wise Analysis of a Self-supervised Speech Representation Model

Authors: Pasad, Ankita; Chou, Ju-Chieh; Livescu, Karen

Recently proposed self-supervised learning approaches have been successful for pre-training speech representation models. The utility of these learned representations has been observed empirically, but not much has been studied about the type or exte

External link: http://arxiv.org/abs/2107.04734

Report

SpeechBrain: A General-Purpose Speech Toolkit

SpeechBrain is an open-source and all-in-one speech toolkit. It is designed to facilitate the research and development of neural speech processing technologies by being simple, flexible, user-friendly, and well-documented. This paper describes the co

External link: http://arxiv.org/abs/2106.04624

Dissertation/Thesis

A Comparative Policy Analysis of Quantum Computing

Author: Chou, Ju-Chi (周如琪)

107
The first quantum revolution triggered the booming development of the optoelectronics industry and the semiconductor industry. However, according to Moore's Law, the function of traditional computers will soon reach the physical limit, and t

External link: http://ndltd.ncl.edu.tw/handle/3w4vwd

Report

One-shot Voice Conversion by Separating Speaker and Content Representations with Instance Normalization

Authors: Chou, Ju-chieh; Yeh, Cheng-chieh; Lee, Hung-yi

Recently, voice conversion (VC) without parallel data has been successfully adapted to multi-target scenario in which a single model is trained to convert the input voice to many different speakers. However, such model suffers from the limitation tha

External link: http://arxiv.org/abs/1904.05742

Report

Rhythm-Flexible Voice Conversion without Parallel Data Using Cycle-GAN over Phoneme Posteriorgram Sequences

Authors: Yeh, Cheng-chieh; Hsu, Po-chun; Chou, Ju-chieh; Lee, Hung-yi; Lee, Lin-shan

Speaking rate refers to the average number of phonemes within some unit time, while the rhythmic patterns refer to duration distributions for realizations of different phonemes within different phonetic structures. Both are key components of prosody

External link: http://arxiv.org/abs/1808.03113

Report

Multi-target Voice Conversion without Parallel Data by Adversarially Learning Disentangled Audio Representations

Authors: Chou, Ju-chieh; Yeh, Cheng-chieh; Lee, Hung-yi; Lee, Lin-shan

Recently, cycle-consistent adversarial network (Cycle-GAN) has been successfully applied to voice conversion to a different speaker without parallel data, although in those approaches an individual model is needed for each target speaker. In this pap

External link: http://arxiv.org/abs/1804.02812
