Showing 1 - 10 of 8,732 for search: '"Gopala AS"'
Author:
Wu, Peter, Yu, Bohan, Scheck, Kevin, Black, Alan W, Krishnapriyan, Aditi S., Chen, Irene Y., Schultz, Tanja, Watanabe, Shinji, Anumanchipalli, Gopala K.
The amount of articulatory data available for training deep learning models is much less compared to acoustic speech data. In order to improve articulatory-to-acoustic synthesis performance in these low-resource settings, we propose a multimodal pre-
External link:
http://arxiv.org/abs/2412.13387
Author:
Lian, Jiachen, Zhou, Xuanru, Ezzes, Zoe, Vonk, Jet, Morin, Brittany, Baquirin, David, Miller, Zachary, Tempini, Maria Luisa Gorno, Anumanchipalli, Gopala Krishna
Speech is a hierarchical collection of text, prosody, emotions, dysfluencies, etc. Automatic transcription of speech that goes beyond text (words) is an underexplored problem. We focus on transcribing speech along with non-fluencies (dysfluencies). T
External link:
http://arxiv.org/abs/2412.00265
Effective usage of approximate circuits for various performance trade-offs requires accurate computation of error. Several average and worst case error metrics have been proposed in the literature. We propose a framework for exact computation of thes
External link:
http://arxiv.org/abs/2411.10037
Author:
Cho, Cheol Jun, Lee, Nicholas, Gupta, Akshat, Agarwal, Dhruv, Chen, Ethan, Black, Alan W, Anumanchipalli, Gopala K.
Syllables are compositional units of spoken language that play a crucial role in human speech perception and production. However, current neural speech representations lack structure, resulting in dense token sequences that are costly to process. To
External link:
http://arxiv.org/abs/2410.07168
Author:
Mittu, Fazal, Bu, Yihuan, Gupta, Akshat, Devireddy, Ashok, Ozdarendeli, Alp Eren, Singh, Anant, Anumanchipalli, Gopala
While the language modeling objective has been shown to be deeply connected with compression, it is surprising that modern LLMs are not employed in practical text compression systems. In this paper, we provide an in-depth analysis of neural network a
External link:
http://arxiv.org/abs/2409.17141
Speech sounds convey a great deal of information about the scenes, resulting in a variety of effects ranging from reverberation to additional ambient sounds. In this paper, we manipulate input speech to sound as though it was recorded within a differ
External link:
http://arxiv.org/abs/2409.14340
Author:
Zhou, Xuanru, Lian, Jiachen, Cho, Cheol Jun, Liu, Jingwen, Ye, Zongli, Zhang, Jinming, Morin, Brittany, Baquirin, David, Vonk, Jet, Ezzes, Zoe, Miller, Zachary, Tempini, Maria Luisa Gorno, Anumanchipalli, Gopala
Speech dysfluency modeling is a task to detect dysfluencies in speech, such as repetition, block, insertion, replacement, and deletion. Most recent advancements treat this problem as a time-based object detection problem. In this work, we revisit thi
External link:
http://arxiv.org/abs/2409.13582
Layer normalization is a pivotal step in the transformer architecture. This paper delves into the less explored geometric implications of this process, examining how LayerNorm influences the norm and orientation of hidden vectors in the representatio
External link:
http://arxiv.org/abs/2409.12951
Author:
Zhou, Xuanru, Cho, Cheol Jun, Sharma, Ayati, Morin, Brittany, Baquirin, David, Vonk, Jet, Ezzes, Zoe, Miller, Zachary, Tee, Boon Lead, Tempini, Maria Luisa Gorno, Lian, Jiachen, Anumanchipalli, Gopala
Current de-facto dysfluency modeling methods utilize template matching algorithms which are not generalizable to out-of-domain real-world dysfluencies across languages, and are not scalable with increasing amounts of training data. To handle these pr
External link:
http://arxiv.org/abs/2409.09621
Articulatory trajectories like electromagnetic articulography (EMA) provide a low-dimensional representation of the vocal tract filter and have been used as natural, grounded features for speech synthesis. Differentiable digital signal processing (DD
External link:
http://arxiv.org/abs/2409.02451