Showing 1 - 10
of 5,036
for search: '"Gopala, Krishna"'
Author:
Lian, Jiachen, Zhou, Xuanru, Ezzes, Zoe, Vonk, Jet, Morin, Brittany, Baquirin, David, Miller, Zachary, Tempini, Maria Luisa Gorno, Anumanchipalli, Gopala Krishna
Speech is a hierarchical collection of text, prosody, emotions, dysfluencies, etc. Automatic transcription of speech that goes beyond text (words) is an underexplored problem. We focus on transcribing speech along with non-fluencies (dysfluencies). …
External link:
http://arxiv.org/abs/2412.00265
Effective usage of approximate circuits for various performance trade-offs requires accurate computation of error. Several average and worst case error metrics have been proposed in the literature. We propose a framework for exact computation of these …
External link:
http://arxiv.org/abs/2411.10037
Articulatory trajectories like electromagnetic articulography (EMA) provide a low-dimensional representation of the vocal tract filter and have been used as natural, grounded features for speech synthesis. Differentiable digital signal processing (DDSP) …
External link:
http://arxiv.org/abs/2409.02451
Author:
Lian, Jiachen, Zhou, Xuanru, Ezzes, Zoe, Vonk, Jet, Morin, Brittany, Baquirin, David, Miller, Zachary, Tempini, Maria Luisa Gorno, Anumanchipalli, Gopala Krishna
Speech dysfluency modeling is the core module for spoken language learning and speech therapy. However, there are three challenges. First, current state-of-the-art solutions [lian2023unconstrained-udm, lian-anumanchipalli-2024-towards-hudm] suffer …
External link:
http://arxiv.org/abs/2408.16221
Author:
Zhou, Xuanru, Kashyap, Anshul, Li, Steve, Sharma, Ayati, Morin, Brittany, Baquirin, David, Vonk, Jet, Ezzes, Zoe, Miller, Zachary, Tempini, Maria Luisa Gorno, Lian, Jiachen, Anumanchipalli, Gopala Krishna
Dysfluent speech detection is the bottleneck for disordered speech analysis and spoken language learning. Current state-of-the-art models are governed by rule-based systems which lack efficiency and robustness, and are sensitive to template design. …
External link:
http://arxiv.org/abs/2408.15297
Author:
Wu, Peter, Kaveh, Ryan, Nautiyal, Raghav, Zhang, Christine, Guo, Albert, Kachinthaya, Anvitha, Mishra, Tavish, Yu, Bohan, Black, Alan W, Muller, Rikky, Anumanchipalli, Gopala Krishna
Electrodes for decoding speech from electromyography (EMG) are typically placed on the face, requiring adhesives that are inconvenient and skin-irritating if used regularly. We explore a different device form factor, where dry electrodes are placed …
External link:
http://arxiv.org/abs/2407.21345
Author:
Netzorg, Robin, Cote, Alyssa, Koshin, Sumi, Garoute, Klo Vivienne, Anumanchipalli, Gopala Krishna
As experts in voice modification, trans-feminine gender-affirming voice teachers have unique perspectives on voice that confound current understandings of speaker identity. To demonstrate this, we present the Versatile Voice Dataset (VVD), a collection …
External link:
http://arxiv.org/abs/2407.07235
Author:
Hammad, Omar, Rahman, Md Rezwanur, Kanugo, Gopala Krishna Vasanth, Clements, Nicholas, Miller, Shelly, Mishra, Shivakant, Sullivan, Esther
Frequent disruptions like highway constructions are common nowadays, often impacting environmental justice communities (communities with low socio-economic status with disproportionately high and adverse human health and environmental effects) that …
External link:
http://arxiv.org/abs/2403.14038
Author:
Lian, Jiachen, Feng, Carly, Farooqi, Naasir, Li, Steve, Kashyap, Anshul, Cho, Cheol Jun, Wu, Peter, Netzorg, Robbie, Li, Tingle, Anumanchipalli, Gopala Krishna
Dysfluent speech modeling requires time-accurate and silence-aware transcription at both the word-level and phonetic-level. However, current research in dysfluency modeling primarily focuses on either transcription or detection, and the performance …
External link:
http://arxiv.org/abs/2312.12810
Perceptual modification of voice is an elusive goal. While non-experts can modify an image or sentence perceptually with available tools, it is not clear how to similarly modify speech along perceptual axes. Voice conversion does make it possible to …
External link:
http://arxiv.org/abs/2312.08494