Showing 1 - 10 of 4,312 for search: '"Padfield, A."'
Integrating multiple generative foundation models, especially those trained on different modalities, into something greater than the sum of its parts poses significant challenges. Two key hurdles are the availability of aligned data (concepts that co
External link:
http://arxiv.org/abs/2405.18669
Author:
Rubenstein, Paul K., Asawaroengchai, Chulayuth, Nguyen, Duc Dung, Bapna, Ankur, Borsos, Zalán, Quitry, Félix de Chaumont, Chen, Peter, Badawy, Dalia El, Han, Wei, Kharitonov, Eugene, Muckenhirn, Hannah, Padfield, Dirk, Qin, James, Rozenberg, Danny, Sainath, Tara, Schalkwyk, Johan, Sharifi, Matt, Ramanovich, Michelle Tadmor, Tagliasacchi, Marco, Tudor, Alexandru, Velimirović, Mihajlo, Vincent, Damien, Yu, Jiahui, Wang, Yongqiang, Zayats, Vicky, Zeghidour, Neil, Zhang, Yu, Zhang, Zhishuai, Zilka, Lukas, Frank, Christian
We introduce AudioPaLM, a large language model for speech understanding and generation. AudioPaLM fuses text-based and speech-based language models, PaLM-2 [Anil et al., 2023] and AudioLM [Borsos et al., 2022], into a unified multimodal architecture
External link:
http://arxiv.org/abs/2306.12925
Current disfluency detection models focus on individual utterances each from a single speaker. However, numerous discontinuity phenomena in spoken conversational transcripts occur across multiple turns, hampering human readability and the performance
External link:
http://arxiv.org/abs/2305.12029
Author:
Padfield, Dirk, Liebling, Daniel J.
Published in:
Proc. Interspeech (2021) 4613-4617
Diarization partitions an audio stream into segments based on the voices of the speakers. Real-time diarization systems that include an enrollment step should limit enrollment training samples to reduce user interaction time. Although training on a s
External link:
http://arxiv.org/abs/2208.03393
In modern interactive speech-based systems, speech is consumed and transcribed incrementally prior to having disfluencies removed. This post-processing step is crucial for producing clean transcripts and high performance on downstream tasks (e.g. mac
External link:
http://arxiv.org/abs/2205.00620
Author:
Padfield, Natasha (natasha.padfield@um.edu.mt), Camilleri, Tracey, Fabri, Simon, Bugeja, Marvin, Camilleri, Kenneth
Published in:
Brain-Computer Interfaces, Jun-Sep 2024, Vol. 11, Issue 3, pp. 125-142 (18 pages).
Automatic Speech Recognition (ASR) systems are often optimized to work best for speakers with canonical speech patterns. Unfortunately, these systems perform poorly when tested on atypical speech and heavily accented speech. It has previously been sh
External link:
http://arxiv.org/abs/2109.06952
Author:
Kenyum Bagra, David Kneis, Daniel Padfield, Edina Szekeres, Adela Teban-Man, Cristian Coman, Gargi Singh, Thomas U. Berendonk, Uli Klümper
Published in:
mSphere, Vol 9, Iss 2 (2024)
River microbial communities regularly act as the first barrier of defense against the spread of antimicrobial resistance genes (ARGs) that enter environmental microbiomes through wastewater. However, how the invasion dynamics of wastewater-b
External link:
https://doaj.org/article/4c3bb2d9dcdf49c3a0e5d800717c8551
Neural Machine Translation (NMT) models have demonstrated strong state-of-the-art performance on translation tasks where well-formed training and evaluation data are provided, but they remain sensitive to inputs that include errors of various types.
External link:
http://arxiv.org/abs/2010.11132
Author:
Young, Gareth, Milne, Hamish, Griffiths, Daniel, Padfield, Elliot, Blenkinsopp, Robert, Georgiou, Orestis
We present advancements in the design and development of in-vehicle infotainment systems that utilize gesture input and ultrasonic mid-air haptic feedback. Such systems employ state-of-the-art hand tracking technology and novel haptic feedback techno
External link:
http://arxiv.org/abs/2005.08535