Showing 1 - 10 of 59 results for the search: "Germán Bordel"
Published in:
Applied Sciences, Vol 14, Iss 5, p 1951 (2024)
The development of speech technology requires large amounts of data to estimate the underlying models. Even when relying on large multilingual pre-trained models, some amount of task-specific data on the target language is needed to fine-tune those m…
External link:
https://doaj.org/article/b728311dc6f649c7b17ba4a86578c55f
Authors:
Eduardo Lleida, Luis Javier Rodriguez-Fuentes, Javier Tejedor, Alfonso Ortega, Antonio Miguel, Virginia Bazán, Carmen Pérez, Alberto de Prada, Mikel Penagarikano, Amparo Varona, Germán Bordel, Doroteo Torre-Toledano, Aitor Álvarez, Haritz Arzelus
Published in:
Applied Sciences, Vol 13, Iss 15, p 8577 (2023)
Evaluation campaigns provide a common framework with which the progress of speech technologies can be effectively measured. The aim of this paper is to present a detailed overview of the IberSpeech-RTVE 2022 Challenges, which were organized as part o…
External link:
https://doaj.org/article/75d032e326114ebca601ea06a1a3cd04
Published in:
Applied Sciences, Vol 13, Iss 14, p 8492 (2023)
In this paper, a semisupervised speech data extraction method is presented and applied to create a new dataset designed for the development of fully bilingual Automatic Speech Recognition (ASR) systems for Basque and Spanish. The dataset is drawn fro…
External link:
https://doaj.org/article/e8c099db1d79434f9d1b16b7d1d9094f
Published in:
IberSPEECH 2022.
Published in:
IberSPEECH
Authors:
Luis Javier Rodríguez-Fuentes, Amparo Varona, Aitor Alvarez, Germán Bordel, Mikel Peñagarikano
Published in:
IEEE Signal Processing Letters. 23:126-129
The synchronization of text transcripts with audio tracks is typically solved by forced alignment at the phonetic level. However, when dealing with either very long audio tracks or acoustically inaccurate text transcripts, more complex methods are ne…
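For reference, the phonetic-level forced alignment that the abstract above names as the typical solution reduces to a Viterbi pass over a known sequence of units. The sketch below assumes that per-frame log-likelihoods for each transcript unit are already available (no acoustic model is included), that units are visited strictly left to right, and that each unit covers at least one frame. It does not reproduce the more complex method the paper proposes for long tracks or inaccurate transcripts, and `forced_align` is a hypothetical name.

```python
import math
from typing import List, Sequence


def forced_align(loglik: Sequence[Sequence[float]]) -> List[int]:
    """Viterbi-align U transcript units to T frames.

    loglik[t][j] is an (assumed, precomputed) log-likelihood of frame t under
    the j-th unit of the transcript. Units appear left to right, none may be
    skipped, and each covers at least one frame. Returns one unit index per frame.
    """
    t_max, u_max = len(loglik), len(loglik[0])
    if t_max < u_max:
        raise ValueError("need at least as many frames as units")
    neg_inf = -math.inf
    score = [[neg_inf] * u_max for _ in range(t_max)]
    back = [[0] * u_max for _ in range(t_max)]
    # The path must start in the first unit at the first frame.
    score[0][0] = loglik[0][0]
    for t in range(1, t_max):
        for j in range(u_max):
            stay = score[t - 1][j]                                 # remain in unit j
            advance = score[t - 1][j - 1] if j > 0 else neg_inf    # move on from unit j-1
            if advance > stay:
                score[t][j], back[t][j] = advance + loglik[t][j], j - 1
            else:
                score[t][j], back[t][j] = stay + loglik[t][j], j
    # Backtrack from the last unit at the last frame.
    path = [u_max - 1]
    for t in range(t_max - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path


if __name__ == "__main__":
    # Toy scores: 5 frames, 2 units; frames 0-2 favour unit 0, frames 3-4 unit 1.
    toy = [[0.0, -5.0], [0.0, -5.0], [0.0, -5.0], [-5.0, 0.0], [-5.0, 0.0]]
    print(forced_align(toy))
```

Time and memory grow with T × U in this direct form, so it scales poorly to very long recordings.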
Published in:
Language Resources and Evaluation. 50:221-243
KALAKA-3 is a speech database specifically designed for the development and evaluation of Spoken Language Recognition (SLR) systems. The database provides TV broadcast speech for training, and audio data extracted from YouTube videos for tuning and t…
Published in:
IEEE Signal Processing Letters. 21:1073-1077
The so-called Phone Log-Likelihood Ratio (PLLR) features have been recently introduced as a novel and effective way of retrieving acoustic-phonetic information in spoken language and speaker recognition systems. In this letter, an in-depth insight in…
Published in:
IEEE Signal Processing Letters. 21:649-652
In this letter, we apply Phone Log-Likelihood Ratio (PLLR) features to the task of speaker recognition. PLLRs, which are computed on the phone posterior probabilities provided by phone decoders, convey acoustic-phonetic information in a sequence of f…
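The abstracts above describe PLLRs only as being computed from phone posteriors. As a minimal sketch, assuming the formulation usually given for PLLR features (each phone's posterior p_i is compared against the average posterior of the other N - 1 phones, i.e. log(p_i / ((1 - p_i) / (N - 1))), with the posteriors of each frame summing to 1), the frame-level transform might look like this. The function names, the `eps` clipping constant and the toy posteriors are illustrative assumptions, not the authors' code.

```python
import math
from typing import List, Sequence


def pllr_frame(posteriors: Sequence[float], eps: float = 1e-10) -> List[float]:
    """Map one frame of N phone posteriors (assumed to sum to 1) to N PLLR values."""
    n = len(posteriors)
    if n < 2:
        raise ValueError("need at least two phone units")
    feats = []
    for p in posteriors:
        # Clip so that the log and the division stay finite; eps is an assumption.
        p = min(max(p, eps), 1.0 - eps)
        # Average posterior of the N - 1 competing phones.
        competitors = (1.0 - p) / (n - 1)
        feats.append(math.log(p / competitors))
    return feats


def pllr_features(frames: Sequence[Sequence[float]]) -> List[List[float]]:
    """Apply the frame-level transform to a whole utterance (T frames x N phones)."""
    return [pllr_frame(frame) for frame in frames]


if __name__ == "__main__":
    # Toy posteriors for 2 frames over 3 hypothetical phone units.
    utterance = [[0.7, 0.2, 0.1], [0.25, 0.5, 0.25]]
    for row in pllr_features(utterance):
        print([round(v, 3) for v in row])
```

With uniform posteriors (p_i = 1/N) every PLLR is zero, so a positive value marks a phone that is more likely than average in that frame.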
Published in:
IEEE Transactions on Audio, Speech, and Language Processing. 19:2348-2363
Most common approaches to phonotactic language recognition deal with several independent phone decodings. These decodings are processed and scored in a fully uncoupled way, their time alignment (and the information that may be extracted from it) bein…
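Phonotactic systems typically model phone n-gram statistics of each decoding. To illustrate the uncoupled processing described in the abstract above, the sketch below counts n-grams separately for each decoder's output, so any information about how the decodings line up in time is discarded. The decoder names, phone labels and choice of n are hypothetical; this shows only the baseline setting, not the approach the paper proposes.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def phone_ngram_counts(phones: Sequence[str], n: int = 3) -> Counter:
    """Count phone n-grams (as tuples) in a single decoded phone string."""
    return Counter(tuple(phones[i:i + n]) for i in range(len(phones) - n + 1))


def uncoupled_counts(decodings: Dict[str, List[str]], n: int = 3) -> Dict[str, Counter]:
    """Process each decoder's output on its own; timing across decoders is ignored."""
    return {name: phone_ngram_counts(seq, n) for name, seq in decodings.items()}


if __name__ == "__main__":
    # Two hypothetical decoders with different phone inventories.
    decodings = {"decoder_A": ["a", "b", "a", "b"], "decoder_B": ["p", "a", "t"]}
    for name, counts in uncoupled_counts(decodings, n=2).items():
        top: List[Tuple[Tuple[str, ...], int]] = counts.most_common(2)
        print(name, top)
```

Each resulting count vector would then be scored by its own language model or classifier, which is exactly the uncoupled arrangement the abstract describes.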