A Spanish Multispeaker Database of Esophageal Speech

Authors: Inma Hernáez Rioja, Ibon Saratxaga, Sneha Raman, Luis Serrano García, Eva Navas Cordón, Jon Sanchez
Contributors: European Commission
Year of publication: 2020
Subject:
Source: Computer Speech & Language
Addi. Archivo Digital para la Docencia y la Investigación
Universidad del País Vasco
ISSN: 0885-2308
DOI: 10.1016/j.csl.2020.101168
Description: A laryngectomee is a person whose larynx has been surgically removed, usually because of laryngeal cancer. After surgery, most laryngectomees are able to speak again using techniques learned with the help of a speech therapist. This is termed alaryngeal speech, and esophageal speech (ES) is one of several alaryngeal speech production modes. A considerable amount of research has been dedicated to the study of alaryngeal speech, with aims ranging from helping speech therapists with evaluation and diagnosis to improving its quality and intelligibility using digital signal processing techniques. We present a database of Spanish ES voices, named AhoSLABI, designed to support the development of new assistive technologies for this speech impairment. The database primarily consists of recordings of 31 laryngectomees (27 males and 4 females) pronouncing phonetically balanced sentences. Additionally, it includes parallel recordings of the sentences by 9 healthy speakers (6 males and 3 females) to facilitate speech processing tasks that require small parallel corpora, such as voice conversion or synthetic speech adaptation. Apart from the sentences, the database includes sustained vowels and a small set of isolated words, which can be valuable for research on ES analysis, diagnosis and evaluation. The paper describes the main contents of the database, the recording protocols and procedure, and the labeling process. The main acoustic characteristics of the voices, such as speaking rate and the durations of the recordings, phones and silences, are compared with those of a reduced set of healthy voices. In addition, we describe an experiment that uses the database to improve the performance of an automatic speech recognition (ASR) system for ES speakers. This new resource will be made available to the scientific community in the hope that it will be used to improve the quality of life of laryngectomees.
This work was partially funded by the Spanish Ministry of Economy and Competitiveness with FEDER support (RESTORE project, TEC2015-67163-C2-1-R), by the Basque Government (PIBA-018-0035) and by the European Union’s H2020 research and innovation program under the Marie Curie European Training Network ENRICH (675324).
Database: OpenAIRE