Measuring the intelligibility of pathological speech through subjective and objective procedures

Author:	Xue, Wei
Language:	English
Year of publication:	2023
Subject:	visual analogue scale; reliability and validity; pathological speech; eGeMAPS; speech intelligibility; pluricentric language; objective procedure; automatic speech recognition; subjective procedure; orthographic transcription; phoneme; acoustic features
DOI:	10.5281/zenodo.7764176
Description:	What distinguishes us humans from other living organisms is our ability to use language and thus communicate more effectively and freely. Language can be conveyed by speech, writing, and sign. As speech is a powerful tool in daily communication, speech impairments can hinder human communication because messages fail to be delivered. People with dysarthria suffer from speech impairments due to neurological diseases (e.g., parkinsonism and amyotrophic lateral sclerosis) or injuries (e.g., traumatic brain injury and thrombotic/embolic stroke). Dysarthria can cause a loss of control over the muscles used for speech, resulting in disorders in speech strength, speed, range, steadiness, and tone (Duffy, 2013, p. 3). It can reduce speakers' intelligibility, leading to difficulties in communication. As a consequence, they may lose contact with others and eventually become isolated from social life and society. These consequences severely affect their quality of life. To alleviate such speech impairments and social impacts, speech therapy has been shown to be useful. To measure the effectiveness of therapeutic treatments and to monitor developments, e.g., through pre- and post-therapy evaluations, it is necessary to have a clear definition and a robust operationalization of speech intelligibility. In this dissertation, intelligibility is defined in line with Hustad as “how well a speaker’s acoustic signal can be accurately recovered by a listener”. This definition implies that measuring speech intelligibility requires the participation of human listeners, and this procedure is therefore considered to be subjective. A typical implementation of subjective procedures is to conduct listening experiments in which a group of listeners is asked to assess the speech intelligibility of speakers with speech impairments.
The assessment of intelligibility can be performed with different measurement methods, i.e., scalar judgments and item identifications, and with different speech materials. Speech intelligibility can be evaluated at different granularity levels with respect to the units to be studied, such as graphemes (letters), phonemes, syllables, words, and sentences. Listeners recruited to participate in such experiments can either be expert listeners, such as speech-language therapists, or naïve listeners, such as college students. Many studies have shown that these procedures can produce reliable measures, and they have been widely used in research and clinical practice. However, these studies are limited. First, the effects that different factors in subjective procedures could have on measures of speech intelligibility have not been extensively analyzed so far. In particular, the comparison involving orthographic transcription between speech materials has been limited by the use of a typical form of transcription that allows only existing words. Furthermore, commonly used statistical analyses for reliability examination cannot handle all relevant factors in a procedure and different experimental designs. Also, the validity of speech intelligibility measures, a key question in research, has rarely been examined in the field of dysarthric speech. In addition to subjective procedures, many studies have explored the possibility of using objective procedures to measure speech intelligibility, in which the involvement of human listeners is not strictly required. One objective procedure focuses on studying acoustic features of dysarthric speech. The other procedure employs more sophisticated machine learning (ML) models such as automatic speech recognition (ASR) systems. However, these studies about objective procedures have several limitations.
First, studies focusing on acoustic features normally investigate the relation between acoustic features and only one specific intelligibility measure. Thus, it is worthwhile to extend previous research to different intelligibility measures since they may be influenced by different implementations of the factors. This more comprehensive exploration could also help to understand how such different measures can be used to develop easy-to-use tools in clinical practice. Second, the outcomes of studies employing ML-based models are not easy for speech-language pathologists to interpret, let alone to use for diagnosis. Furthermore, although these studies showed high correlations with subjective measures of speech intelligibility, they require large amounts of labeled data for training models, whereas the lack of such resources is actually one of the pain points in assessing dysarthric speech. The goal of this dissertation is to gain insights and to establish guidelines to develop valid procedures for measuring the intelligibility of pathological speech. To that end, both subjective and objective procedures were evaluated. Regarding subjective procedures, this dissertation focuses on comprehensively studying the effects of the four factors (i.e., speech materials, measurement methods, granularity levels, and listener characteristics) as well as the reliability and validity of intelligibility measures based on the investigations in three listening experiments. These three listening experiments covered different speech materials, measurement methods, and granularity levels of intelligibility measures. Specifically, these experiments employed three types of speech materials varying in length, morphosyntactic complexity, and semantic predictability. The intelligibility measures were collected with both categories of measurement methods (i.e., scalar judgments and item identifications), namely Visual Analogue Scales (VAS) and orthographic transcriptions, respectively.
For orthographic transcriptions, a novel form of transcription that allows pseudowords was proposed and compared with the typical form of transcription. Various intelligibility measures were extracted at different granularity levels, i.e., utterance, word, and subword (grapheme and phoneme). Five expert listeners were recruited to assess the speech intelligibility of speakers varying in dysarthria severity, dysarthria type, gender, age, etc. The results of the five expert listeners in Chapters 2 through 4 were indirectly compared with those of eleven naïve listeners in Chapter 5, to study the effects of listener experience. Chapter 2 presents a comprehensive analysis of eight measures in the three listening experiments. Chapter 3 further studies two measures at utterance and word level, and focuses on the reliability issues by applying Generalizability Theory, which has rarely been used in the field of speech intelligibility and pathology but can handle all relevant factors in experiment designs. Moreover, the usability of our novel pseudoword-allowing form of transcription was examined in depth. Chapter 4 expands the study to two types of phoneme-level measures and explores the possibility of using them to classify speakers. For the investigation of objective procedures, this dissertation focuses on acoustic correlates of intelligibility and on addressing the low-resource problem in a pluricentric language, Dutch in this case, when ASR models are used. Specifically, Chapter 5 studies a small set of features that are related to pitch, intensity, and formant frequencies. The features are extracted from both dysarthric and healthy speech, and a stepwise logistic regression model is applied to select relevant features to classify dysarthric and healthy speech.
Based on the outcomes of the regression model, we calculate an acoustic-phonetic probability index and study its relation with subjective measures of intelligibility at the utterance and word level. Chapter 6 studies a larger acoustic feature set, eGeMAPS, which includes features related to, e.g., frequency, amplitude, and spectrum, and its relation with a phoneme-level measure, i.e., Phoneme Intelligibility, in two types of speech materials. A set of temporal features is also considered to explore whether the relation between acoustic features and subjective intelligibility measures is material-dependent. Chapter 7 evaluates the contribution of resources from the dominant variety (Netherlandic Dutch) to improving the ASR models on the non-dominant variety (Flemish Dutch) in terms of predicting subjective measures of intelligibility and, for the first time, generating human-comparable transcriptions. The aim of studying the possibility of generating human-comparable transcriptions is to explore whether ASR models can, on the one hand, fully replace the role of human listeners in the assessment of intelligibility and, on the other hand, maintain the deviations of dysarthric speech so that therapists can further evaluate and use them for diagnosis. The results for subjective procedures showed clearly that all four factors (i.e., speech materials, granularity levels, measurement methods, and listener characteristics) have an impact on the measure of intelligibility. Specifically, for speech materials, the intelligibility measures generally increase as the degree of semantic predictability increases. For granularity levels, different intelligibility measures can be used interchangeably when averaged per speaker but not when averaged per utterance. In particular, the scalar judgments through VAS are more reliable and robust in different speech materials compared to transcription-based, word-level measures.
Phoneme-level measures are generally reliable and valid, indicating that deriving these measures programmatically successfully reduces human effort. For measurement methods, our novel pseudoword-allowing form of transcription is a valuable tool for obtaining reliable measures and for reducing the impact of contextual cues. For listener characteristics, expert listeners seem to provide more reliable intelligibility measures than naïve listeners. In addition, the newly applied Generalizability Theory is presented as a valuable method for studying the reliability of intelligibility measures since it can accommodate all relevant factors in experiment designs. In order to obtain reliable measures, scalar judgments require three samples per speaker in combination with four listeners irrespective of speech materials, but transcription-based, word-level measures require only two samples and two listeners in word lists. The results of objective procedures showed that the scalar judgments from human listeners and the acoustic-phonetic probability index seemed to complement each other in classifying dysarthric and healthy speakers. Furthermore, the eGeMAPS feature set seems to be effective for predicting Phoneme Intelligibility in dysarthric speech but not effective for healthy speech. The relation between acoustic features and intelligibility measures seems to be material-dependent, and intelligibility measures at different granularity levels are associated with different acoustic features. The results for how to address the low-resource problem of ASR models in the pluricentric context of Dutch demonstrated that using dysarthric speech resources from the dominant variety of Dutch can benefit dysarthric speech from the non-dominant variety in terms of assessing intelligibility and generating human-comparable transcriptions.
Taken together, the research in this dissertation provides insights and guidelines for developing valid procedures for measuring the intelligibility of pathological speech, which could be helpful for clinical practice and research.
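As an illustration of the kind of reliability calculation that Generalizability Theory supports, the following minimal sketch computes a relative generalizability coefficient for different numbers of listeners and samples. It assumes a fully crossed speaker x listener x sample design, which the abstract does not state; the variance components, the 0.80 threshold, and the function names `g_coefficient` and `smallest_design` are hypothetical and are not taken from the dissertation.

```python
def g_coefficient(var_p, var_pl, var_ps, var_res, n_listeners, n_samples):
    """Relative generalizability coefficient for a crossed p x l x s design.

    var_p   : variance due to speakers (the object of measurement)
    var_pl  : speaker-by-listener interaction variance
    var_ps  : speaker-by-sample interaction variance
    var_res : residual (three-way interaction plus error) variance
    """
    if n_listeners < 1 or n_samples < 1:
        raise ValueError("n_listeners and n_samples must be at least 1")
    # Relative error variance shrinks as more listeners and samples are averaged.
    rel_error = (var_pl / n_listeners
                 + var_ps / n_samples
                 + var_res / (n_listeners * n_samples))
    return var_p / (var_p + rel_error)


def smallest_design(var_p, var_pl, var_ps, var_res, threshold=0.80, max_n=10):
    """Return (n_samples, n_listeners) needing the fewest ratings per speaker
    while reaching the threshold, or None if no design up to max_n does."""
    candidates = []
    for n_s in range(1, max_n + 1):
        for n_l in range(1, max_n + 1):
            g = g_coefficient(var_p, var_pl, var_ps, var_res, n_l, n_s)
            if g >= threshold:
                # Sort key: total ratings first, then fewer listeners.
                candidates.append((n_s * n_l, n_l, n_s))
    if not candidates:
        return None
    _, n_l, n_s = min(candidates)
    return (n_s, n_l)


if __name__ == "__main__":
    # Hypothetical variance components, for illustration only.
    components = dict(var_p=1.0, var_pl=0.2, var_ps=0.3, var_res=0.5)
    print(g_coefficient(n_listeners=4, n_samples=3, **components))
    print(smallest_design(**components))
```

With real variance components estimated from a listening experiment, a search of this kind is one way to arrive at recommendations such as a required number of samples and listeners per speaker; the values used here are placeholders.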
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::9ccb8086d6899a7a0d4f637cf08e7b2d