Errors on a Speech-in-Babble Sentence Recognition Test Reveal Individual Differences in Acoustic Phonetic Perception and Babble Misallocations
Author: | Edward T. Auer, Silvio P. Eberhardt, Lynne E. Bernstein |
---|---|
Year of publication: | 2021 |
Subject: |
Adult, Computer science, Speech recognition, Individuality, Stimulus (physiology), 01 natural sciences, Article, 03 medical and health sciences, Speech and Hearing, 0302 clinical medicine, Hearing, Phonetics, Perception, 0103 physical sciences, Humans, Speech, Active listening, 030223 otorhinolaryngology, 010301 acoustics, Connected speech, Filling-in, Acoustics, Regression, Noise, Otorhinolaryngology, Speech Perception, Sentence |
Source: | Ear Hear |
ISSN: | 1538-4667 |
Description: | OBJECTIVES. The ability to recognize words in connected speech under noisy listening conditions is critical to everyday communication. Many processing levels contribute to an individual listener's ability to recognize words correctly against background speech, and there is a clinical need for measures of individual differences at these different levels. Typical tests of speech recognition in noise require a list of items to obtain a single threshold score. Measures of diverse abilities could instead be obtained by mining the various open-set recognition errors made during multi-item tests. This study sought to demonstrate that an error-mining approach using open-set responses from a clinical sentence-in-babble-noise test can characterize abilities beyond the signal-to-noise ratio (SNR) threshold. A stimulus-response phoneme-to-phoneme sequence alignment software system was used to obtain automatic, accurate, quantitative error scores. The method was applied to a database of responses from normal-hearing (NH) adults. Relationships between two types of response errors and words-correct scores were evaluated using mixed-models regression.
DESIGN. Two hundred thirty-three NH adults completed three lists of the Quick Speech in Noise Test (QSIN) [Killion et al., 2004. Development of a quick speech-in-noise test for measuring signal-to-noise ratio loss in normal-hearing and hearing-impaired listeners. J Acoust Soc Am, 116, 2395–2405]. Their individual open-set speech recognition responses were automatically transcribed into phonemes and submitted to a phoneme-to-phoneme stimulus-response sequence alignment system. The computed alignments were mined for a measure of acoustic phonetic perception, a measure of response text that could not be attributed to the stimulus, and a count of words correct. The mined data were statistically analyzed to determine whether the response errors were significant factors, beyond stimulus SNR, in accounting for the number of words correct per response from each participant. This study addressed two hypotheses: (1) individuals whose perceptual errors are less severe recognize more words correctly under listening conditions made difficult by babble masking; and (2) listeners who are better able to exclude incorrect speech information, such as information from background babble or from filling in, recognize more stimulus words correctly.
RESULTS. Statistical analyses showed that acoustic phonetic accuracy and exclusion of background babble were significant factors, beyond stimulus sentence SNR, in accounting for the number of words a participant recognized. There was also evidence that poorer acoustic phonetic accuracy could co-occur with higher words-correct scores. This paradoxical result came from a subset of listeners who had also performed subjective accuracy judgments. Their results suggested that they recognized more words while also misallocating acoustic cues from the background into the stimulus, without realizing their errors. Because each QSIN stimulus is locked to its own babble sample, misallocations of whole words from the babble into the responses could be investigated in detail. The high rate of common misallocation errors for some sentences supported the view that the functional stimulus was the combination of the target sentence and its babble.
CONCLUSIONS. Individual differences among NH listeners arise both in the words accurately identified and in the errors committed during open-set recognition of sentences in babble maskers. Error mining to characterize individual listeners can be done automatically at the levels of acoustic phonetic perception and of the misallocation of background babble words into open-set responses. Error mining can increase the information a test yields and improve the efficiency and accuracy of characterizing individual listeners. |
Database: | OpenAIRE |
External link: |
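The abstract describes the pipeline only at a high level: each response is transcribed into phonemes, aligned phoneme-to-phoneme against the stimulus, and the alignment is mined for error measures and a words-correct count. The Python sketch below illustrates that general idea with a textbook global (Needleman-Wunsch-style) alignment under unit costs. It is not the authors' alignment system, whose costs and scoring rules are not given in this record. The ARPAbet-style phoneme symbols, the unit costs, the names `align`, `count_errors`, `words_correct`, and `AlignmentCounts`, and the use of unaligned response phonemes (insertions) as a rough proxy for "response text that could not be attributed to the stimulus" are all hypothetical choices made for illustration.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical unit costs: the study's actual alignment costs are not
# described in this record, so these are illustrative assumptions only.
SUB_COST = 1
INDEL_COST = 1


@dataclass
class AlignmentCounts:
    matches: int
    substitutions: int
    deletions: int   # stimulus phonemes with no response counterpart
    insertions: int  # response phonemes not aligned to any stimulus phoneme


def align(stimulus, response):
    """Globally align two phoneme lists; None marks a gap on that side."""
    n, m = len(stimulus), len(response)
    # cost[i][j] = minimum cost of aligning stimulus[:i] with response[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i * INDEL_COST
    for j in range(1, m + 1):
        cost[0][j] = j * INDEL_COST
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = 0 if stimulus[i - 1] == response[j - 1] else SUB_COST
            cost[i][j] = min(cost[i - 1][j - 1] + step,
                             cost[i - 1][j] + INDEL_COST,
                             cost[i][j - 1] + INDEL_COST)

    # Trace back from the bottom-right cell to recover one optimal alignment.
    pairs = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            step = 0 if stimulus[i - 1] == response[j - 1] else SUB_COST
            if cost[i][j] == cost[i - 1][j - 1] + step:
                pairs.append((stimulus[i - 1], response[j - 1]))
                i, j = i - 1, j - 1
                continue
        if i > 0 and cost[i][j] == cost[i - 1][j] + INDEL_COST:
            pairs.append((stimulus[i - 1], None))  # deletion
            i -= 1
        else:
            pairs.append((None, response[j - 1]))  # insertion
            j -= 1
    pairs.reverse()
    return pairs


def count_errors(pairs):
    """Tally match, substitution, deletion, and insertion counts."""
    matches = sum(1 for s, r in pairs if s is not None and r is not None and s == r)
    subs = sum(1 for s, r in pairs if s is not None and r is not None and s != r)
    dels = sum(1 for s, r in pairs if r is None)
    ins = sum(1 for s, r in pairs if s is None)
    return AlignmentCounts(matches, subs, dels, ins)


def words_correct(stimulus_words, response_words):
    """Count stimulus words reproduced in the response (order ignored)."""
    available = Counter(w.lower() for w in response_words)
    hits = 0
    for w in stimulus_words:
        if available[w.lower()] > 0:
            available[w.lower()] -= 1
            hits += 1
    return hits
```

For a stimulus "the cat sat" (`DH AH K AE T S AE T`) and a response "the hat sat on" (`DH AH HH AE T S AE T AA N`), `count_errors(align(...))` yields 7 matches, 1 substitution (K aligned to HH), 0 deletions, and 2 insertions, and `words_correct` returns 2. Unit costs treat every phoneme confusion as equally severe; a perceptually motivated system might instead weight substitutions by phonetic similarity, which would matter for a measure of acoustic phonetic accuracy.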