Solving the Phoneme Conflict in Grapheme-to-Phoneme Conversion Using a Two-Stage Neural Network-Based Approach
Author: | Seng Kheang, Tsuneo Nitta, Kouichi Katsurada, Yurie Iribe |
---|---|
Year of publication: | 2014 |
Subject: | Artificial neural network, Computer science, Speech recognition, American English, Phonetic transcription, Grapheme, Context (language use), Speech synthesis, Pronunciation, ComputingMethodologies_ARTIFICIALINTELLIGENCE, ComputingMethodologies_PATTERNRECOGNITION, Artificial Intelligence, Hardware and Architecture, Computer Vision and Pattern Recognition, Artificial intelligence, Electrical and Electronic Engineering, Software, Word (computer architecture), Natural language processing |
Source: | IEICE Transactions on Information and Systems. :901-910 |
ISSN: | 1745-1361 0916-8532 |
DOI: | 10.1587/transinf.e97.d.901 |
Description: | SUMMARY To achieve high-quality output in speech synthesis systems, data-driven grapheme-to-phoneme (G2P) conversion is usually used to generate the phonetic transcription of out-of-vocabulary (OOV) words. To improve the performance of G2P conversion, this paper addresses the problem of conflicting phonemes, where an input grapheme can, in the same context, map to several possible output phonemes. To this end, we propose a two-stage neural network-based approach that converts the input text to phoneme sequences in the first stage and then predicts each output phoneme in the second stage using the phonemic information obtained. The first-stage neural network is implemented as a many-to-many mapping model for automatic conversion of words to phoneme sequences, while the second stage uses a combination of the obtained phoneme sequences to predict the output phoneme corresponding to each input grapheme in a given word. We evaluate the performance of this approach on the American English pronunciation dictionary known as the auto-aligned CMUDict corpus[1]. In terms of phoneme and word accuracy on OOV words, the evaluation results against several baseline approaches show that our proposed approach improves on the previous one-stage neural network-based approach for G2P conversion. A comparison with another existing approach indicates that our approach provides higher phoneme accuracy but lower word accuracy on a general dataset, and slightly higher phoneme and word accuracy on a selection of words consisting of more than one phoneme.
Database: | OpenAIRE |
External link: |
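
The phoneme conflict described in the summary (one grapheme, in the same context, aligned to more than one phoneme) can be illustrated with a minimal sketch. This is not the authors' method: the function name `find_conflicts`, the padding symbol `#`, the fixed context window, and the toy alignments (with `_` standing for a silent grapheme) are all assumptions made for illustration only.

```python
from collections import defaultdict

PAD = "#"  # hypothetical word-boundary symbol, not taken from the paper


def find_conflicts(aligned_words, window=2):
    """Return grapheme contexts that were aligned to more than one phoneme.

    aligned_words: iterable of (graphemes, phonemes) pairs with a
    one-to-one alignment, i.e. len(graphemes) == len(phonemes).
    window: number of neighbouring graphemes kept on each side.
    """
    seen = defaultdict(set)
    for graphemes, phonemes in aligned_words:
        padded = [PAD] * window + list(graphemes) + [PAD] * window
        for i, phoneme in enumerate(phonemes):
            # The context is the grapheme at position i plus its neighbours.
            context = tuple(padded[i : i + 2 * window + 1])
            seen[context].add(phoneme)
    # Keep only contexts that map to several distinct phonemes.
    return {ctx: phs for ctx, phs in seen.items() if len(phs) > 1}


# Toy, hand-written alignments (illustrative only, not CMUDict entries).
toy = [
    ("read", ["R", "IY", "_", "D"]),
    ("read", ["R", "EH", "_", "D"]),
    ("red", ["R", "EH", "D"]),
]
conflicts = find_conflicts(toy, window=2)
for context, phonemes in conflicts.items():
    print("".join(context), sorted(phonemes))
```

In this toy data only the grapheme `e` in the context `#read` is ambiguous, since it is aligned to both `IY` and `EH`. A single-stage model that sees only this context cannot separate the two cases, which is the kind of ambiguity the summary refers to.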