Popis: |
In this article we present the design and the development of a knowledge based computational linguistic tool, Mlphon for Malayalam language. Mlphon computationally models linguistic rules using finite state transducers and performs multiple functions including grapheme to phoneme (g2p) and phoneme to grapheme (p2g) conversions, syllabification, phonetic feature analysis and script grammar check. This open source software tool, released under MIT license, is developed as a one-stop solution to handle different speech related text processing tasks for automatic speech recognition, text to speech synthesis and non-speech natural language processing tasks including syllable subword based language modeling, phoneme diversity analysis and text sanity check. The tool is evaluated on a manually crafted gold standard lexicon. Mlphon performs orthographic syllabification with 99% accuracy with a syllable error rate of 0.62% on the gold standard lexicon. For grapheme to phoneme conversion task, overall phoneme recognition accuracy of 99% with a phoneme error rate of 0.55% is obtained on gold standard lexicon. Additionally an extrinsic evaluation of Mlphon is performed by employing the pronunciation lexicon created using Mlphon, in Malayalam automatic speech recognition (ASR) task. Performance analysis in terms of the computation time of lexicon creation process and the word error rate (WER) on ASR task are presented along with a comparison over other automated tools for lexicon creation. Pronunciation lexicons with more than 100k commonly used Malayalam words in phonemised and syllabified forms is created and they are published as open language resources along with this work. We also demonstrate the usage of Mlphon on different natural language processing applications - syllable subword ASR, assisted pronunciation learning, phoneme diversity analysis and text sanity check. Being a knowledge based solution with open source code, Mlphon can be adapted to other languages of similar script nature. |