SMaTTS: Standard Malay Text to Speech System

Authors: Khalifa, Othman O.; Zakiah Hanim Ahmad; Gunawan, Teddy Surya
Language: English
Year of publication: 2007
Subject:
DOI: 10.5281/zenodo.1079188
Description: This paper presents SMaTTS, a rule-based text-to-speech (TTS) synthesis system for Standard Malay (SM). The proposed system uses the sinusoidal method together with a set of pre-recorded wave files to generate speech. Using a phone database significantly reduces the amount of memory required, which makes the system very lightweight and embeddable. The overall system consists of two phases. The first, Natural Language Processing (NLP), performs the high-level processing: text analysis, phonetic analysis, text normalization, and a morphophonemic module. The morphophonemic module was designed specifically for SM to overcome several problems in defining rules for the SM orthographic system before the text is passed to the DSP module. The second phase, Digital Signal Processing (DSP), performs the low-level generation of the speech waveform. The result is an intelligible and adequately natural-sounding formant-based speech synthesis system with a lightweight, user-friendly Graphical User Interface (GUI). An SM phoneme set and a comprehensive phone database were carefully constructed for this phone-based speech synthesizer. By applying generative phonology, a comprehensive set of letter-to-sound (LTS) rules and a pronunciation lexicon were developed for SMaTTS. For evaluation, a Diagnostic Rhyme Test (DRT) word list was compiled, and several experiments were performed to assess the quality of the synthesized speech by analyzing the resulting Mean Opinion Score (MOS). The overall performance of the system and the room for improvement are discussed in detail.
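The description says SMaTTS converts text to phones using a pronunciation lexicon together with letter-to-sound (LTS) rules. The following is a minimal sketch of that lexicon-first, rules-second lookup, assuming greedy longest-match rules over Malay orthographic digraphs. The rule tables, the SAMPA-style phone symbols, and the function name `letter_to_sound` are hypothetical and chosen for illustration only; they are a small subset and are not the SMaTTS rules, which the authors derive from generative phonology.

```python
# Hypothetical, illustrative LTS rules for Standard Malay (SM) orthography.
# Phone symbols are SAMPA-style and chosen for this sketch only; they are
# NOT the SMaTTS phoneme set.
DIGRAPHS = {"ng": "N", "ny": "J", "sy": "S", "kh": "x", "gh": "G"}
# Single letters that do not map to themselves; 'e' defaults to schwa ('@')
# because SM spelling does not distinguish schwa from /e/.
SINGLES = {"c": "tS", "j": "dZ", "y": "j", "e": "@"}


def letter_to_sound(word, lexicon=None):
    """Return a list of phone symbols for an SM word.

    The pronunciation lexicon is consulted first (for exceptions the rules
    cannot predict); otherwise rules are applied left to right, preferring
    a two-letter (digraph) match over a single letter.
    """
    word = word.lower()
    if lexicon and word in lexicon:
        return list(lexicon[word])
    phones = []
    i = 0
    while i < len(word):
        pair = word[i:i + 2]
        if pair in DIGRAPHS:
            phones.append(DIGRAPHS[pair])
            i += 2
        else:
            ch = word[i]
            phones.append(SINGLES.get(ch, ch))
            i += 1
    return phones
```

For example, `letter_to_sound("kucing")` yields `['k', 'u', 'tS', 'i', 'N']`, while `letter_to_sound("merah")` falls back to a schwa for the `e` unless a (hypothetical) lexicon entry such as `{"merah": ["m", "e", "r", "a", "h"]}` is supplied. Checking the lexicon before the rules keeps the rule set small, since irregular words are handled as explicit exceptions.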
Database: OpenAIRE