Applications of Support Vector Machines on Smart Phone Systems for Emotional Speech Recognition

Authors: Wernhuar Tarng, Yuan-Yuan Chen, Chien-Lung Li, Kun-Rong Hsie, Mingteh Chen
Language: English
Year published: 2010
Subject:
DOI: 10.5281/zenodo.1072525
Description: This study proposes an emotional speech recognition system for smart-phone applications, combined with 3G mobile communications and social networks to provide users and their groups with more interaction and care. A mechanism based on support vector machines (SVM) was developed to recognize emotions in speech: happiness, anger, sadness, and normal. The mechanism uses a hierarchical classifier that adjusts the weights of acoustic features and divides the parameters into energy and frequency categories for training. Twenty-eight commonly used acoustic features, including pitch and volume, were selected for training. In addition, a time-frequency parameter obtained by the continuous wavelet transform was used to identify accent and intonation within a sentence during recognition. The Berlin Database of Emotional Speech, divided into male and female data sets, was used for training. According to the experimental results, the accuracies on the male and female test sets increased by 4.6% and 5.2%, respectively, after the time-frequency parameter was added for classifying the happy and angry emotions. For the classification of all emotions, the average accuracy over male and female data was 63.5% on the test set and 90.9% on the whole data set.
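The hierarchical SVM scheme described above can be illustrated with a minimal sketch. This is not the authors' implementation (they used LIBSVM and the Berlin corpus); it assumes randomly generated stand-ins for the 28 acoustic features and a simplified two-stage hierarchy: a first SVM separates high-arousal emotions (happy, angry) from low-arousal ones (sad, normal), and branch-specific SVMs then refine each group.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

# Hypothetical stand-ins for the paper's 28 acoustic features
# (pitch, volume/energy statistics, etc.): random vectors here.
rng = np.random.default_rng(0)
n_samples, n_features = 200, 28
X = rng.normal(size=(n_samples, n_features))
# Emotion labels: 0 = normal, 1 = happiness, 2 = anger, 3 = sadness
y = rng.integers(0, 4, size=n_samples)

# Stage 1: split high-arousal (happy, angry) from low-arousal
# (sad, normal) -- a simplified take on the hierarchical idea.
high_arousal = np.isin(y, [1, 2]).astype(int)
stage1 = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
stage1.fit(X, high_arousal)

# Stage 2: a separate SVM refines each branch.
stage2_high = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
stage2_high.fit(X[high_arousal == 1], y[high_arousal == 1])
stage2_low = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
stage2_low.fit(X[high_arousal == 0], y[high_arousal == 0])

def predict_emotion(x):
    """Route a feature vector through the two-stage classifier."""
    x = np.asarray(x).reshape(1, -1)
    if stage1.predict(x)[0] == 1:
        return int(stage2_high.predict(x)[0])
    return int(stage2_low.predict(x)[0])

print(predict_emotion(X[0]))
```

In the paper's actual design the stages additionally reweight energy-related versus frequency-related features per level; the sketch keeps a single shared feature vector for brevity.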
Database: OpenAIRE