Hybrid method for modeless Japanese input using N-gram based binary classification and dictionary

Authors:	Yukino Ikegami, Setsuo Tsuruta
Year of publication:	2014
Subject:	Computer Networks and Communications; Computer science; Character (computing); Speech recognition; Kana; Support vector machine; n-gram; Binary classification; Hardware and Architecture; Media Technology; Latin alphabet; Input method; Artificial intelligence; Alphabet; Software; Word (computer architecture); Natural language processing
Source:	Multimedia Tools and Applications. 74:3933-3946
ISSN:	1573-7721 1380-7501
DOI:	10.1007/s11042-013-1805-1
Description:	The rapid growth of globalization requires handling a large number of multilingual documents, in which Japanese input coexists with English and other languages that use the Roman alphabet. Conventional methods for Japanese input require Japanese users to switch the input mode between Japanese and the Latin alphabet. As a current solution, modeless Japanese input methods switch the input mode automatically. However, they need training with a large amount of text data to improve performance. This paper proposes a hybrid modeless Japanese input method that uses a non-Japanese word dictionary and n-gram character sequence features to decide whether or not to convert the input and switch to Kana input. The non-Japanese word dictionary is used to reduce false positives on non-Japanese words. This dictionary is built from text data available on the Web. The n-gram based discriminative model is learned by a Support Vector Machine from a balanced corpus that contains texts from various domains. The evaluation of our method showed that its accuracy, measured by F-measure for the prediction of non-Kana characters, improves by 7.7 % compared with an n-gram-only method. In addition, a real user test showed that the average rating for input time was on the agree side for our method, versus the disagree side for the conventional Japanese input method that requires switching the input mode.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_________::85dd10d39ca9ca9dfad5c459b9718fa2 https://doi.org/10.1007/s11042-013-1805-1
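
Illustration:	The hybrid decision described in the abstract (dictionary check first, then an n-gram based binary classifier) might be sketched roughly as follows. This is not the authors' implementation: the word list, the function names, the n-gram range, and the boundary markers are all hypothetical, and a small perceptron stands in for the Support Vector Machine so that the sketch needs only the Python standard library.

```python
from collections import Counter

# Hypothetical non-Japanese word list. The paper builds its dictionary
# from Web text; these entries are placeholders for illustration only.
NON_JAPANESE_WORDS = {"the", "input", "method", "computer", "software"}


def char_ngrams(token, n_max=3):
    """Count character n-grams (n = 1..n_max) with boundary markers."""
    padded = "^" + token.lower() + "$"
    feats = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(padded) - n + 1):
            feats[padded[i:i + n]] += 1
    return feats


def score(weights, feats):
    """Linear score: dot product of the weight vector and n-gram counts."""
    return sum(weights[f] * v for f, v in feats.items())


def train_perceptron(samples, epochs=20):
    """Train a linear model on (token, label) pairs.

    Label +1 means "romanized Japanese, convert to Kana";
    label -1 means "leave as Latin-alphabet text".
    A perceptron is used here as a stand-in for the SVM in the paper.
    """
    weights = Counter()
    for _ in range(epochs):
        for token, label in samples:
            feats = char_ngrams(token)
            if label * score(weights, feats) <= 0:
                # Misclassified (or undecided): move weights toward the label.
                for f, v in feats.items():
                    weights[f] += label * v
    return weights


def should_convert_to_kana(token, weights):
    """Hybrid decision: dictionary veto first, then the n-gram classifier."""
    if token.lower() in NON_JAPANESE_WORDS:
        return False  # known non-Japanese word: never convert
    return score(weights, char_ngrams(token)) > 0


# Toy training data (assumed, not from the paper's corpus).
TOY_SAMPLES = [
    ("nihongo", 1), ("arigatou", 1), ("sakura", 1),
    ("keyboard", -1), ("switch", -1), ("language", -1),
]
toy_weights = train_perceptron(TOY_SAMPLES)
print(should_convert_to_kana("input", toy_weights))  # dictionary veto applies
```

Checking the dictionary before the classifier mirrors the stated aim of reducing false positives on non-Japanese words: a listed word is never converted, whatever score the n-gram model would give it.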