Novel Character Identification Utilizing Semantic Relation with Animate Nouns in Korean

Authors:	Tae-Keun Park, Seung-Hoon Kim
Year of publication:	2018
Subjects:	General Computer Science; Recall; Computer science; business.industry; 05 social sciences; 02 engineering and technology; computer.software_genre; Focus (linguistics); Feature (linguistics); Identification (information); Character (mathematics); Direct speech; Noun; 0202 electrical engineering, electronic engineering, information engineering; Proper noun; 020201 artificial intelligence & image processing; Artificial intelligence; 0509 other social sciences; 050904 information & library sciences; business; computer; Natural language processing
Source:	ACM Transactions on Asian and Low-Resource Language Information Processing. 17:1-17
ISSN:	2375-4702, 2375-4699
DOI:	10.1145/3197657
Description:	To identify the speakers of quoted speech or to extract social networks from literature, it is indispensable to extract character names and nominals. However, detecting proper nouns in novels written in or translated into Korean is harder than in English because Korean has no capitalization. In addition, it is almost impossible for any proper noun dictionary to include every character name that authors have created or will create. Fortunately, a previous study showed that utilizing postpositions for animate nouns is a simple and effective tool for character identification in Korean novels, requiring neither a proper noun dictionary nor a training corpus. In this article, we propose a character identification method that utilizes the semantic relation with known animate nouns. On 80 Korean novels, compared with the previous study, the proposed method increases the micro- and macro-average recall by 13.68% and 11.86%, respectively, while decreasing the micro-average precision by 0.28% and increasing the macro-average precision by 0.07%. If we focus on characters that account for more than 1% of the character name mentions in each novel, the micro- and macro-average F-measures of the proposed method are 96.98% and 97.32%, respectively.
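The abstract reports micro- and macro-averaged scores over 80 novels. These two averages can diverge sharply. The sketch below illustrates the standard difference under stated assumptions: the per-novel counts (`Counts`), the function names, and the 1% "major character" filter helper are all hypothetical illustrations rather than the authors' code. The sketch also assumes that macro F-measure is the mean of per-novel F1 scores, although some papers instead take the harmonic mean of macro precision and macro recall.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Hypothetical per-novel counts for extracted character names."""
    tp: int  # extracted names that are true characters
    fp: int  # extracted names that are not characters
    fn: int  # true characters the method missed


def prf(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Precision, recall, and F-measure (F1); 0.0 where a ratio is undefined."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


def micro_average(novels: list[Counts]) -> tuple[float, float, float]:
    """Pool counts over all novels, then score once: larger novels weigh more."""
    tp = sum(n.tp for n in novels)
    fp = sum(n.fp for n in novels)
    fn = sum(n.fn for n in novels)
    return prf(tp, fp, fn)


def macro_average(novels: list[Counts]) -> tuple[float, float, float]:
    """Score each novel separately, then take the unweighted mean of each metric."""
    scores = [prf(n.tp, n.fp, n.fn) for n in novels]
    k = len(scores)
    p, r, f = (sum(s[i] for s in scores) / k for i in range(3))
    return p, r, f


def major_characters(mentions: dict[str, int], min_share: float = 0.01) -> set[str]:
    """Hypothetical filter: characters with more than min_share of all name mentions."""
    total = sum(mentions.values())
    if total == 0:
        return set()
    return {name for name, c in mentions.items() if c / total > min_share}


if __name__ == "__main__":
    # One large and one small hypothetical novel.
    novels = [Counts(tp=90, fp=10, fn=10), Counts(tp=8, fp=2, fn=12)]
    print("micro P/R/F:", micro_average(novels))
    print("macro P/R/F:", macro_average(novels))
```

With these made-up counts, micro recall is 98/120 (about 0.817) because the large novel dominates the pooled counts. Macro recall is (0.9 + 0.4) / 2 = 0.65 because the small novel counts equally. This is why the abstract reports the two averages separately.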
Database:	OpenAIRE
External links:	https://explore.openaire.eu/search/publication?articleId=doi_________::4cb2104696e7fe5bd315b4dfde02f52a https://doi.org/10.1145/3197657