Discovering of Personal Name Prefix Patterns in Thai Researcher Corpus and Its Application
Author: | Nongnuch Ketui, Nattapong Tongtep, Thanaruk Theeramunkong |
---|---|
Year of publication: | 2020 |
Subject: |
Computer science; Information technology; Engineering and technology; Python (programming language); Prefix; Information extraction; Information systems; Electrical engineering, electronic engineering, information engineering; Preprocessor; Artificial intelligence & image processing; Personal name; Artificial intelligence; Natural language processing |
Source: | 2020 17th International Conference on Electrical Engineering/Electronics, Computer, Telecommunications and Information Technology (ECTI-CON). |
DOI: | 10.1109/ecti-con49241.2020.9158214 |
Description: | In information extraction, a person’s name is one of the most important named entities, and extracted names are used in tasks such as question answering and summarization. However, the boundary of a person’s name is often ambiguous because names are written in several different patterns in online public data sources such as news, events, and researcher corpora. Name prefixes can serve as clue words or phrases for extracting, identifying, and unifying person names. This paper proposes a name prefix discovery framework that collects an integrated researcher corpus from various data sources and extracts name prefix patterns. The framework has four main functions: collecting data from the data sources, tagging entities, preprocessing the researchers’ names, and finding the patterns of personal name prefixes. In this work, data are gathered from six sources, and ten entities related to the research domain are considered. Preprocessing uses three sub-processes to produce the researcher’s name. As a result, 408 personal name prefixes are extracted. In addition, an API for extracting a person’s or researcher’s name is implemented with the Flask framework for Python. The output of this work can support researcher name identification in the integrated researcher corpus. |
Database: | OpenAIRE |
External link: |
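
The description mentions finding personal name prefix patterns and using them to recover the researcher’s name. The sketch below illustrates one way such a step could look; it is not the authors’ method. It assumes, purely for illustration, that a prefix is the run of leading whitespace-separated tokens ending in a period, and that a prefix is kept only if it occurs at least `min_count` times. The function names, the threshold, and the sample names are all hypothetical, and real Thai prefixes would need language-specific tokenization.

```python
from collections import Counter


def leading_prefix(name: str) -> str:
    """Return the run of leading tokens ending in '.' (assumed heuristic)."""
    tokens = name.split()
    prefix_tokens = []
    # Never consume the last token, so a name is not treated as all prefix.
    for token in tokens[:-1]:
        if token.endswith("."):
            prefix_tokens.append(token)
        else:
            break
    return " ".join(prefix_tokens)


def discover_prefixes(names, min_count=2):
    """Count candidate prefixes and keep those seen at least min_count times."""
    counts = Counter(leading_prefix(n) for n in names)
    counts.pop("", None)  # names without any candidate prefix
    return {p: c for p, c in counts.items() if c >= min_count}


def strip_prefix(name, prefixes):
    """Remove the longest known prefix from the start of a name, if any."""
    normalized = " ".join(name.split())
    for prefix in sorted(prefixes, key=len, reverse=True):
        if normalized.startswith(prefix + " "):
            return normalized[len(prefix) + 1:]
    return normalized


# Hypothetical sample data, not taken from the researcher corpus.
names = [
    "Asst. Prof. Dr. Somchai Jaidee",
    "Dr. Malee Srisuk",
    "Asst. Prof. Dr. Anan Wong",
    "Dr. Niran Chai",
    "Prof. Kanya Dee",
    "Suda Meesuk",
]
prefixes = discover_prefixes(names)
bare_name = strip_prefix("Asst. Prof. Dr. Somchai Jaidee", prefixes)
```

Matching the longest prefix first keeps a compound prefix such as "Asst. Prof. Dr." from being only partly removed when a shorter prefix is also known. A prefix seen fewer than `min_count` times (here "Prof.") is left in place, which shows the trade-off the threshold introduces.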