Information extraction: Evaluating named entity recognition from classical Malay documents

Autor: Nurazzah Abd Rahman, Zainab Abu Bakar, Siti Syakirah Sazali
Rok vydání: 2016
Předmět:
Zdroj: 2016 Third International Conference on Information Retrieval and Knowledge Management (CAMP).
Popis: Natural Language Processing (NLP) is an important field of research in Computer Science. NLP is the process of analyzing texts based on a set of theories and technologies, and recent studies focused more on Information Extraction (IE). In Information Extraction, there are few steps or commonly known as task to be followed, which are named entity recognition, relation detection and classification, temporal and event processing, and template filling. Recent researches in Malay languages mainly focused on newspaper articles and since this research experiment is experimenting on classical documents, there is a need to identify the best way to extract noun from existing methods. This paper proposes to conduct a research about extracting nouns from Malay classical documents. The result shows that experiment using the Noun Extraction using Morphological Rules (Verb, Adjective and Noun Affixes) that has 77.61% chances of identifying a noun to contribute to the existing Malay noun list. As there is not any existing completed Malay noun list or dictionary that can be used as a guide, the results extracted still need to be judged by the language experts.
Databáze: OpenAIRE