Named Entity Recognition in Statistical Dataset Search Queries

Author: Wildannissa Pinasti, Lya Hulliyyatus Suadaa
Language: English, Indonesian
Year of publication: 2024
Subject:
Source: Jurnal Nasional Teknik Elektro dan Teknologi Informasi, Vol 13, Iss 3, Pp 171-177 (2024)
Document type: article
ISSN: 2301-4156, 2460-5719
DOI: 10.22146/jnteti.v13i3.11580
Description: Search engines must understand user queries to return relevant results. One way to improve that understanding is named entity recognition (NER), which identifies the entities in a query; knowing the entity types is a first step toward inferring search intent. In this research, a dataset was constructed from the search query history of the Statistics Indonesia (Badan Pusat Statistik, BPS) website, and NER in query modeling was used to extract entities from search queries related to statistical datasets. The research stages were query data collection, query data preprocessing, query data labeling, NER in query modeling, and model evaluation. A conditional random field (CRF) model was used for NER in query modeling in two scenarios: CRF with basic features, and CRF with basic features plus part-of-speech (POS) features. CRF was chosen for its well-known effectiveness in natural language processing (NLP), particularly in sequence labeling tasks such as NER. The basic CRF model and the CRF model with POS features achieved F1-scores of 0.9139 and 0.9110, respectively, so adding POS tagging features did not yield a significant improvement. A case study on a Linked Open Data (LOD) statistical dataset indicated that searches using synonym query expansion on the entities extracted by NER in query produced better results than regular searches without query expansion. Future research is therefore recommended to explore deep learning approaches.
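The two CRF scenarios described above differ only in the per-token features fed to the model. The following is a minimal sketch of that difference, not the authors' implementation: the function names, the specific features (lowercased word, digit flag, suffix, neighbouring words), the example query, and the POS tags are all assumptions chosen for illustration, since the record does not list the features actually used.

```python
from typing import Dict, List, Optional, Union

Feature = Union[str, bool]


def token_features(
    tokens: List[str], i: int, pos_tags: Optional[List[str]] = None
) -> Dict[str, Feature]:
    """Build a feature dict for the token at position i (hypothetical feature set)."""
    word = tokens[i]
    feats: Dict[str, Feature] = {
        "word.lower": word.lower(),
        "word.isdigit": word.isdigit(),
        "suffix3": word[-3:],
        "BOS": i == 0,                  # beginning of query
        "EOS": i == len(tokens) - 1,    # end of query
    }
    # Context features from the neighbouring tokens, when they exist.
    if i > 0:
        feats["prev.lower"] = tokens[i - 1].lower()
    if i < len(tokens) - 1:
        feats["next.lower"] = tokens[i + 1].lower()
    # Second scenario: add POS features on top of the basic ones.
    if pos_tags is not None:
        feats["pos"] = pos_tags[i]
        if i > 0:
            feats["prev.pos"] = pos_tags[i - 1]
        if i < len(tokens) - 1:
            feats["next.pos"] = pos_tags[i + 1]
    return feats


def query_features(
    tokens: List[str], pos_tags: Optional[List[str]] = None
) -> List[Dict[str, Feature]]:
    """Return one feature dict per token of a tokenized query."""
    if pos_tags is not None and len(pos_tags) != len(tokens):
        raise ValueError("pos_tags must have one tag per token")
    return [token_features(tokens, i, pos_tags) for i in range(len(tokens))]


# Illustrative query (assumed, not from the BPS dataset) with assumed POS tags.
query = ["jumlah", "penduduk", "jawa", "barat", "2020"]
basic = query_features(query)
with_pos = query_features(query, ["NOUN", "NOUN", "PROPN", "PROPN", "NUM"])
```

A list of such per-token dicts, one list per query, is the input shape commonly accepted by CRF toolkits such as sklearn-crfsuite, so the same training code could be run on either feature set to compare F1-scores.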
Database: Directory of Open Access Journals