Abstract: |
The increased use of health consultation platforms since the pandemic has raised the demand for active doctors to conduct consultations. In Indonesia, there are on average 0.4 doctors per thousand people, far fewer than in developed countries. One possible solution is to apply Natural Language Processing (NLP) and Artificial Intelligence (AI) technologies. These technologies can reduce costs, suggest alternatives from a database, deliver appropriate answers, and help users find solutions that match their problems. This can be automated using Named Entity Recognition (NER), a subtask of information extraction that, in the medical domain, identifies entities such as anatomical entities, proteins, and genes. The main challenge in implementing this solution is the lack of Indonesian-language NER datasets relevant to health consultation platforms. Therefore, a medical-domain dataset in the Indonesian language needs to be developed. In this research, the data were taken from online health consultation platforms whose Q&A sections with doctors are freely accessible, and were labeled manually under the supervision of experts. A Bidirectional-LSTM-CRF model trained on this data achieved an accuracy of 0.9968. A state-of-the-art model, XLM-RoBERTa-large-indonesian-NER, achieved an accuracy of 0.9851 after fine-tuning. However, when measured by F1 score, the XLM-RoBERTa model achieved a higher score than the Bidirectional-LSTM-CRF model on every tag. [ABSTRACT FROM AUTHOR] |
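The abstract reports that one model has the higher accuracy while the other has the higher per-tag F1 score. The minimal sketch below shows how these two metrics can diverge in a tagging task. It is an illustration only, not the authors' evaluation code: the tag names `DISEASE` and `DRUG`, the toy label sequences, and the helper functions `token_accuracy` and `per_tag_f1` are all hypothetical, and the sketch assumes token-level scoring in which most tokens carry the non-entity tag `O`.

```python
from collections import Counter


def token_accuracy(gold, pred):
    """Fraction of tokens whose predicted tag equals the gold tag."""
    correct = sum(g == p for g, p in zip(gold, pred))
    return correct / len(gold)


def per_tag_f1(gold, pred):
    """Token-level F1 score for every tag except the non-entity tag 'O'."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted tag p where it did not belong
            fn[g] += 1  # missed an occurrence of gold tag g

    scores = {}
    for tag in set(gold) | set(pred):
        if tag == "O":
            continue
        predicted = tp[tag] + fp[tag]
        actual = tp[tag] + fn[tag]
        precision = tp[tag] / predicted if predicted else 0.0
        recall = tp[tag] / actual if actual else 0.0
        total = precision + recall
        scores[tag] = 2 * precision * recall / total if total else 0.0
    return scores


# Hypothetical toy data: 96 non-entity tokens and 4 entity tokens.
gold = ["O"] * 96 + ["DISEASE", "DISEASE", "DRUG", "DRUG"]
# A hypothetical model that recovers only one of the four entity tokens.
pred = ["O"] * 96 + ["DISEASE", "O", "O", "O"]

accuracy = token_accuracy(gold, pred)
f1_by_tag = per_tag_f1(gold, pred)

print(f"accuracy: {accuracy:.4f}")
for tag, score in sorted(f1_by_tag.items()):
    print(f"F1[{tag}]: {score:.4f}")
```

In this toy case the accuracy is high because the `O` tokens dominate the count, while the per-tag F1 scores expose the missed entities. Whether the same effect explains the reported results is an assumption that the abstract does not confirm.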