Abstrakt: |
This study presents a natural language processing (NLP) approach to classify drugs based on compositional and usage descriptions. NLP techniques including text preprocessing, word embedding, and deep learning models were applied to our own collected data in Vietnam. Traditional machine learning models like Support Vector Machines (SVM) and deep models including Bidirectional Long Short-Term Memory (BiLSTM) and PhoBERT were evaluated. Besides, since there is a limitation in the information of the collected data, some data augmentation techniques were applied to increase the variation of the dataset. Results show PhoBERT achieving 95% accuracy, highlighting the benefits of transferring knowledge from large language models. Errors primarily occurred between similar drug categories, suggesting taxonomy refinement could improve performance. In summary, an automated drug classification framework was developed leveraging state-ofthe-art NLP, validating the feasibility of analyzing drug data at scale and aiding therapeutic understanding. This supports NLP’s potential in pharmacovigilance applications. [ABSTRACT FROM AUTHOR] |