Drug classification system based on drug composition and usage instructions.

Author:	Hoang-Dieu Vu, Vu Hien Pham, Quang Dung Le
Subject:	MACHINE learning; LONG short-term memory; LANGUAGE models; NATURAL language processing; DATA augmentation
Source:	EAI Endorsed Transactions on Industrial Networks & Intelligent Systems; 2025, Vol. 12 Issue 1, p1-8, 8p
Abstract:	This study presents a natural language processing (NLP) approach to classify drugs based on compositional and usage descriptions. NLP techniques including text preprocessing, word embedding, and deep learning models were applied to our own collected data in Vietnam. Traditional machine learning models like Support Vector Machines (SVM) and deep models including Bidirectional Long Short-Term Memory (BiLSTM) and PhoBERT were evaluated. Besides, since there is a limitation in the information of the collected data, some data augmentation techniques were applied to increase the variation of the dataset. Results show PhoBERT achieving 95% accuracy, highlighting the benefits of transferring knowledge from large language models. Errors primarily occurred between similar drug categories, suggesting taxonomy refinement could improve performance. In summary, an automated drug classification framework was developed leveraging state-of-the-art NLP, validating the feasibility of analyzing drug data at scale and aiding therapeutic understanding. This supports NLP’s potential in pharmacovigilance applications. [ABSTRACT FROM AUTHOR]
Database:	Complementary Index