Combining TD-IDF with symptom features to differentiate between lymphoma and tuberculosis case reports

Authors: Chunling Du, Abdelbaset Khalaf, Moanda Diana Pholo, Yskandar Hamam
Year of publication: 2019
Subject:
Source: GlobalSIP
DOI: 10.1109/globalsip45357.2019.8969317
Description: In regions where tuberculosis (TB) is a high-burden disease, empirical anti-TB treatment is generally recommended. However, TB can mimic a number of other diseases, such as lymphoma, which leads to high rates of misdiagnosis. This paper therefore proposes using machine learning and natural language processing techniques to differentiate between tuberculosis and lymphoma. For this study, medical case reports were collected automatically and converted into word vectors, which were then augmented with symptom and biographical features extracted from the case reports. Several machine learning algorithms were applied to the collected data, which comprised 215 TB cases, 505 lymphoma cases and 207 "other" cases. Each algorithm was evaluated on accuracy, precision and recall. With an accuracy of up to 97.3% and precision and recall scores of up to 96%, logistic regression performed best across datasets and metrics, with its strongest results on the augmented dataset.
Database: OpenAIRE
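The record describes word vectors (TF-IDF, per the title) that are augmented with symptom and biographical features before classification. The sketch below illustrates that kind of augmentation in plain Python. It is not the authors' code. It assumes a smoothed IDF, log((1 + N) / (1 + df)) + 1, with L2-normalised rows, which is one common TF-IDF variant. The symptom list, the age/sex regexes and all function names (fit_idf, tfidf_vector, symptom_features, augmented_matrix) are hypothetical, because the record does not say which features were extracted.

```python
import math
import re
from collections import Counter

# Hypothetical feature list: the record does not name the symptoms used.
SYMPTOMS = ["fever", "night sweats", "weight loss", "cough", "lymphadenopathy"]


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def fit_idf(docs):
    """Smoothed IDF per term: log((1 + N) / (1 + df)) + 1 (an assumed variant)."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    vocab = sorted(df)
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}
    return vocab, idf


def tfidf_vector(doc, vocab, idf):
    """Raw term counts weighted by IDF, then L2-normalised."""
    counts = Counter(tokenize(doc))
    vec = [counts[t] * idf[t] for t in vocab]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def symptom_features(doc):
    """Binary symptom flags plus hypothetical age and sex fields."""
    text = doc.lower()
    flags = [1.0 if s in text else 0.0 for s in SYMPTOMS]
    m = re.search(r"(\d+)-year-old", text)
    age = float(m.group(1)) / 100 if m else 0.0  # crude scaling; 0.0 if no age found
    male = 1.0 if re.search(r"\b(man|male|boy)\b", text) else 0.0
    return flags + [age, male]


def augmented_matrix(docs):
    """One row per report: TF-IDF weights followed by the extra features."""
    vocab, idf = fit_idf(docs)
    rows = [tfidf_vector(d, vocab, idf) + symptom_features(d) for d in docs]
    return rows, vocab


# Toy usage with invented one-line "case reports".
docs = [
    "A 34-year-old man presented with fever, night sweats and weight loss.",
    "A 60-year-old woman with painless lymphadenopathy.",
]
X, vocab = augmented_matrix(docs)
```

Each row of X could then be passed to any classifier. The record reports that logistic regression performed best, but the solver, regularisation and feature scaling the authors used are not stated here.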