Combining TF-IDF with symptom features to differentiate between lymphoma and tuberculosis case reports
Authors: | Chunling Du, Abdelbaset Khalaf, Moanda Diana Pholo, Yskandar Hamam |
---|---|
Year of publication: | 2019 |
Subject: | High rate; Tuberculosis; Computer science; business.industry; 02 engineering and technology; Disease; medicine.disease; computer.software_genre; Logistic regression; Clinical decision support system; Lymphoma; 03 medical and health sciences; 0302 clinical medicine; 0202 electrical engineering, electronic engineering, information engineering; medicine; 020201 artificial intelligence & image processing; 030212 general & internal medicine; Artificial intelligence; Precision and recall; business; computer; Natural language processing; Word (computer architecture) |
Source: | GlobalSIP |
DOI: | 10.1109/globalsip45357.2019.8969317 |
Description: | In regions where tuberculosis (TB) is a high-burden disease, empirical anti-TB treatment is generally recommended. However, TB can mimic a number of other diseases, such as lymphoma, which leads to high rates of misdiagnosis. This paper therefore proposes using machine learning and natural language processing techniques to differentiate between tuberculosis and lymphoma. To conduct the study, medical case reports were collected automatically and converted into word vectors, which were augmented with symptom and biographical features extracted from the case reports. Different machine learning algorithms were applied to the collected data, which comprised 215 TB cases, 505 lymphoma cases and 207 "other" cases. Each algorithm was evaluated on accuracy, precision and recall. Logistic regression performed best across datasets and metrics, reaching an accuracy of up to 97.3% and precision and recall of up to 96%, and it performed better on the augmented dataset. |
Database: | OpenAIRE |
External link: |
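The pipeline summarised in the description (case reports turned into TF-IDF word vectors, augmented with symptom features, classified with logistic regression and scored on accuracy, precision and recall) can be sketched roughly as follows. This is a minimal pure-Python illustration under stated assumptions, not the authors' implementation: the `SYMPTOMS` list, the substring-based symptom matching, the smoothed IDF formula, the gradient-descent settings (`lr`, `epochs`) and the four toy reports are all hypothetical, biographical features are omitted, and the paper's three classes (TB, lymphoma, "other") are reduced to a binary TB-vs-lymphoma task.

```python
import math
import re
from collections import Counter

# Hypothetical symptom phrases; the paper's actual symptom list is not reproduced here.
SYMPTOMS = ["fever", "night sweats", "weight loss", "cough", "lymphadenopathy"]

# Invented toy case-report snippets (label 1 = TB, 0 = lymphoma), for illustration only.
TOY_DOCS = [
    "Patient with cough, fever and night sweats; sputum positive for acid-fast bacilli.",
    "Chronic cough and weight loss; chest x-ray showed cavitation consistent with tuberculosis.",
    "Painless lymphadenopathy; biopsy confirmed diffuse large B-cell lymphoma.",
    "Night sweats and lymphadenopathy; histology showed Hodgkin lymphoma with Reed-Sternberg cells.",
]
TOY_LABELS = [1, 1, 0, 0]


def tokenize(text):
    """Lowercase the text and keep runs of ASCII letters as tokens."""
    return re.findall(r"[a-z]+", text.lower())


def fit_idf(docs):
    """Smoothed inverse document frequency for every term seen in the corpus."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(tokenize(doc)))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}


def tfidf(doc, idf, vocab):
    """L2-normalised TF-IDF vector in a fixed vocabulary order (unseen terms ignored)."""
    counts = Counter(tokenize(doc))
    vec = [counts[t] * idf[t] for t in vocab]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def symptom_flags(doc):
    """One binary indicator per symptom phrase, via plain substring matching."""
    text = doc.lower()
    return [1.0 if s in text else 0.0 for s in SYMPTOMS]


def features(doc, idf, vocab):
    """Augmented feature vector: TF-IDF terms followed by symptom indicators."""
    return tfidf(doc, idf, vocab) + symptom_flags(doc)


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(X, y, lr=0.5, epochs=500):
    """Binary logistic regression fitted by full-batch gradient descent."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                gw[j] += err * xj
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def predict(w, b, x):
    """Class 1 (TB) when the decision value is positive, else 0 (lymphoma)."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b > 0 else 0


def precision_recall_accuracy(y_true, y_pred):
    """Precision and recall for class 1, plus overall accuracy."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    return precision, recall, accuracy


if __name__ == "__main__":
    idf = fit_idf(TOY_DOCS)
    vocab = sorted(idf)
    X = [features(d, idf, vocab) for d in TOY_DOCS]
    w, b = train_logreg(X, TOY_LABELS)
    preds = [predict(w, b, x) for x in X]
    # Training-set metrics only; a real evaluation needs held-out data.
    print(precision_recall_accuracy(TOY_LABELS, preds))
```

The hand-written pieces are spelled out only to show how the two feature groups are concatenated into one vector before classification; in practice a library such as scikit-learn (`TfidfVectorizer`, `LogisticRegression`) would normally replace them, and a three-class setup would need a multinomial (softmax) or one-vs-rest variant.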