Ensemble classifier with feature selection and multi-words for disease code assignment

Author: Shao-wei Cheng, 鄭紹偉
Publication year: 2009
Document type: thesis (學位論文)
Description: 97
Since the introduction of National Health Insurance (NHI), the Health Insurance Bureau has required hospitals to report medical records with ICD-9-CM codes when applying for reimbursement of medical expenses. Hospitals that do not comply are not reimbursed; in particular, when codes are incorrect or omitted, the reimbursement is denied or reduced. To determine the correct ICD-9-CM codes for a discharge summary, medical staff must check each document manually, a labor-intensive task that consumes staff time. Prior research used domain knowledge to extend the concepts of document terms, but only for single-word terms, not multi-word terms. In addition, subcategory codes under the same category have similar meanings, and the data are imbalanced. This study focuses on keyword selection and multi-word term expansion, and combines them with ensemble techniques to improve the performance of SVM and Bayes classifiers in assigning disease codes to medical documents. The experimental results showed that the chi-square method selected higher-quality keywords, that multi-word and extended-word terms captured more of the information in medical documents, and that the AdaBoost ensemble method improved the classification performance of the Bayes classifier.
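As a rough illustration of chi-square keyword selection over single-word and multi-word terms, the following minimal sketch scores each term against one disease code using a 2x2 contingency table. It is not the thesis's implementation: the function names `terms` and `chi_square`, the use of adjacent word pairs as a stand-in for multi-word terms, and the four toy documents are all assumptions made for this example.

```python
from collections import Counter


def terms(text):
    """Return single words plus adjacent word pairs.

    Adjacent pairs are a simple stand-in for multi-word terms (an assumption;
    the thesis may extract multi-word terms differently).
    """
    words = text.lower().split()
    return set(words) | {" ".join(pair) for pair in zip(words, words[1:])}


def chi_square(docs, labels, target):
    """Chi-square score of every term against one target code.

    The score is two-sided: a term that is strongly associated with the
    *other* codes also scores high.
    """
    n = len(docs)
    n_pos = sum(1 for y in labels if y == target)
    df_all, df_pos = Counter(), Counter()
    for text, y in zip(docs, labels):
        for t in terms(text):
            df_all[t] += 1
            if y == target:
                df_pos[t] += 1

    scores = {}
    for t, total in df_all.items():
        a = df_pos[t]        # term present, target code
        b = total - a        # term present, other code
        c = n_pos - a        # term absent, target code
        d = n - n_pos - b    # term absent, other code
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        scores[t] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return scores


# Hypothetical toy data: two summaries per ICD-9-CM category.
docs = [
    "chest pain acute myocardial infarction",
    "acute myocardial infarction with chest pain",
    "type 2 diabetes mellitus",
    "diabetes mellitus poor control",
]
labels = ["410", "410", "250", "250"]

scores = chi_square(docs, labels, "410")
top_terms = sorted(scores, key=scores.get, reverse=True)[:5]
print(top_terms)
```

Terms with the highest scores would be kept as keywords and passed to a classifier; the classifier and the ensemble step are outside the scope of this sketch.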
Database: Networked Digital Library of Theses & Dissertations