The Symptoms and Pathogenesis Entity Recognition of TCM Medical Records Based on CRF

Authors: Fu Bin, Liu Honglan, Qin Xiaona
Publication year: 2015
Subject:
Source: UIC/ATC/ScalCom
Description: TCM (Traditional Chinese Medicine) medical records are a great medical treasure of the Chinese nation. Since the 1980s, China has attached importance to the heritage of TCM. How to use these valuable resources effectively and to the fullest is a central problem for TCM informatization. However, the dialectical information that captures the core ideas of renowned doctors is still stored as natural language, so obtaining structured information must rely on information extraction technology. With the development of information science and technology, symptom and pathogenesis entity recognition in TCM medical records has become the key to building a TCM information extraction system. Conditional Random Fields (CRF), proposed by Lafferty et al. in 2001, combine the features of the Maximum Entropy Model and the Hidden Markov Model. In recent years, they have achieved good results in sequence labeling tasks such as word segmentation, part-of-speech tagging, and named entity recognition [1]. This paper applies the latest CRF ideas, formulates an appropriate feature template, and uses 500 annotated TCM medical records from the "11th Five-Year Plan" medical records database to train a CRF model suited to symptom and pathogenesis entity recognition. To verify the model's accuracy, we use this CRF model to label symptom and pathogenesis entities. Under ten-fold cross-validation, the F1 measure reached 81.53% for symptom entities and 83.98% for pathogenesis entities. The experimental results show that the model performs well and is suitable for information extraction from TCM medical records.
Database: OpenAIRE
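
The abstract mentions a feature template and a per-entity F1 measure without giving details. The following is a minimal Python sketch of what those two pieces might look like, not the authors' implementation. It assumes character-level BIO tagging, a symmetric context window as the template, and exact-match entity-level scoring; the tag names SYM and PAT and all function names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# (entity type, start index, end index exclusive)
Span = Tuple[str, int, int]


def window_features(chars: List[str], i: int, size: int = 2) -> Dict[str, str]:
    """Context-window features for position i (an assumed template shape)."""
    feats = {}
    for off in range(-size, size + 1):
        j = i + off
        # Positions outside the sequence get a padding symbol.
        feats[f"c[{off}]"] = chars[j] if 0 <= j < len(chars) else "<PAD>"
    return feats


def bio_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from a BIO tag sequence."""
    spans: Set[Span] = set()
    start, etype = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a trailing entity
        inside = tag.startswith("I-") and etype == tag[2:]
        if start is not None and not inside:
            spans.add((etype, start, i))
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
    return spans


def entity_f1(gold: List[List[str]], pred: List[List[str]], etype: str) -> float:
    """Exact-match F1 for one entity type over parallel tag sequences."""
    tp = n_gold = n_pred = 0
    for g_tags, p_tags in zip(gold, pred):
        g = {s for s in bio_spans(g_tags) if s[0] == etype}
        p = {s for s in bio_spans(p_tags) if s[0] == etype}
        tp += len(g & p)
        n_gold += len(g)
        n_pred += len(p)
    if tp == 0:
        return 0.0
    precision, recall = tp / n_pred, tp / n_gold
    return 2 * precision * recall / (precision + recall)
```

In a ten-fold setup, `entity_f1` would be computed on each held-out fold for each entity type and the ten scores averaged; whether the paper averages per fold or pools predictions is not stated in the abstract.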