Author: |
Wellner B (AUTHOR), Huyck M (AUTHOR), Mardis S (AUTHOR), Aberdeen J (AUTHOR), Morgan A (AUTHOR), Peshkin L (AUTHOR), Yeh A (AUTHOR), Hitzeman J (AUTHOR), Hirschman L (AUTHOR) |
Source: |
Journal of the American Medical Informatics Association. Sep/Oct 2007, Vol. 14 Issue 5, p564-573. 10p. |
Abstract: |
OBJECTIVE: This paper describes a successful approach to de-identification that was developed for a recent AMIA-sponsored challenge evaluation. METHOD: Our approach focused on the rapid adaptation of two existing named entity recognition toolkits, Carafe and LingPipe. RESULTS: The 'out of the box' Carafe system achieved a very good score (phrase F-measure of 0.9664) with only four hours of work to adapt it to the de-identification task. With further tuning, we were able to reduce the token-level error term by over 36% through task-specific feature engineering and the introduction of a lexicon, achieving a phrase F-measure of 0.9736. CONCLUSIONS: We were able to achieve good performance on the de-identification task by the rapid retargeting of existing toolkits. For the Carafe system, we developed a method for tuning the balance of recall vs. precision, as well as a confidence score that correlated well with the measured F-score. [ABSTRACT FROM AUTHOR] |
Database: |
Library, Information Science & Technology Abstracts |