Rapidly retargetable approaches to de-identification in medical records.

Authors: Wellner B, Huyck M, Mardis S, Aberdeen J, Morgan A, Peshkin L, Yeh A, Hitzeman J, Hirschman L
Source: Journal of the American Medical Informatics Association. Sep/Oct 2007, Vol. 14, Issue 5, pp. 564-573. 10 pp.
Abstract: OBJECTIVE: This paper describes a successful approach to de-identification that was developed to participate in a recent AMIA-sponsored challenge evaluation. METHOD: Our approach focused on the rapid adaptation of two existing toolkits for named entity recognition, Carafe and LingPipe. RESULTS: The 'out of the box' Carafe system achieved a very good score (phrase F-measure of 0.9664) with only four hours of work to adapt it to the de-identification task. With further tuning, we were able to reduce the token-level error term by over 36% through task-specific feature engineering and the introduction of a lexicon, achieving a phrase F-measure of 0.9736. CONCLUSIONS: We were able to achieve good performance on the de-identification task by the rapid retargeting of existing toolkits. For the Carafe system, we developed a method for tuning the balance of recall vs. precision, as well as a confidence score that correlated well with the measured F-score. [ABSTRACT FROM AUTHOR]
Database: Library, Information Science & Technology Abstracts
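
Note: the F-measure figures quoted in the abstract combine precision and recall into one score. The sketch below is a minimal illustration of the standard F-measure formula and of a relative error reduction computed from two scores; it is not the authors' evaluation code. The function names and the `beta` weighting parameter are assumptions introduced here. Only the two phrase F-measure values come from the abstract.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (standard F-beta).

    beta > 1 weights recall more heavily, beta < 1 weights precision more.
    """
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def relative_error_reduction(score_before: float, score_after: float) -> float:
    """Fraction by which the error (1 - score) shrinks between two scores."""
    error_before = 1.0 - score_before
    error_after = 1.0 - score_after
    return (error_before - error_after) / error_before


# Phrase F-measures reported in the abstract (before and after tuning).
phrase_reduction = relative_error_reduction(0.9664, 0.9736)
print(f"relative reduction in phrase-level (1 - F): {phrase_reduction:.1%}")
```

Applied to the two phrase-level scores, this gives a reduction of roughly 21% in (1 - F). That is a different quantity from the token-level error reduction of over 36% reported in the abstract, which is measured per token and cannot be recomputed from the phrase-level figures alone.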