Automated Identification and Measurement Extraction of Pancreatic Cystic Lesions from Free-Text Radiology Reports Using Natural Language Processing.

Authors: Yamashita R, Bird K, Cheung PY, Decker JH, Flory MN, Goff D, Morimoto LN, Shon A, Wentland AL, Rubin DL, Desser TS
Affiliation (all authors): Departments of Biomedical Data Science (R.Y., D.L.R.) and Radiology (K.B., P.Y.C.C., J.H.D., M.N.F., D.G., L.N.M., A.S., A.L.W., D.L.R., T.S.D.), Stanford University School of Medicine, 300 Pasteur Dr, Stanford, CA 94305.
Language: English
Source: Radiology. Artificial intelligence [Radiol Artif Intell] 2021 Dec 22; Vol. 4 (2), pp. e210092. Date of Electronic Publication: 2021 Dec 22 (Print Publication: 2022).
DOI: 10.1148/ryai.210092
Abstract: Purpose: To automatically identify a cohort of patients with pancreatic cystic lesions (PCLs) and extract PCL measurements from historical CT and MRI reports using natural language processing (NLP) and a question answering system.
Materials and Methods: Institutional review board approval was obtained for this retrospective Health Insurance Portability and Accountability Act-compliant study, and the requirement to obtain informed consent was waived. A cohort of free-text CT and MRI reports that were generated between January 1991 and July 2019 and covered the pancreatic region was identified. A PCL identification model was developed by modifying a rule-based information extraction model; measurement extraction was performed using a state-of-the-art question answering system. The system's performance was evaluated against radiologists' annotations.
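To make the two tasks named above concrete, here is a toy Python sketch of rule-based PCL mention detection and size extraction. It is not the authors' model: the abstract does not describe their rules, and their measurement extraction used a question answering system, for which a plain regular expression is substituted here. The term lists, the negation cues, and the function names `mentions_pcl` and `sizes_mm` are all assumptions made for illustration.

```python
import re

# Hypothetical patterns, chosen only for illustration.
PANCREAS = re.compile(r"\bpancrea(?:s|tic)\b", re.I)
CYST = re.compile(r"\bcyst(?:s|ic)?\b", re.I)
NEGATION = re.compile(r"\b(?:no|without|negative for)\b", re.I)
SIZE = re.compile(r"(\d+(?:\.\d+)?)\s*(mm|cm)\b", re.I)


def mentions_pcl(report: str) -> bool:
    """Return True if any sentence mentions a pancreatic cyst without a negation cue."""
    # Split on sentence-ending punctuation followed by whitespace,
    # so decimals such as "1.2 cm" are not split.
    for sentence in re.split(r"(?<=[.!?])\s+", report):
        if (PANCREAS.search(sentence)
                and CYST.search(sentence)
                and not NEGATION.search(sentence)):
            return True
    return False


def sizes_mm(text: str) -> list[float]:
    """Return every size found in the text, converted to millimeters."""
    sizes = []
    for value, unit in SIZE.findall(text):
        factor = 10.0 if unit.lower() == "cm" else 1.0
        sizes.append(float(value) * factor)
    return sizes


report = "There is a 1.2 cm cystic lesion in the pancreatic head. No ductal dilatation."
print(mentions_pcl(report), sizes_mm(report))
```

A sentence-level negation check like this is deliberately crude; it would miss scoped negation and uncertainty, which is one reason a purpose-built information extraction model is used in practice.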
Results: For this study, 430 426 free-text radiology reports from 199 783 unique patients were identified. The NLP model for identifying PCLs was applied to 1000 test samples. The interobserver agreement between the model and two radiologists was almost perfect (Fleiss κ = 0.951), and the false-positive rate and true-positive rate were 3.0% and 98.2%, respectively, with the radiologists' consensus annotations as the ground truth. The overall accuracy and Lin concordance correlation coefficient for measurement extraction were 0.958 and 0.874, respectively, with the radiologists' annotations as the ground truth.
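For readers unfamiliar with the reported metrics, the following Python sketch shows the standard definitions of the true-positive rate, the false-positive rate, and Lin's concordance correlation coefficient. The function names and the toy inputs are assumptions for illustration and are not data from the study; the abstract does not state how the authors computed these values.

```python
from statistics import fmean


def tpr_fpr(y_true, y_pred):
    """True-positive and false-positive rates for binary labels (1 = PCL present)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    fp = sum(1 for t, p in pairs if not t and p)
    tn = sum(1 for t, p in pairs if not t and not p)
    return tp / (tp + fn), fp / (fp + tn)


def lin_ccc(x, y):
    """Lin's concordance correlation coefficient using population (1/n) moments."""
    n = len(x)
    mean_x, mean_y = fmean(x), fmean(y)
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    # Unlike Pearson's r, the denominator penalizes a shift between the two means.
    return 2 * cov_xy / (var_x + var_y + (mean_x - mean_y) ** 2)


# Toy values only: reference labels versus model labels, and sizes in mm.
print(tpr_fpr([1, 1, 0, 0], [1, 0, 1, 0]))
print(lin_ccc([10.0, 20.0, 30.0], [11.0, 21.0, 31.0]))
```

Because the coefficient penalizes systematic offsets as well as scatter, extracted measurements that are perfectly correlated with the annotations but consistently shifted would still score below 1.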
Conclusion: An NLP-based system was developed that identifies patients with PCLs and extracts measurements from a large single-institution archive of free-text radiology reports. This approach may prove valuable to study the natural history and potential risks of PCLs and can be applied to many other use cases.
Keywords: Informatics, Abdomen/GI, Pancreas, Cysts, Computer Applications-General (Informatics), Named Entity Recognition.
Supplemental material is available for this article. © RSNA, 2022. See also commentary by Horii in this issue.
Competing Interests: Disclosures of Conflicts of Interest: R.Y. No relevant relationships. K.B. No relevant relationships. P.Y.C.C. No relevant relationships. J.H.D. No relevant relationships. M.N.F. No relevant relationships. D.G. No relevant relationships. L.N.M. No relevant relationships. A.S. No relevant relationships. A.L.W. No relevant relationships. D.L.R. Associate editor of Radiology: Artificial Intelligence. T.S.D. No relevant relationships.
(2022 by the Radiological Society of North America, Inc.)
Database: MEDLINE