Curating Archival Anatomic Pathology Material For Machine Learning Algorithm Development
Author: | Joaquin J. Garcia, Andrea R. Collins, Andrew P. Norgan |
---|---|
Year of publication: | 2020 |
Subject: | |
Source: | American Journal of Clinical Pathology. 154:S124-S125 |
ISSN: | 1943-7722 0002-9173 |
Description: | Introduction/Objective: Advances in whole slide imaging have enabled the application of machine learning algorithms to anatomic pathology. At present, developing accurate algorithms requires robust training data with correctly assigned diagnostic and classification labels. Increasingly, institutions have looked to their archival slides as a source of “ground truth” for algorithm development. However, the curation and use of archival data pose several challenges. Here, we share lessons learned from reviewing head and neck pathology consult cases spanning a 10-year period at Mayo Clinic Rochester. Methods: Archived surgical pathology slides from 2,590 consult cases were reviewed. Clinical and demographic information was recorded for each case, including surgical date, surgical procedure, anatomic site, age, gender, and diagnosis. Cases were excluded from the curated archive if there was insufficient volume or quality of tissue to render a specific diagnosis (141 cases, 5.6%). Slides with a range of tissue sizes and quality, from numerous laboratories, were included in the curated archive. Selected cases were collated by anatomic site: ear, gnathic, larynx, nasopharynx, neck, oral cavity, oropharynx, salivary gland, and sinonasal tract. Results: Common diagnostic reconciliations (115 cases, 4.4%) fell within the following categories: (1) novel entities (59 cases, 2.3%), including biphenotypic sinonasal sarcoma and clear cell carcinoma; (2) novel classifications (21 cases, 0.8%), as seen in HPV-related oropharyngeal squamous cell carcinoma and polymorphous adenocarcinoma; and (3) novel grading schema (35 cases, 1.4%), as seen in keratinizing dysplasia and oropharyngeal malignancies. Conclusion: Several nuances emerged in the process of reviewing slides, highlighting the need for continual amendment of any machine learning dataset over time. Curating anatomic pathology cases for machine learning algorithm development requires the recognition of emerging entities, with re-classification and re-grading as needed. |
Database: | OpenAIRE |
External link: |