Fake it till you predict it: data augmentation strategies to detect initiation and termination of oncology treatment

Authors: Pohyer, Valentin; Fabre, Elizabeth; Oudard, Stéphane; Fournier, Laure; Rance, Bastien
Year of publication: 2024
Subject:
Document type: Working Paper
Description: In hospitals, information about anti-cancer treatment is dispersed across many documents, which makes it difficult to extract. We propose a solution that identifies dates, drugs, and their temporal relationships within free-text oncology reports using very few manual annotations. We used pattern recognition for dates, dictionaries for drugs, and transformer language models for the relationships, combined with a data augmentation strategy. Our models achieved good prediction performance, with F1-scores reaching 0.872. Models trained with data augmentation outperformed those trained without it. By running inference with these models, we can now identify and structure thousands of previously unavailable treatment events to better understand solutions and patient response.
Comment: Medical Informatics Europe 2025 - Intelligent health systems -- From technology to data and knowledge, European Federation for Medical Informatics, May 2025, Glasgow, United Kingdom
Database: arXiv