Unsupervised Semantic Mapping for Healthcare Data Storage Schema

Authors: Fahad Ahmed Satti, Musarrat Hussain, Jamil Hussain, Syed Imran Ali, Taqdir Ali, Hafiz Syed Muhammad Bilal, Taechoong Chung, Sungyoung Lee
Language: English
Publication year: 2021
Subject:
Source: IEEE Access, Vol 9, Pp 107267-107278 (2021)
Document type: article
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2021.3100686
Description: Data, information, and knowledge processing systems in the healthcare domain are currently plagued by heterogeneity at various levels. Current solutions have focused on developing a standards-based, manual intervention mechanism, which requires substantial human resources and necessitates the realignment of existing systems. State-of-the-art methodologies in natural language processing and machine learning can help to partially automate this process, reducing the resource requirements and providing a relatively good multi-class classification algorithm. We present a novel methodology for bridging the gap between various healthcare data management solutions by leveraging the strength of transformer-based machine learning models to create mappings between the data elements. Additionally, the annotated data, collected against five medical schemas and labeled by four annotators, is made available to help future researchers. Our results indicate that, for biased, dependent multi-class text classification, transformer-based models provide better results than linguistic and other classical models. In particular, the Robustly Optimized BERT Pretraining Approach (RoBERTa) provides the best schema matching performance, achieving a Cohen’s kappa score of 0.47 and a Matthews Correlation Coefficient (MCC) score of 0.48 with human-annotated data.
Database: Directory of Open Access Journals
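
The description reports schema matching quality as Cohen's kappa and the Matthews Correlation Coefficient (MCC). As a reading aid, the following minimal sketch shows how these two scores are commonly computed for multi-class labels, using the standard chance-corrected agreement formula for kappa and the usual multi-class generalization of MCC. It is not the authors' code: the function names, the toy attribute labels ("name", "dob", "gender"), and the choice to return 0.0 when a score is undefined are all assumptions made for illustration.

```python
from collections import Counter
from math import sqrt


def cohen_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two label sequences."""
    n = len(labels_a)
    if n == 0 or n != len(labels_b):
        raise ValueError("label sequences must be non-empty and of equal length")
    # Observed agreement: fraction of items on which both raters agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement under independence, from each rater's label frequencies.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1.0:
        return 0.0  # undefined case; 0.0 is a convention chosen for this sketch
    return (observed - expected) / (1.0 - expected)


def multiclass_mcc(y_true, y_pred):
    """Multi-class Matthews Correlation Coefficient from label counts."""
    s = len(y_true)
    if s == 0 or s != len(y_pred):
        raise ValueError("label sequences must be non-empty and of equal length")
    c = sum(t == p for t, p in zip(y_true, y_pred))  # correctly predicted items
    t_counts, p_counts = Counter(y_true), Counter(y_pred)
    cov_tp = c * s - sum(t_counts[k] * p_counts[k] for k in t_counts)
    cov_tt = s * s - sum(v * v for v in t_counts.values())
    cov_pp = s * s - sum(v * v for v in p_counts.values())
    denom = sqrt(cov_tt * cov_pp)
    return cov_tp / denom if denom else 0.0  # 0.0 when undefined (assumption)


if __name__ == "__main__":
    # Hypothetical target attributes assigned to six source schema fields.
    y_true = ["name", "name", "dob", "dob", "gender", "gender"]
    y_pred = ["name", "dob", "dob", "dob", "gender", "name"]
    print("kappa:", round(cohen_kappa(y_true, y_pred), 3))
    print("MCC:  ", round(multiclass_mcc(y_true, y_pred), 3))
```

Both scores equal 1.0 for perfect agreement and are near 0 when agreement is no better than chance, so the reported values of 0.47 and 0.48 lie roughly midway between chance-level and perfect matching.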