Probabilistic Approaches to Overcome Content Heterogeneity in Data Integration: A Study Case in Systematic Lupus Erythematosus.

Authors: SAMPRI, Alexia, GEIFMAN, Nophar, LE SUEUR, Helen, DOHERTY, Patrick, COUCH, Philip, BRUCE, Ian, PEEK, Niels
Source: Studies in Health Technology & Informatics; 2020, Vol. 270, p387-391, 5p, 2 Charts, 1 Graph
Abstract: Integrating data from different sources into a homogeneous dataset increases the opportunities to study human health. However, disparate data collections are often heterogeneous, which complicates their integration. In this paper, we focus on the issue of content heterogeneity in data integration. Traditional approaches for resolving content heterogeneity map all source datasets to a common data model that includes only shared data items, and thus omit all items that vary between datasets. Based on an example of three datasets in Systemic Lupus Erythematosus, we describe and experimentally evaluate a probabilistic data integration approach which propagates the uncertainty resulting from content heterogeneity into statistical inference, avoiding the need to map to a common data model. [ABSTRACT FROM AUTHOR]
Database: Complementary Index