Patterns of differential expression by association in omic data using a new measure based on ensemble learning.

Authors: Arevalillo JM; UC3M-Santander Big Data Institute, Madrid Street 135, 28903, Getafe, Madrid, Spain; Department of Statistics and Operational Research, UNED, Juan del Rosal 10, 28040, Madrid, Spain. Martin-Arevalillo R; Laboratoire de Reproduction et Développement des Plantes, Ecole Normale Superieure de Lyon, 46, allée d'Italie, 69007, Lyon, Auvergne-Rhone-Alpes, France.
Language: English
Source: Statistical applications in genetics and molecular biology [Stat Appl Genet Mol Biol] 2023 Nov 23; Vol. 22 (1). Date of Electronic Publication: 2023 Nov 23 (Print Publication: 2023).
DOI: 10.1515/sagmb-2023-0009
Abstract: The ongoing development of high-throughput technologies allows the expression levels of hundreds or thousands of biological inputs to be monitored simultaneously, and has led to the proliferation of what have been termed omic data sources. A key issue in analyzing such data is the detection of differential expression across two experimental conditions, two clinical statuses, or two classes of a biological outcome. While many univariate approaches have been developed to address this issue, strategies for assessing interaction patterns of differential expression are scarce in the literature and have been limited to ad hoc solutions. This paper addresses the problem by exploiting an ensemble learning algorithm, random forests, to propose a measure that assesses the differential expression explained by the interaction of omic variables, so that subtle biological patterns may be uncovered. The new measure is derived as a by-product of the out-of-bag error rate, which estimates the prediction error of a random forest classifier. Its performance is studied in synthetic scenarios, and it is applied to real studies on SARS-CoV-2 and colon cancer data, where it uncovers associations that remain undetected by other methods. The proposal aims to provide a novel approach that may help experts in the biomedical and life sciences to unravel insightful interaction patterns and decipher the molecular mechanisms underlying biological and clinical outcomes.
(© 2023 Walter de Gruyter GmbH, Berlin/Boston.)
Database: MEDLINE
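
Illustrative note: the abstract does not give the formula of the proposed measure, so the sketch below is not the authors' method. It only illustrates the underlying idea, under stated assumptions: two simulated variables that show no differential expression on their own but separate the two classes jointly, and a random forest whose out-of-bag (OOB) error drops when both are used together. The helper `oob_error`, the simulated data, and all parameter values are hypothetical; scikit-learn's `RandomForestClassifier` is assumed as the random forest implementation.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def oob_error(X, y, n_trees=300, seed=0):
    """Out-of-bag misclassification rate of a random forest fit on X.

    Hypothetical helper for illustration; not the measure from the paper.
    """
    forest = RandomForestClassifier(
        n_estimators=n_trees,
        bootstrap=True,      # OOB estimates require bootstrap sampling
        oob_score=True,      # store OOB accuracy in forest.oob_score_
        random_state=seed,
    )
    forest.fit(X, y)
    return 1.0 - forest.oob_score_


# Simulated data (assumption): the class depends only on the sign of the
# product x1 * x2, so neither variable is informative by itself.
rng = np.random.default_rng(0)
n = 400
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = ((x1 * x2) > 0).astype(int)

# OOB error using each variable alone versus the pair together.
err_x1 = oob_error(x1.reshape(-1, 1), y)
err_x2 = oob_error(x2.reshape(-1, 1), y)
err_pair = oob_error(np.column_stack([x1, x2]), y)

print(f"OOB error, x1 alone: {err_x1:.3f}")
print(f"OOB error, x2 alone: {err_x2:.3f}")
print(f"OOB error, x1 and x2: {err_pair:.3f}")
```

In this toy setting, the single-variable errors should stay near chance level while the joint error is much lower, which is the kind of interaction pattern a univariate test would miss.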