Global multivariate model learning from hierarchically correlated data
Authors: | Pierre Barrat-Charlaix, Edwin Rodriguez Horta, Martin Weigt, Alejandro Lage-Castellanos |
---|---|
Contributors: | University of Havana (Universidad de la Habana) (UH), Statistical Genomics and Biological Physics [LCQB] (LCQB-SGBP), Biologie Computationnelle et Quantitative = Laboratory of Computational and Quantitative Biology (LCQB), Sorbonne Université (SU), Centre National de la Recherche Scientifique (CNRS), Institut de Biologie Paris Seine (IBPS), Institut National de la Santé et de la Recherche Médicale (INSERM), Biozentrum [Basel, Suisse], University of Basel (Unibas), Weigt, Martin |
Language: | English |
Year of publication: | 2021 |
Subject: | Statistics and Probability; [PHYS] Physics [physics]; 0303 health sciences; Multivariate statistics; Statistical Mechanics (cond-mat.stat-mech); Computer science; [SDV] Life Sciences [q-bio]; FOS: Physical sciences; Statistical and Nonlinear Physics; Disordered Systems and Neural Networks (cond-mat.dis-nn); Condensed Matter - Disordered Systems and Neural Networks; Quantitative Biology - Quantitative Methods; 01 natural sciences; 03 medical and health sciences; FOS: Biological sciences; 0103 physical sciences; Statistics; Statistics, Probability and Uncertainty; 010306 general physics; Condensed Matter - Statistical Mechanics; Quantitative Methods (q-bio.QM); 030304 developmental biology |
Source: | Journal of Statistical Mechanics: Theory and Experiment, 2021, 2021 (7), pp. 073501. ⟨10.1088/1742-5468/ac06c2⟩ |
ISSN: | 1742-5468 |
Description: | Inverse statistical physics aims to infer models compatible with a set of empirical averages estimated from a high-dimensional dataset of independently sampled equilibrium configurations of a given system. However, in several applications, such as biology, the data result from stochastic evolutionary processes: configurations are related through a hierarchical structure, typically represented by a tree, and are therefore not independent. As a consequence, empirical averages of observables superpose an intrinsic signal, related to the equilibrium distribution of the studied system, and a spurious historical (or phylogenetic) signal arising from the structure underlying the data-generating process. The naive application of inverse statistical physics techniques therefore leads to systematic biases and an effective reduction of the sample size. To make progress on the open problem of extracting intrinsic signals from correlated data, we study a system described by a multivariate Ornstein-Uhlenbeck process defined on a finite tree. Using a Bayesian framework, we disentangle the covariances in the data that stem from the multivariate Gaussian equilibrium distribution from those resulting from historical correlations. Our approach leads to a clear gain in accuracy in the inferred equilibrium distribution, corresponding to an effective two- to fourfold increase in sample size. Comment: 34 pages, 10 figures |
Database: | OpenAIRE |
External link: |
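The bias described in the abstract can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the authors' implementation: a balanced binary tree with equal branch lengths, a zero-mean Ornstein-Uhlenbeck process with a single scalar relaxation rate (so every branch multiplies the inherited state by a factor `lam` = exp(-t/τ)), and a root drawn from the equilibrium N(0, C). Under these assumptions leaf covariances factorize as Cov(x_a, x_b) = Λ_ab C, and the i.i.d. sample covariance is shrunk on average by the factor 1 - mean(Λ). The function names, `lam` and Λ are hypothetical notation introduced here. The "tree-aware" estimator is a plain maximum-likelihood shortcut that assumes Λ is known, swapped in for illustration; the paper's Bayesian treatment is more general.

```python
import numpy as np


def leaf_correlation(depth: int, lam: float) -> np.ndarray:
    """Leaf-leaf correlation matrix Lambda for a balanced binary tree.

    Assumption: each generation multiplies the inherited state by `lam`
    (lam = exp(-t / tau) for branch length t), so two leaves whose most
    recent common ancestor lies g generations up have Lambda_ab = lam**(2 g).
    """
    n = 2 ** depth
    # Leaves 2k and 2k+1 share parent k, so g = bit length of (a XOR b).
    g = np.array([[(a ^ b).bit_length() for b in range(n)] for a in range(n)])
    return lam ** (2 * g)


def simulate_leaves(C: np.ndarray, depth: int, lam: float,
                    n_rep: int, rng: np.random.Generator) -> np.ndarray:
    """Sample leaf configurations of n_rep independent trees.

    Root ~ N(0, C); each child = lam * parent + sqrt(1 - lam**2) * xi with
    xi ~ N(0, C). This keeps N(0, C) stationary, as an OU process with a
    single scalar relaxation rate would. Returns shape (n_rep, 2**depth, L).
    """
    chol = np.linalg.cholesky(C)
    x = rng.standard_normal((n_rep, 1, C.shape[0])) @ chol.T  # equilibrium root
    for _ in range(depth):
        parents = np.repeat(x, 2, axis=1)  # node k -> children 2k, 2k+1
        noise = rng.standard_normal(parents.shape) @ chol.T
        x = lam * parents + np.sqrt(1.0 - lam ** 2) * noise
    return x


def naive_cov(X: np.ndarray) -> np.ndarray:
    """Sample covariance that treats the M leaves as i.i.d."""
    Xc = X - X.mean(axis=0)
    return Xc.T @ Xc / X.shape[0]


def tree_aware_cov(X: np.ndarray, Lam: np.ndarray) -> np.ndarray:
    """ML estimate of C for known Lambda and known zero mean: X^T Lam^-1 X / M."""
    return X.T @ np.linalg.solve(Lam, X) / X.shape[0]


rng = np.random.default_rng(0)
C = np.array([[1.0, 0.6], [0.6, 1.0]])
depth, lam = 4, 0.8
Lam = leaf_correlation(depth, lam)
leaves = simulate_leaves(C, depth, lam, n_rep=4000, rng=rng)

# Average each estimator over many independent trees to expose its bias.
naive = np.mean([naive_cov(X) for X in leaves], axis=0)
aware = np.mean([tree_aware_cov(X, Lam) for X in leaves], axis=0)
print("1 - mean(Lambda):", 1 - Lam.mean())
print("naive / C:\n", naive / C)
print("tree-aware / C:\n", aware / C)
```

With 16 leaves and `lam = 0.8`, 1 - mean(Λ) ≈ 0.697, so the naive estimate underestimates C by roughly 30% on average, while the Λ-whitened estimate is unbiased. In practice the tree and the relaxation time are not known exactly and must themselves be handled by the inference, which this sketch does not attempt.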