Inferring Phenotypic Trait Evolution on Large Trees With Many Incomplete Measurements

Authors: Marc A. Suchard, Philippe Lemey, Max R. Tolkoff, Gabriel W. Hassler, Lam Si Tung Ho, William L. Allen
Publication year: 2020
Subject:
FOS: Computer and information sciences
Matrix-normal
LIFE-HISTORY VARIATION
Bayesian inference
01 natural sciences
010104 statistics & probability
2.5 Research design and methodologies (aetiology)
Aetiology
MAXIMUM-LIKELIHOOD
TEMPERATURE
Computation (stat.CO)
050205 econometrics
HERITABILITY
05 social sciences
Statistics
FAST-SLOW CONTINUUM
1.4 Methodologies and measurements
Phylogenetics
stat.ME
Physical Sciences
Matrix normal distribution
Statistics
Probability and Uncertainty
Statistics and Probability
Missing data
Statistics & Probability
MODELS
Bioengineering
Biology
Statistics - Computation
Article
CONJUGATE ANALYSIS
Methodology (stat.ME)
Underpinning research
0502 economics and business
ALGORITHM
Econometrics
0101 mathematics
Statistics - Methodology
Demography
stat.CO
Science & Technology
Phenotypic trait
DNA
Taxon
SIZE
Evolutionary biology
Generic health relevance
Mathematics
Source: Journal of the American Statistical Association, vol 117, iss 538
J Am Stat Assoc
DOI: 10.6084/m9.figshare.12851292
Description: Comparative biologists are often interested in inferring covariation between multiple biological traits sampled across numerous related taxa. To properly study these relationships, we must control for the shared evolutionary history of the taxa to avoid spurious inference. Existing control techniques almost universally scale poorly as the number of taxa increases. An additional challenge arises because obtaining a full suite of measurements becomes increasingly difficult as the number of taxa grows. This typically necessitates data imputation or integration, which further exacerbates the scalability problem. We propose an inference technique that integrates out missing measurements analytically and scales linearly with the number of taxa by using a post-order traversal algorithm under a multivariate Brownian diffusion (MBD) model to characterize trait evolution. We further exploit this technique to extend the MBD model to account for sampling error or non-heritable residual variance. We test these methods to examine mammalian life history traits, prokaryotic genomic and phenotypic traits, and HIV infection traits. We find computational efficiency increases that top two orders of magnitude over current best practices. While we focus on the utility of this algorithm in phylogenetic comparative methods, our approach generalizes to solve long-standing challenges in computing the likelihood for matrix-normal and multivariate normal distributions with missing data at scale.
Comment: 29 pages, 7 figures, 2 tables, 3 supplementary sections
Database: OpenAIRE
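
The abstract describes a likelihood computation that visits each node once in post-order and integrates out missing measurements analytically. The sketch below illustrates that idea in a deliberately reduced setting and is not the authors' implementation: it assumes a single trait (univariate Brownian motion rather than the multivariate model), a known trait value at the root, strictly positive branch lengths, and a hypothetical `Node` class invented for this example. Each node passes up a Gaussian "message" (log constant, mean, variance); a missing tip contributes an infinite-variance message, which is how it drops out of the likelihood without imputation.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical tree node: branch length above it, optional tip value."""
    length: float
    value: Optional[float] = None          # None marks a missing measurement
    children: List["Node"] = field(default_factory=list)


def log_normal_pdf(x: float, mean: float, var: float) -> float:
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def prune(node: Node, sigma2: float):
    """Post-order pass. Returns (logc, mean, var) describing the likelihood of
    the observed tips below `node` as a function of the value at the parent
    end of its branch: exp(logc) * Normal(mean; x, var). var = inf means the
    subtree carries no data."""
    if not node.children:
        if node.value is None:
            return 0.0, 0.0, math.inf      # missing tip: flat message
        logc, mean, var = 0.0, node.value, 0.0
    else:
        logc, mean, prec = 0.0, 0.0, 0.0
        for child in node.children:
            c_logc, c_mean, c_var = prune(child, sigma2)
            logc += c_logc
            if math.isinf(c_var):
                continue                   # nothing observed below this child
            c_prec = 1.0 / c_var           # assumes positive branch lengths
            if prec > 0.0:
                # Product of two Gaussians in x leaves a Gaussian remainder.
                logc += log_normal_pdf(mean, c_mean, 1.0 / prec + c_var)
            mean = (prec * mean + c_prec * c_mean) / (prec + c_prec)
            prec += c_prec
        if prec == 0.0:
            return logc, 0.0, math.inf
        var = 1.0 / prec
    # Diffusion along the branch adds variance sigma2 * length.
    return logc, mean, var + sigma2 * node.length


def log_likelihood(root: Node, sigma2: float, root_value: float) -> float:
    logc, mean, var = prune(root, sigma2)
    if math.isinf(var):
        return logc
    return logc + log_normal_pdf(mean, root_value, var)


# Tree ((A:1, B:1):1, C:2) with C unmeasured.
tree = Node(0.0, children=[
    Node(1.0, children=[Node(1.0, 1.0), Node(1.0, 2.0)]),
    Node(2.0, None),
])
print(log_likelihood(tree, sigma2=1.0, root_value=0.0))
```

Because every node is visited once and each visit does constant work per child, the cost grows linearly with the number of taxa, in contrast to inverting a dense tip covariance matrix. The recursion is used here only for brevity; a very deep tree would call for an explicit stack.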