Inferring epidemiological links from deep sequencing data: a statistical learning approach for human, animal and plant diseases

Authors: Gaël Thébaud, Samuel Soubeyrand, M. Alamil, Karine Berthier, Cécile Desbiez, Joseph Hughes
Contributors: Biostatistique et Processus Spatiaux (BioSP), Institut National de la Recherche Agronomique (INRA), Medical Research Council, Unité de Pathologie Végétale (PV), Biologie et Génétique des Interactions Plante-Parasite (UMR BGPI), Centre de Coopération Internationale en Recherche Agronomique pour le Développement (Cirad)-Institut National de la Recherche Agronomique (INRA)-Institut national d’études supérieures agronomiques de Montpellier (Montpellier SupAgro), Institut national d'enseignement supérieur pour l'agriculture, l'alimentation et l'environnement (Institut Agro)-Institut national d'enseignement supérieur pour l'agriculture, l'alimentation et l'environnement (Institut Agro)-Centre international d'études supérieures en sciences agronomiques (Montpellier SupAgro), ANR grant (SMITID project, ANR-16-CE35-0006), Medical Research Council (MC_UU_12014/12), Division for Plant Health and Environment (SPE) of INRA through the AAP-SPE-2014 framework, Centre de Coopération Internationale en Recherche Agronomique pour le Développement (Cirad)-Institut National de la Recherche Agronomique (INRA)-Centre international d'études supérieures en sciences agronomiques (Montpellier SupAgro)-Institut national d’études supérieures agronomiques de Montpellier (Montpellier SupAgro), Institut national d'enseignement supérieur pour l'agriculture, l'alimentation et l'environnement (Institut Agro)-Institut national d'enseignement supérieur pour l'agriculture, l'alimentation et l'environnement (Institut Agro), Soubeyrand, Samuel
Language: English
Year of publication: 2019
Subject:
[SDV.SA]Life Sciences [q-bio]/Agricultural sciences
Human animal
Computer science
Animal Diseases
0302 clinical medicine
Databases, Genetic
Epidemiology
pathogen spread
plant epidemiology
plant pathology
0303 health sciences
Vegetal Biology
Training set
High-Throughput Nucleotide Sequencing
Articles
Agricultural sciences
contact information
Viruses
sequence analysis
General Agricultural and Biological Sciences
Research Article
medicine.medical_specialty
Exploit
infectious disease
training data
transmission trees
within-host pathogen diversity
Communicable Diseases
General Biochemistry, Genetics and Molecular Biology
Deep sequencing
03 medical and health sciences
medicine
Animals
Humans
[SDV.BV]Life Sciences [q-bio]/Vegetal Biology
animal pathology
modelling
Plant Diseases
030304 developmental biology
Models, Statistical
Statistical learning
Outbreak
Molecular Sequence Annotation
Data science
Infectious disease (medical specialty)
human pathology
Agricultural sciences
Plant biology
030217 neurology & neurosurgery
Source: Philosophical Transactions of the Royal Society B: Biological Sciences
Philosophical Transactions of the Royal Society B: Biological Sciences, The Royal Society, 2019, 374 (1775), doi:10.1098/rstb.2018.0258
Philosophical Transactions of the Royal Society. B, Biological Sciences 1775 (374), 20180258. (2019)
ISSN: 0962-8436, 1471-2970
DOI: 10.1098/rstb.2018.0258
Abstract: Pathogen sequence data have been exploited to infer who infected whom, using both empirical and model-based approaches. Most of these approaches exploit one pathogen sequence per infected host (e.g. individual, household, field). However, modern sequencing techniques can reveal the polymorphic nature of within-host pathogen populations: they provide a subsample of the pathogen variants that were present in the host at the sampling time. Such data are expected to give more insight into epidemiological links than a single sequence per host. In general, a mechanistic view of transmission and micro-evolution has been adopted to infer epidemiological links from these data. Here, we investigate an alternative approach grounded in statistical learning. The idea is to learn the structure of epidemiological links with a pseudo-evolutionary model applied to training data (obtained from contact tracing, for example), and then to use this initial stage to infer links for the whole dataset. Such an approach is potentially particularly valuable when there is a risk of erroneous mechanistic assumptions; it is sufficiently parsimonious to allow the handling of big datasets in the future; and it is versatile enough to be applied to very different contexts in animal, human and plant epidemiology. This article is part of the theme issue ‘Modelling infectious disease outbreaks in humans, animals and plants: approaches and important themes’. This issue is linked with the subsequent theme issue ‘Modelling infectious disease outbreaks in humans, animals and plants: epidemic forecasting and control’.
Database: OpenAIRE