Testing for association with rare variants in the coding and non-coding genome: RAVA-FIRST, a new approach based on CADD deleteriousness score
Authors: | Jean-François Deleuze, Thomas Ludwig, Emmanuelle Génin, Hervé Perdry, Ozvan Bocher, Pierre-Emmanuel Morange, Suryakant Suryakant, David-Alexandre Trégouët, Gaëlle Marenne, Jacob Odeberg |
---|---|
Contributors: | Admin, Oskar, Medical Genomics - - GENMED2010 - ANR-10-LABX-0013 - LABX - VALID, Génétique, génomique fonctionnelle et biotechnologies (UMR 1078) (GGB), EFS-Université de Brest (UBO)-Institut National de la Santé et de la Recherche Médicale (INSERM)-Institut Brestois Santé Agro Matière (IBSAM), Université de Brest (UBO), Helmholtz Zentrum München = German Research Center for Environmental Health, Centre Hospitalier Régional Universitaire de Brest (CHRU Brest), Centre National de Recherche en Génomique Humaine (CNRGH), Commissariat à l'énergie atomique et aux énergies alternatives (CEA), Institut de Biologie François JACOB (JACOB), Direction de Recherche Fondamentale (CEA) (DRF (CEA)), Commissariat à l'énergie atomique et aux énergies alternatives (CEA)-Commissariat à l'énergie atomique et aux énergies alternatives (CEA), Bordeaux population health (BPH), Université de Bordeaux (UB)-Institut de Santé Publique, d'Épidémiologie et de Développement (ISPED)-Institut National de la Santé et de la Recherche Médicale (INSERM), Royal Institute of Technology [Stockholm] (KTH), University of Tromsø (UiT), Centre recherche en CardioVasculaire et Nutrition = Center for CardioVascular and Nutrition research (C2VN), Aix Marseille Université (AMU)-Institut National de la Santé et de la Recherche Médicale (INSERM)-Institut National de Recherche pour l’Agriculture, l’Alimentation et l’Environnement (INRAE), Centre de recherche en épidémiologie et santé des populations (CESP), Université de Versailles Saint-Quentin-en-Yvelines (UVSQ)-Assistance publique - Hôpitaux de Paris (AP-HP) (AP-HP)-Hôpital Paul Brousse-Institut National de la Santé et de la Recherche Médicale (INSERM)-Université Paris-Saclay, Agence Nationale de la Recherche, ANR-10-LABX-0013,GENMED,Medical Genomics(2010) |
Year of publication: | 2022 |
Subject: |
Cancer Research
Computer science Association (object-oriented programming) Genetic Variation High-Throughput Nucleotide Sequencing Computational biology Genomics Genome Annotation [SDV.SPEE] Life Sciences [q-bio]/Santé publique et épidémiologie Genetics Computer Aided Design Humans DNA Intergenic Exome Venous thromboembolism Molecular Biology Genetics (clinical) Ecology Evolution Behavior and Systematics Coding (social sciences) Sequence (medicine) |
Source: | PLoS Genetics, Public Library of Science, 2022, 18 (9), e1009923, ⟨10.1371/journal.pgen.1009923⟩ |
ISSN: | 1553-7404 1553-7390 |
Description: | Rare variant association tests (RVAT) have been developed to study the contribution of rare variants, which are now widely accessible through high-throughput sequencing technologies. RVAT require aggregating rare variants into testing units and filtering variants to retain only those most likely to be causal. In the exome, genes are natural testing units and variants are usually filtered based on their functional consequences. With whole-genome sequence (WGS) data, however, both steps are challenging, because no natural biological unit is available for aggregating rare variants. Sliding-window procedures have been proposed to circumvent this difficulty, but they are blind to biological information and result in a large number of tests. We propose a new strategy to perform RVAT on WGS data: “RAVA-FIRST” (RAre Variant Association using Functionally-InfoRmed STeps), comprising three steps. (1) New testing units, referred to as “CADD regions”, are defined genome-wide based on functionally-adjusted Combined Annotation Dependent Depletion (CADD) scores of variants observed in the gnomAD populations. (2) A region-dependent filtering of rare variants is applied in each CADD region. (3) A functionally-informed burden test is performed with sub-scores computed for each genomic category within each CADD region. On both simulations and real data, RAVA-FIRST was found to outperform other WGS-based RVAT.
Applied to a WGS dataset of venous thromboembolism patients, RAVA-FIRST identified an intergenic region on chromosome 18 that is enriched for rare variants in early-onset patients and that was missed by standard sliding-window procedures. RAVA-FIRST enables new investigations of rare non-coding variants in complex diseases, facilitated by its implementation in the R package Ravages. Author Summary: Technological progress has made whole-genome sequencing possible at an unprecedented scale, opening up the possibility of exploring the role of low-frequency genetic variants in common diseases. The challenge is now methodological and requires the development of novel methods and strategies for analysing sequencing data that are not limited to assessing the role of coding variants. With RAVA-FIRST, we propose a novel strategy to investigate the role of rare variants across the whole genome that takes advantage of biological information. In particular, RAVA-FIRST relies on testing units that go beyond genes to gather rare variants in the association tests. In this work, we show that this new strategy presents several advantages compared to existing methods. RAVA-FIRST offers an easy and straightforward analysis of genome-wide rare variants, especially the intergenic ones, which are frequently overlooked, making it a promising tool for better understanding the biology of complex diseases. |
Database: | OpenAIRE |
External link: |