Ulysses: accurate detection of low-frequency structural variations in large insert-size sequencing libraries
Authors: | Gilles Fischer, Alexandre Gillet-Markowska, Ingrid Lafontaine, Hugues Richard |
---|---|
Contributors: | Biologie Computationnelle et Quantitative = Laboratory of Computational and Quantitative Biology (LCQB), Université Pierre et Marie Curie - Paris 6 (UPMC)-Institut de Biologie Paris Seine (IBPS), Université Pierre et Marie Curie - Paris 6 (UPMC)-Institut National de la Santé et de la Recherche Médicale (INSERM)-Centre National de la Recherche Scientifique (CNRS)-Institut National de la Santé et de la Recherche Médicale (INSERM)-Centre National de la Recherche Scientifique (CNRS)-Centre National de la Recherche Scientifique (CNRS), Agence Nationale pour la Recherche [2010 BLAN1606], Centre National de la Recherche Scientifique (CNRS), Japan Society for the Promotion of Science (JSPS) fellowship [PE11014], Institut National de la Santé et de la Recherche Médicale (INSERM)-Université Pierre et Marie Curie - Paris 6 (UPMC)-Centre National de la Recherche Scientifique (CNRS)-Institut National de la Santé et de la Recherche Médicale (INSERM)-Centre National de la Recherche Scientifique (CNRS)-Centre National de la Recherche Scientifique (CNRS) |
Publication year: | 2014 |
Subject: |
Statistics and Probability; Sequence analysis; Computer science; [SDV] Life Sciences [q-bio]; Genomic Structural Variation; Chromosomal translocation; Breast Neoplasms; Computational biology; Bioinformatics; Biochemistry; Genome; Insert (molecular biology); Humans; Genomic library; Molecular Biology; Gene Library; Genome Human; Mutagenesis; Interspersed Repetitive Sequences; Sequence Analysis DNA; 3. Good health; Computer Science Applications; Computational Mathematics; Mutagenesis Insertional; Computational Theory and Mathematics; 13. Climate action; Human genome; Female; Software |
Source: | Bioinformatics, Oxford University Press (OUP), 2015, 31 (6), pp. 801-808. ⟨10.1093/bioinformatics/btu730⟩ |
ISSN: | 1367-4811 1367-4803 |
Description: | Motivation: The detection of structural variations (SVs) in short-range Paired-End (PE) libraries remains challenging because SV breakpoints can involve large dispersed repeated sequences, or carry inherent complexity, that are hard to resolve with classical PE sequencing data. In contrast, large insert-size sequencing libraries (Mate-Pair libraries) provide higher physical coverage of the genome and give access to repeat-containing regions. Now that they are becoming routinely accessible, they can in theory overcome these limitations. Nevertheless, broad insert-size distributions and high rates of chimeric sequences are usually associated with this type of library, which makes the accurate annotation of SVs challenging. Results: Here, we present Ulysses, a tool that achieves drastically higher detection accuracy than existing tools, both on simulated and real mate-pair sequencing datasets from the 1000 Human Genome project. Ulysses achieves high specificity over the complete spectrum of variants by assessing, in a principled manner, the statistical significance of each possible variant (duplications, deletions, translocations, insertions and inversions) against an explicit model for the generation of experimental noise. This statistical model proves particularly useful for the detection of low-frequency variants. SV detection performed on a large insert-size Mate-Pair library from a breast cancer sample revealed a high level of somatic duplications in the tumor and, to a lesser extent, in the blood sample as well. Altogether, these results show that Ulysses is a valuable tool for the characterization of somatic mosaicism in human tissues and in cancer genomes. Availability and implementation: Ulysses is available at http://www.lcqb.upmc.fr/ulysses. Contact: ingrid.lafontaine@upmc.fr or gilles.fischer@upmc.fr. Supplementary information: Supplementary data are available at Bioinformatics online. |
Database: | OpenAIRE |
External link: |