Recovering accuracy methods for scalable consistency library
Authors: | Cedric Notredame, Jordi Lladós, Josep L. Lérida, Fernando Cores, Fernando Guirado |
---|---|
Year of publication: | 2014 |
Subject: | Computer network architectures; Computer science; Programming languages (Electronic computers); Computational biology; Bioinformatics; Theoretical Computer Science; Consistency (database systems); Quality (business); Accuracy; T-Coffee; Scalability; Eventual consistency; Multiple sequence alignment; Hardware and Architecture; Large-scale alignments; Consistency; Data mining; Scale (map); Global consistency; Software; Information Systems; 0206 medical engineering; 02 engineering and technology; 03 medical and health sciences; 030304 developmental biology; 0303 health sciences; 020602 bioinformatics |
Source: | Repositorio Abierto de la UdL; Universitad de Lleida; Recercat. Dipósit de la Recerca de Catalunya |
ISSN: | 1573-0484 0920-8542 |
DOI: | 10.1007/s11227-014-1362-z |
Description: | Multiple sequence alignment (MSA) is crucial for high-throughput next-generation sequencing applications, which require large-scale alignments of thousands of sequences. However, the alignment quality of current MSA tools decreases sharply when the number of sequences grows to several thousand. This accuracy degradation can be mitigated by using global consistency information, as in the T-Coffee MSA tool, which implements a consistency library. However, consistency-based methods do not scale well because the computational resources required to calculate and store the consistency information grow quadratically. In this paper, we propose an alternative method for building the consistency library. To allow unlimited scalability, consistency information must be discarded to avoid exceeding the memory available in the environment. Our first approach deals with the memory limitation by identifying the most important entries, which provide better consistency. This method achieves scalability, although with a negative impact on accuracy. The second proposal aims to reduce this accuracy degradation, presenting three different methods to attain a better alignment. This work has been supported by the Government of Spain TIN2011-28689-C02-02. Cedric Notredame is funded by the Plan Nacional BFU2011-28575 and The Quantomics project (KBBE-2A-222664). |
Database: | OpenAIRE |