Using biological knowledge for multiple sequence aligner decision making
Authors: | Miguel A. Vega-Rodríguez, Mauro Castelli, Álvaro Rubio-Largo, Leonardo Vanneschi |
---|---|
Year of publication: | 2017 |
Subject: | 0301 basic medicine; Information Systems and Management; Computer science; 02 engineering and technology; Machine learning; Theoretical Computer Science; Set (abstract data type); 03 medical and health sciences; Artificial Intelligence; 0202 electrical engineering, electronic engineering, information engineering; Nucleotide; Selection (genetic algorithm); Sequence; Multiple sequence alignment; Computer Science Applications; Amino acid; 030104 developmental biology; Control and Systems Engineering; Benchmark (computing); 020201 artificial intelligence & image processing; Artificial intelligence; Data mining; Software |
Source: | Information Sciences. 420:278-298 |
ISSN: | 0020-0255 |
DOI: | 10.1016/j.ins.2017.08.069 |
Description: | Multiple Sequence Alignment (MSA) is the simultaneous alignment of three or more biological sequences (nucleotides or amino acids). In recent years, considerable effort has been devoted to the development of MSA approaches. In this work, we propose a framework that extracts the biological characteristics of an input set of unaligned sequences and uses this knowledge to decide which aligner and parameter configuration are most suitable. We refer to it as the Multiple Aligner Framework (MAF). The tuple {Aligner, Configuration} is selected by searching a pre-computed file for the best tuple for a dataset with similar biological characteristics. To create this file, we use multiobjective optimization; specifically, three well-known multiobjective evolutionary algorithms (NSGA-II, IBEA and MOEA/D). To validate the framework, we have used five popular benchmark suites: BAliBASE 3.0, PREFAB 4.0, SABmark 1.65, OX-Bench and CDD 3.14. After comparing it with well-known aligners from the literature, such as Kalign2, MUSCLE, MAFFT, T-Coffee, MSAProbs, ProbCons, Clustal Ω and MUMMALS, we conclude that MAF is, on average, the method with the best balance between alignment accuracy/conservation and required runtime. |
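The selection step summarized in the description can be pictured as a nearest-neighbour lookup: describe the input set by a few characteristics, then return the {Aligner, Configuration} tuple stored for the most similar pre-computed dataset. The sketch below is a minimal illustration under stated assumptions only. The descriptors (number of sequences, mean length, molecule type), the log-ratio distance, and the names `Features`, `Entry`, `extract_features` and `select_tuple` are hypothetical and not taken from the paper. The table rows are toy values, not results of the multiobjective optimization.

```python
import math
from dataclasses import dataclass
from statistics import mean

# Assumption: a set counts as nucleotide if it uses only these symbols.
# This is a simplification; a short protein made only of A/C/G/T would be misclassified.
NUCLEOTIDES = set("ACGTUN")


@dataclass(frozen=True)
class Features:
    """Hypothetical descriptors of an unaligned input set."""
    num_sequences: int
    mean_length: float
    is_protein: bool


@dataclass(frozen=True)
class Entry:
    """One row of a hypothetical pre-computed lookup file."""
    features: Features
    aligner: str
    configuration: str


def extract_features(sequences: list[str]) -> Features:
    if len(sequences) < 3:
        raise ValueError("an MSA needs at least three sequences")
    if any(len(s) == 0 for s in sequences):
        raise ValueError("empty sequence")
    residues = set("".join(sequences).upper())
    return Features(
        num_sequences=len(sequences),
        mean_length=mean(len(s) for s in sequences),
        is_protein=not residues <= NUCLEOTIDES,
    )


def distance(a: Features, b: Features) -> float:
    # Log ratios put counts and lengths on a comparable, scale-free footing.
    d_n = math.log(a.num_sequences / b.num_sequences)
    d_l = math.log(a.mean_length / b.mean_length)
    return math.hypot(d_n, d_l)


def select_tuple(query: Features, table: list[Entry]) -> tuple[str, str]:
    # Only compare against datasets of the same molecule type.
    candidates = [e for e in table if e.features.is_protein == query.is_protein]
    if not candidates:
        raise LookupError("no pre-computed entry for this molecule type")
    best = min(candidates, key=lambda e: distance(query, e.features))
    return best.aligner, best.configuration


# Toy lookup table: aligner names come from the record, configurations are placeholders.
table = [
    Entry(Features(10, 300.0, True), "MAFFT", "config-A"),
    Entry(Features(100, 1200.0, True), "MUSCLE", "config-B"),
    Entry(Features(20, 500.0, False), "Kalign2", "config-C"),
]
query = extract_features(["MKVLAAGIV", "MKVLSAGIVW", "MRVLAAGV"])
print(select_tuple(query, table))
```

Here three short protein sequences lie closest to the first toy row, so the lookup returns `("MAFFT", "config-A")`. Filtering by molecule type before ranking keeps nucleotide and amino-acid datasets from being matched against each other.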
Database: | OpenAIRE |