Using biological knowledge for multiple sequence aligner decision making

Authors:	Miguel A. Vega-Rodríguez, Mauro Castelli, Álvaro Rubio-Largo, Leonardo Vanneschi
Publication year:	2017
Subjects:	0301 basic medicine; Information Systems and Management; Computer science; 02 engineering and technology; computer.software_genre; Machine learning; Theoretical Computer Science; Set (abstract data type); 03 medical and health sciences; Artificial Intelligence; 0202 electrical engineering electronic engineering information engineering; Nucleotide; Selection (genetic algorithm); chemistry.chemical_classification; Sequence; Multiple sequence alignment; business.industry; Computer Science Applications; Amino acid; 030104 developmental biology; chemistry; Control and Systems Engineering; Benchmark (computing); 020201 artificial intelligence & image processing; Artificial intelligence; Data mining; business; computer; Software
Source:	Information Sciences. 420:278-298
ISSN:	0020-0255
DOI:	10.1016/j.ins.2017.08.069
Description:	Multiple Sequence Alignment (MSA) is the simultaneous alignment of three or more biological sequences (nucleotides or amino acids). In recent years, considerable effort has been devoted to the development of MSA approaches. In this work, we propose a framework that extracts the biological characteristics of an input set of unaligned sequences and uses this knowledge to decide which aligner and parameter configuration are most suitable. We refer to it as the Multiple Aligner Framework (MAF). The tuple {Aligner, Configuration} is selected by searching a pre-computed file for the best tuple for a dataset with similar biological characteristics. To create this file, we use multiobjective optimization, specifically three well-known multiobjective evolutionary algorithms (NSGA-II, IBEA and MOEA/D). To validate the framework, we have used five popular benchmark suites: BAliBASE 3.0, PREFAB 4.0, SABmark 1.65, OX-Bench and CDD 3.14. After comparing it with well-known aligners published in the literature, such as Kalign2, MUSCLE, MAFFT, T-Coffee, MSAProbs, ProbCons, Clustal Ω and MUMMALS, we conclude that the Multiple Aligner Framework is, on average, the method with the best balance between alignment accuracy/conservation and required runtime.
Database:	OpenAIRE
External links:	https://explore.openaire.eu/search/publication?articleId=doi_________::beacd8419fcf9ff57a37e7dc1228bee1 https://doi.org/10.1016/j.ins.2017.08.069
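
The selection step summarized in the description (extract characteristics from the unaligned sequences, then look up the best {Aligner, Configuration} tuple for the most similar dataset in a pre-computed file) can be sketched as a nearest-neighbor lookup. This is only an illustrative sketch, not the authors' implementation: the feature set (sequence count, mean length, length spread), the Euclidean distance, the `Entry` structure, and the configuration labels are all assumptions made for the example, and the record does not say which characteristics or similarity measure MAF actually uses.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """One row of a hypothetical pre-computed table (layout is an assumption)."""
    features: tuple   # characteristics of a reference dataset
    aligner: str      # aligner name, e.g. one of those listed in the abstract
    config: str       # placeholder label for a parameter configuration


def extract_features(seqs):
    """Compute assumed characteristics of unaligned sequences:
    (number of sequences, mean length, population std of lengths)."""
    lengths = [len(s) for s in seqs]
    n = len(lengths)
    mean = sum(lengths) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in lengths) / n)
    return (float(n), mean, std)


def select_tuple(features, table):
    """Return the table entry whose stored features are closest
    (unscaled Euclidean distance, an assumed similarity measure)."""
    return min(table, key=lambda e: math.dist(features, e.features))
```

For instance, with a two-row table such as `[Entry((3.0, 4.0, 1.5), "MUSCLE", "config-A"), Entry((50.0, 300.0, 20.0), "MAFFT", "config-B")]` (made-up values, not results from the paper), `select_tuple(extract_features(seqs), table)` returns the row nearest to the input dataset. A real system would likely normalize features before comparing them, since raw counts and lengths live on different scales.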