MC64-ClustalWP2: a highly-parallel hybrid strategy to align multiple sequences in many-core architectures

Authors: Sergio Gálvez, Francisco José Esteban, Pilar Hernández, Antonio Guevara, Gabriel Dorado, Juan Antonio Caballero, David Díaz
Contributors: Ministerio de Economía y Competitividad (España), Junta de Andalucía, CSIC - Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA)
Language: English
Year of publication: 2014
Subject:
Source: PLoS ONE, Vol 9, Iss 4, p e94044 (2014)
Digital.CSIC. Repositorio Institucional del CSIC
Helvia. Repositorio Institucional de la Universidad de Córdoba
ISSN: 1932-6203
Description: We have developed MC64-ClustalWP2, a new implementation of the Clustal W algorithm that integrates a novel parallelization strategy and significantly increases performance when aligning long sequences on architectures with many cores. In such a process, a detailed analysis of the features and peculiarities of both the software and the hardware is of paramount importance to reveal the key points to exploit in order to optimize the full potential of parallelism in many-core CPU systems. The new parallelization approach focuses on the most time-consuming stages of the algorithm. In particular, the performance of the so-called progressive alignment stage has improved drastically, owing to a fine-grained approach in which the forward and backward loops were unrolled and parallelized. Another key approach has been the implementation of the new algorithm on a hybrid-computing system that integrates both an Intel Xeon multi-core CPU and a Tilera Tile64 many-core card. A comparison with other Clustal W implementations reveals the high performance of the new algorithm and strategy on many-core CPU architectures, in a scenario where the sequences to align are relatively long (more than 10 kb) and, hence, many-core GPU hardware cannot be used. Thus, MC64-ClustalWP2 runs multiple alignments more than 18x faster than the original Clustal W algorithm and more than 7x faster than the best x86 parallel implementation to date, and it is publicly available through a web service.
In addition, these developments have been deployed on cost-effective personal computers and should be useful for life-science researchers, for instance in the identification of identities and differences for mutation/polymorphism analyses; in biodiversity and evolutionary studies; and in the development of molecular markers for paternity testing, germplasm management and protection, breeding assistance, illegal traffic control, fraud prevention and the protection of intellectual property (identification/traceability), including the protected designation of origin, among other applications. © 2014 Díaz et al.
This work was supported by “Ministerio de Economía y Competitividad” (MINECO grants AGL2010-17316 and BIO2011-15237-E) and “Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria” (MINECO and INIA RF2012-00002-C02-02); “Consejería de Agricultura y Pesca” (041/C/2007, 75/C/2009 and 56/C/2010) and “Consejería de Economía, Innovación y Ciencia” (AGR-7322 and AGR-482) of “Junta de Andalucía”; “Grupo PAI” (AGR-248); and “Universidad de Córdoba” (“Ayuda a Grupos”), Spain.
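The description mentions forward and backward loops in the progressive alignment stage that were parallelized. As a rough illustration of why such passes can run concurrently, the following minimal sketch uses a generic divide-and-conquer alignment (Hirschberg-style) in which a forward pass over the first half of one sequence and a backward pass over the second half are independent of each other. This is not the MC64-ClustalWP2 code: the scoring constants and the function names `last_row` and `split_point` are assumptions made for this example, and Python threads only show the independence of the two passes, not a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor

# Illustrative linear scoring scheme (an assumption; not Clustal W's real scoring).
MATCH, MISMATCH, GAP = 1, -1, -2


def last_row(a, b):
    """Return the last row of the global-alignment score matrix of a versus b."""
    prev = [j * GAP for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * GAP]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (MATCH if a[i - 1] == b[j - 1] else MISMATCH)
            cur.append(max(diag, prev[j] + GAP, cur[j - 1] + GAP))
        prev = cur
    return prev


def split_point(a, b):
    """Find the column of b where an optimal path crosses the middle row of a.

    The forward pass (first half of a) and the backward pass (reversed second
    half of a) share no data, so they are submitted as two independent tasks.
    """
    mid = len(a) // 2
    with ThreadPoolExecutor(max_workers=2) as pool:
        fwd = pool.submit(last_row, a[:mid], b)
        bwd = pool.submit(last_row, a[mid:][::-1], b[::-1])
        f, r = fwd.result(), bwd.result()
    n = len(b)
    # f[j] scores a[:mid] vs b[:j]; r[n - j] scores a[mid:] vs b[j:].
    totals = [f[j] + r[n - j] for j in range(n + 1)]
    best = max(range(n + 1), key=totals.__getitem__)
    return best, totals[best]
```

Calling `split_point(a, b)` returns the split column together with the combined score, which equals the optimal global alignment score of `a` and `b` under the same scoring scheme; recursing on the two halves would yield the full alignment.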
Database: OpenAIRE