Benchmarking of human Y-chromosomal haplogroup classifiers with whole-genome and whole-exome sequence data.

Authors: García-Olivares V; Genomics Division, Instituto Tecnológico y de Energías Renovables (ITER), Santa Cruz de Tenerife, Spain.; Plataforma Genómica de Alto Rendimiento para el Estudio de la Biodiversidad, Instituto de Productos Naturales y Agrobiología (IPNA), Consejo Superior de Investigaciones Científicas, San Cristóbal de La Laguna, Spain., Muñoz-Barrera A; Genomics Division, Instituto Tecnológico y de Energías Renovables (ITER), Santa Cruz de Tenerife, Spain., Rubio-Rodríguez LA; Genomics Division, Instituto Tecnológico y de Energías Renovables (ITER), Santa Cruz de Tenerife, Spain., Jáspez D; Genomics Division, Instituto Tecnológico y de Energías Renovables (ITER), Santa Cruz de Tenerife, Spain., Díaz-de Usera A; Genomics Division, Instituto Tecnológico y de Energías Renovables (ITER), Santa Cruz de Tenerife, Spain., Iñigo-Campos A; Genomics Division, Instituto Tecnológico y de Energías Renovables (ITER), Santa Cruz de Tenerife, Spain., Veeramah KR; Department of Ecology and Evolution, Stony Brook University, Stony Brook, NY 11794-5245, United States., Alonso S; Department of Genetics, Physical Anthropology and Animal Physiology, University of the Basque Country UPV/EHU, Leioa, Bizkaia, Spain.; María Goyri Building, Biotechnology Center, Human Molecular Evolution Lab 2.08 UPV/EHU Science Park, 48940 Leioa, Bizkaia, Spain., Thomas MG; UCL Genetics Institute, University College London (UCL), Gower Street, London WC1E 6BT, United Kingdom.; Research Department of Genetics, Evolution & Environment, University College London (UCL), Darwin Building, Gower Street, London WC1E 6BT, United Kingdom., Lorenzo-Salazar JM; Genomics Division, Instituto Tecnológico y de Energías Renovables (ITER), Santa Cruz de Tenerife, Spain., González-Montelongo R; Genomics Division, Instituto Tecnológico y de Energías Renovables (ITER), Santa Cruz de Tenerife, Spain.; Plataforma Genómica de Alto Rendimiento para el Estudio de la Biodiversidad, Instituto de Productos Naturales y Agrobiología (IPNA), Consejo Superior de Investigaciones Científicas, San Cristóbal de La Laguna, Spain., Flores C; Genomics Division, Instituto Tecnológico y de Energías Renovables (ITER), Santa Cruz de Tenerife, Spain.; Plataforma Genómica de Alto Rendimiento para el Estudio de la Biodiversidad, Instituto de Productos Naturales y Agrobiología (IPNA), Consejo Superior de Investigaciones Científicas, San Cristóbal de La Laguna, Spain.; Research Unit, Hospital Universitario Nuestra Señora de Candelaria, Santa Cruz de Tenerife, Spain.; CIBER de Enfermedades Respiratorias (CIBERES), Instituto de Salud Carlos III, Madrid, Spain.; Facultad de Ciencias de la Salud, Universidad Fernando de Pessoa Canarias, Las Palmas de Gran Canaria, Spain.
Language: English
Source: Computational and structural biotechnology journal [Comput Struct Biotechnol J] 2023 Sep 15; Vol. 21, pp. 4613-4618. Date of Electronic Publication: 2023 Sep 15 (Print Publication: 2023).
DOI: 10.1016/j.csbj.2023.09.012
Abstract: In anthropological, medical, and forensic studies, the nonrecombinant region of the human Y chromosome (NRY) enables accurate reconstruction of pedigree relationships and retrieval of ancestral information. Using high-throughput sequencing (HTS) data, we present a benchmarking analysis of command-line tools for NRY haplogroup classification. The evaluation was performed using paired Illumina data from whole-genome sequencing (WGS) and whole-exome sequencing (WES) experiments from 50 unrelated donors. As a validation, we also used paired WGS/WES datasets of 54 individuals from the 1000 Genomes Project. Finally, we evaluated the tools on data from third-generation HTS obtained from a subset of donors and one reference sample. Our results show that WES, despite typically offering less genealogical resolution than WGS, is an effective method for determining the NRY haplogroup. Y-LineageTracker and Yleaf showed the highest accuracy for WGS data, classifying precisely 98% and 96% of the samples, respectively. Yleaf outperforms all benchmarked tools on the WES data, classifying approximately 90% of the samples. Yleaf, Y-LineageTracker, and pathPhynder can correctly classify most samples (88%) sequenced with third-generation HTS. As a result, Yleaf provides the best performance for applications that use WGS and WES. Overall, our study offers researchers a guide for selecting the most appropriate tool to analyze the NRY region using both second- and third-generation HTS data.
Competing Interests: The authors declare no conflict of interest.
(© 2023 The Authors.)
Database: MEDLINE