Joining Illumina paired-end reads for classifying phylogenetic marker sequences

Authors:	Yung I. Hou, Jiu Yao Wang, Chen Yu Chen, Yi Lin Chen, Min Ching Lin, Tsunglin Liu, An Chen-Deng
Language:	English
Publication year:	2020
Subject:	Genetic Markers; 16S; Computer science; computer.software_genre; lcsh:Computer applications to medicine. Medical informatics; Biochemistry; Marker gene; Illumina paired-end; 03 medical and health sciences; 0302 clinical medicine; Taxonomy annotation; Structural Biology; Cluster Analysis; Humans; Child; Molecular Biology; lcsh:QH301-705.5; Phylogeny; Illumina dye sequencing; 030304 developmental biology; Sequence clustering; 0303 health sciences; Bacteria; Phylogenetic tree; business.industry; Methodology Article; Microbiota; Applied Mathematics; High-Throughput Nucleotide Sequencing; Sequence Analysis DNA; Asthma; Computer Science Applications; lcsh:Biology (General); Metagenomics; Read joining; Metagenome; lcsh:R858-859.7; Artificial intelligence; DNA microarray; Primer (molecular biology); business; computer; Classifier (UML); 030217 neurology & neurosurgery; Natural language processing
Source:	BMC Bioinformatics, Vol 21, Iss 1, Pp 1-13 (2020)
ISSN:	1471-2105
DOI:	10.1186/s12859-020-3445-6
Description:	Background: Illumina sequencing of a marker gene is popular in metagenomic studies. However, Illumina paired-end (PE) reads sometimes cannot be merged into single reads for subsequent analysis. When mergeable PE reads are limited, one can simply use only the first reads for taxonomy annotation, but that wastes the information in the second reads. Presumably, including the second reads should improve taxonomy annotation. However, a rigorous investigation of how best to do this and how much can be gained has not been reported.
Results: We evaluated two methods of joining (as opposed to merging) PE reads into single reads for taxonomy annotation, using simulated data with sequencing errors. Our evaluation involved several top classifiers (RDP classifier, SINTAX, and two alignment-based methods) and realistic benchmark datasets. For most classifiers, read joining ameliorated the impact of sequencing errors and improved the accuracy of taxonomy predictions. For alignment-based top-hit classifiers, rearranging the reference sequences is recommended to avoid improper alignments of joined reads. For word-counting classifiers, joined reads could be compared to the original reference for classification. We also applied read joining to our own MiSeq PE data from the nasal microbiota of asthmatic children. Before joining, trimming low-quality bases was necessary for optimizing taxonomy annotation and sequence clustering. We then showed that read joining increased the amount of effective data for taxonomy annotation. Using these joined, trimmed reads, we were able to identify two promising bacterial genera that might be associated with asthma exacerbation.
Conclusions: When mergeable PE reads are limited, joining them into single reads for taxonomy annotation is always recommended. Depending on the classifier, the reference sequences may need to be rearranged accordingly. Read joining also relaxes the constraint on primer selection, and thus may unleash the full capacity of Illumina PE data for taxonomy annotation. Our work provides guidance for fully utilizing PE data of a marker gene when mergeable reads are limited.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::566bf4f104de61baf7c9f018a6e8d944 http://link.springer.com/article/10.1186/s12859-020-3445-6
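Note: The description above mentions trimming low-quality bases and then joining PE reads into single reads, but this record does not give the details of either step. The following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes that joining means concatenating the first read with the reverse complement of the second read, and that trimming removes low-quality bases from the 3' end. The function names, the `spacer` parameter, and the quality threshold of 20 are all hypothetical.

```python
# Illustrative sketch only; names, threshold, and joining orientation are assumptions.

# Translation table for complementing DNA bases (N stays N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def trim_tail(seq: str, quals: list[int], min_q: int = 20) -> str:
    """Drop bases from the 3' end until the last base has quality >= min_q.

    The threshold and this simple trimming rule are hypothetical.
    """
    end = len(seq)
    while end > 0 and quals[end - 1] < min_q:
        end -= 1
    return seq[:end]


def join_pair(read1: str, read2: str, spacer: str = "") -> str:
    """Concatenate read 1 with the reverse complement of read 2.

    Unlike merging, no overlap between the two reads is required.
    An optional spacer (for example a run of N) can mark the junction.
    """
    return read1 + spacer + reverse_complement(read2)


if __name__ == "__main__":
    r1 = trim_tail("ACGTAC", [30, 30, 30, 30, 10, 5])
    r2 = trim_tail("AACG", [30, 30, 30, 30])
    print(join_pair(r1, r2))
```

Whether a spacer is useful, and whether the reference sequences need to be rearranged to match, would depend on the classifier, as the description indicates.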