Popis: |
With the birth of next-generation sequencing (NGS) technology, genomic data in public databases have increased exponentially. Unfortunately, exogenous contamination or intracellular parasite sequences in assemblies could confuse genomic analysis. Meanwhile, they can provide a valuable resource for studies of host-microbe interactions. Here, we used a strategy based on DNA barcodes to scan protistan contamination in the GenBank WGS/TSA database. The results showed a total of 13,952 metazoan/animal assemblies in GenBank, where 17,036 contigs were found to be protistan contaminants in 1507 assemblies (10.8%), with even higher contamination rates in taxa of Cnidaria (150/281), Crustacea (237/480), and Mollusca (107/410). Taxonomic analysis of the protists derived from these contigs showed variations in abundance and evenness of protistan contamination across different metazoan taxa, reflecting host preferences of Apicomplexa, Ciliophora, Oomycota and Symbiodiniaceae for mammals and birds, Crustacea, insects, and Cnidaria, respectively. Finally, mitochondrial proteins COX1 and CYTB were predicted from these contigs, and the phylogenetic analysis corroborated the protistan origination and heterogeneous distribution of the contaminated contigs. Overall, in this study, we conducted a large-scale scan of protistan contaminant in genomic resources, and the protistan sequences detected will help uncover the protist diversity and relationships of these picoeukaryotes with Metazoa. |