Multiple Cases of Bacterial Sequence Erroneously Incorporated Into Publicly Available Chloroplast Genomes.

Authors: Robinson AJ, Daligault HE, Kelliher JM, LeBrun ES, Chain PSG; all at Los Alamos National Laboratory, Biosecurity and Public Health Group, Bioscience Division, Los Alamos, NM, United States.
Language: English
Source: Frontiers in genetics [Front Genet] 2022 Jan 13; Vol. 12, pp. 821715. Date of Electronic Publication: 2022 Jan 13 (Print Publication: 2021).
DOI: 10.3389/fgene.2021.821715
Abstract: Public sequencing databases are invaluable resources to biological researchers, but assessing data veracity as well as the curation and maintenance of such large collections of data can be challenging. Genomes of eukaryotic organelles, such as chloroplasts and other plastids, are particularly susceptible to assembly errors and misrepresentations in these databases due to their close evolutionary relationships with bacteria, which may co-occur within the same environment, as can be the case when sequencing plants. Here, based on sequence similarities with bacterial genomes, we identified several suspicious chloroplast assemblies present in the National Institutes of Health (NIH) Reference Sequence (RefSeq) collection. Investigations into these chloroplast assemblies reveal examples of erroneous integration of bacterial sequences into chloroplast ribosomal RNA (rRNA) loci, often within the rRNA genes, presumably due to the high similarity between plastid and bacterial rRNAs. The bacterial lineages identified within the examined chloroplasts as the most likely source of contamination are either known associates of plants, or co-occur in the same environmental niches as the examined plants. Modifications to the methods used to process untargeted 'raw' shotgun sequencing data from whole genome sequencing efforts, such as the identification and removal of bacterial reads prior to plastome assembly, could eliminate similar errors in the future.
Competing Interests: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
(Copyright © 2022 Robinson, Daligault, Kelliher, LeBrun and Chain.)
Database: MEDLINE