Insufficiently complex unique-molecular identifiers (UMIs) distort small RNA sequencing

Author:	Katherine A. Pillman, Cameron P. Bracken, John Toubia, Klay Saunders, Gregory J. Goodall, Andrew G. Bert, Philip A. Gregory, B. Kate Dredge
Contributors:	Saunders, Klay, Bert, Andrew G, Dredge, B Kate, Toubia, John, Gregory, Philip A, Pillman, Katherine A, Goodall, Gregory J, Bracken, Cameron P
Year of publication:	2020
Subject:	0301 basic medicine resolution cell Small RNA small cytoplasmic RNA lcsh:Medicine Computational biology Biology Genome Article law.invention Transcriptome 03 medical and health sciences 0302 clinical medicine law Gene expression single-cell analysis Humans lcsh:Science Polymerase chain reaction Multidisciplinary Sequence Analysis RNA lcsh:R Biological techniques RNA Epithelial Cells Mesenchymal Stem Cells Sequence Analysis DNA Computational biology and bioinformatics Identifier MicroRNAs Identification (information) 030104 developmental biology lcsh:Q Algorithms 030217 neurology & neurosurgery
Source:	Scientific Reports, Vol 10, Iss 1, Pp 1-9 (2020)
ISSN:	2045-2322
DOI:	10.1038/s41598-020-71323-0
Description:	The attachment of unique molecular identifiers (UMIs) to RNA molecules prior to PCR amplification and sequencing makes it possible to amplify libraries to a level sufficient to identify rare molecules, whilst simultaneously eliminating PCR bias through the identification of duplicated reads. Accurate de-duplication depends on a pool of UMIs complex enough to label each molecule uniquely. In applications dealing with complex libraries, such as total RNA-seq, only a limited variety of UMIs is required because the variation in the molecules to be sequenced is enormous. However, when sequencing a less complex library, such as small RNAs, for which the range of possible sequences is more limited, we find that greater UMI variation is required, even beyond that provided in a commercial kit specifically designed for the preparation of small RNA libraries for sequencing. We show that a pool of UMIs randomly varying across eight nucleotides is not of sufficient depth to uniquely tag the microRNAs to be sequenced. This results in excessive de-duplication of reads and a marked under-estimation of the expression of the more abundant microRNAs. Whilst still arguing for the utility of UMIs, this work demonstrates the importance of their considered design to avoid errors in the estimation of gene expression in libraries derived from select regions of the transcriptome or from small genomes.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::032b107c1a2d9336cb2468f472bd5a20 https://doi.org/10.1038/s41598-020-71323-0
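
Note on the description: the argument that an eight-nucleotide UMI pool is too shallow for abundant microRNAs can be illustrated with a small occupancy calculation. The sketch below is not taken from the paper. It assumes that UMIs are drawn uniformly and independently from all 4^L possible sequences, that all molecules of a given microRNA have an identical sequence, and that de-duplication collapses reads sharing the same (sequence, UMI) pair with no sequencing errors. The function names are hypothetical and chosen only for illustration.

```python
def umi_pool_size(length: int, alphabet: int = 4) -> int:
    """Number of distinct UMIs of the given length (4**length for A/C/G/T)."""
    return alphabet ** length


def expected_distinct_umis(n_molecules: int, pool_size: int) -> float:
    """Expected number of distinct UMIs seen when n_molecules identical
    molecules each receive one UMI drawn uniformly at random (assumption).

    Under exact-match de-duplication this is also the expected count
    reported for that molecule after de-duplication.
    """
    # Probability that a given UMI is never used is (1 - 1/pool_size)**n.
    return pool_size * (1.0 - (1.0 - 1.0 / pool_size) ** n_molecules)


def expected_undercount_fraction(n_molecules: int, pool_size: int) -> float:
    """Expected fraction of true molecules lost to UMI collisions."""
    return 1.0 - expected_distinct_umis(n_molecules, pool_size) / n_molecules


if __name__ == "__main__":
    pool = umi_pool_size(8)  # eight random nucleotides, as in the description
    for n in (100, 10_000, 100_000, 1_000_000):
        kept = expected_distinct_umis(n, pool)
        lost = expected_undercount_fraction(n, pool)
        print(f"molecules={n:>9}  expected de-duplicated count={kept:12.1f}  "
              f"fraction lost={lost:.3f}")
```

Under these assumptions the de-duplicated count for a single sequence can never exceed the pool size (4^8 = 65,536 for eight nucleotides), so the more abundant a microRNA is, the larger the fraction of its molecules that is collapsed. Calling `umi_pool_size(12)` shows how quickly the ceiling rises with a longer UMI (4^12 = 16,777,216).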