An integrative method to normalize RNA-Seq data
Authors: | Maftah Abderrahman, Klopp Christophe, Forestier Lionel, Cyril Filloux, Petit Daniel, Meersseman Cédric, Rocha Dominique, Philippe Romain |
---|---|
Contributors: | Unité de Génétique Moléculaire Animale (UGMA), Université de Limoges (UNILIM)-Institut National de la Recherche Agronomique (INRA), Génétique Animale et Biologie Intégrative (GABI), Institut National de la Recherche Agronomique (INRA)-AgroParisTech, Unité de Biométrie et Intelligence Artificielle (UBIA), Institut National de la Recherche Agronomique (INRA), INRA Animal Genetics Department (BovRNA-Seq project), French National Research Agency [ANR-05-GANI-005, ANR-05-GANI-017-01], APIS GENE [01-2005-QualviGenA-02], Limousin Regional Council, Unité de Génétique Moléculaire Animale (UMR GMA), Institut National de la Recherche Agronomique (INRA)-Université de Limoges (UNILIM), Filloux, Cyril, Meerssemann, Cédric |
Language: | English |
Year of publication: | 2014 |
Subjects: |
Normalization (statistics); Transcription, Genetic; [SDV] Life Sciences [q-bio]; RNA-sequencing; RNA-Seq; Computational biology; Biology; Biochemistry; 03 medical and health sciences; 0302 clinical medicine; Structural Biology; sequencing; Animals; Humans; Molecular Biology; 030304 developmental biology; Genetics; 0303 health sciences; Sequence Analysis, RNA; Applied Mathematics; Gene Expression Profiling; High-Throughput Nucleotide Sequencing; qRT-PCR; Transcriptome Sequencing; Computer Science Applications; Normalization; 030220 oncology & carcinogenesis; Calibration; RNA; Gene expression; DNA microarray; Research Article |
Source: | BMC Bioinformatics, BioMed Central, 2014, 15 (1), 188. |
ISSN: | 1471-2105 |
DOI: | 10.1186/1471-2105-15-188 |
Description: | Background: Transcriptome sequencing is a powerful tool for measuring gene expression, but, as with other technologies, various artifacts and biases affect the quantification. Several normalization approaches have emerged to correct some of them, differing both in the statistical strategy employed and in the type of bias corrected. However, there is no clear standard normalization method. Results: We present a novel methodology to normalize RNA-Seq data that takes into account transcript size, GC content, and sequencing depth, which are the major quantification-related biases. First, we found that transcripts shorter than 600 bp have underestimated expression levels, while longer transcripts are overestimated, and increasingly so the longer they are. Second, as was already well known, the higher the GC content (>50%), the more transcripts are underestimated. Third, we demonstrated that sequencing depth affects the size bias, and we proposed a correction that allows expression levels to be compared among many samples. The efficiency of our approach was then assessed through the correlation between normalized RNA-Seq data and qRT-PCR expression measurements. All the steps are automated in a program written in Perl, available on request. Conclusions: The methodology presented in this article identifies and corrects different biases that influence RNA-Seq quantification, and provides more accurate estimates of gene expression levels. This method can be applied to compare expression quantifications from many samples, preferably from the same tissue. Comparing samples from different tissues will require a calibration using several reference genes. |
Database: | OpenAIRE |
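The record does not describe how the corrections are computed, and the authors' Perl program is only available on request, so the following is not their method. It is a minimal Python sketch, assuming a simple within-sample bin-median correction: raw counts are scaled to counts per million (CPM) as a crude sequencing-depth adjustment, then log2 expression is shifted so that each transcript-length bin, and afterwards each GC-content bin, has the same median as the whole sample. The function names (`log_cpm`, `bin_median_correct`, `normalize`, `pearson`), the bin count, and every input value (counts, lengths, GC fractions, qRT-PCR values) are hypothetical toy choices for illustration. The final Pearson correlation mirrors the kind of qRT-PCR comparison described in the abstract.

```python
import math
import statistics

# Hypothetical sketch of within-sample bias correction; NOT the authors' Perl program.


def log_cpm(counts):
    """Scale raw read counts to counts per million (a crude sequencing-depth
    adjustment, assumed here) and return log2(CPM + 1)."""
    total = sum(counts)
    return [math.log2(c / total * 1e6 + 1) for c in counts]


def bin_median_correct(values, covariate, n_bins=4):
    """Remove a covariate trend (e.g. transcript length or GC content).

    Genes are sorted by the covariate and split into n_bins bins of roughly
    equal size; each value is shifted so that its bin median equals the
    global median of all values.
    """
    order = sorted(range(len(values)), key=lambda i: covariate[i])
    global_median = statistics.median(values)
    bin_size = math.ceil(len(order) / n_bins)
    corrected = list(values)
    for start in range(0, len(order), bin_size):
        members = order[start:start + bin_size]
        bin_median = statistics.median(values[i] for i in members)
        for i in members:
            corrected[i] = values[i] - bin_median + global_median
    return corrected


def normalize(counts, lengths, gc_content, n_bins=4):
    """Depth scaling, then length correction, then GC-content correction."""
    expr = log_cpm(counts)
    expr = bin_median_correct(expr, lengths, n_bins)
    return bin_median_correct(expr, gc_content, n_bins)


def pearson(x, y):
    """Pearson correlation, e.g. normalized RNA-Seq vs qRT-PCR values."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Toy, made-up inputs for eight genes (not data from the article).
counts = [120, 40, 900, 15, 300, 75, 60, 500]
lengths = [450, 300, 3200, 500, 1800, 900, 700, 2500]  # transcript length, bp
gc = [0.42, 0.55, 0.48, 0.61, 0.39, 0.52, 0.58, 0.45]  # GC fraction
qpcr = [5.1, 3.9, 7.2, 2.8, 6.0, 4.6, 4.1, 6.8]  # hypothetical log2 qRT-PCR values

normed = normalize(counts, lengths, gc, n_bins=2)
print(f"Pearson r vs qRT-PCR: {pearson(normed, qpcr):.3f}")
```

Applying the length and GC corrections one after the other is a simplification: the second pass can partly reintroduce the first trend, and the sketch does not model the interaction between sequencing depth and size bias reported in the abstract. Iterating the corrections or fitting both covariates jointly (for example with a loess fit) are common alternatives.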