Popis: |
Motivation: The major algorithms for quantifying transcriptomics data for differential gene expression analysis were designed for data from human or human-like genomes, specifically those with single-gene transcripts and distinct transcriptional boundaries that extend beyond the coding sequence (CDS), as identified through expressed sequence tags (ESTs) or EST-like sequence data. Some eukaryotic genomes and all, or nearly all, bacterial genomes require alternative methods of quantification because they lack annotation of transcriptional boundaries with EST or EST-like data, have overlapping transcriptional boundaries, and/or have polycistronic transcripts. Results: An algorithm, FADU, was developed and tested that better quantifies transcriptomics data for differential gene expression analysis in organisms with overlapping transcriptional units and polycistronic transcripts. Using data from standard libraries originating from Escherichia coli and Ehrlichia chaffeensis, and strand-specific libraries from the Wolbachia endosymbiont wBm, FADU can derive counts for genes that are missed by HTSeq and featureCounts. Using the default parameters with the E. coli data, FADU can detect transcription of 51 more genes than HTSeq in union mode and 21 more genes than featureCounts, with 42 and 18 of these features being [...] Availability and implementation: FADU is available at https://github.com/adkinsrs/FADU. FADU was implemented in Python 3 and requires the PySAM module (version 0.12.0.1 or later). Contact: jdhotopp@som.umaryland.edu |