An in-depth evaluation of annotated transcription start sites in E. coli using deep learning

Authors: Clauwaert, Jim; Waegeman, Willem
Language: English
Year of publication: 2020
DOI: 10.1101/2020.03.16.993501
Description: The annotation of transcription start sites with computational methods is an important and unsolved problem in genomics. In recent years, several novel experimental methodologies – named Cappable-seq, SMRT-Cappable-seq and SEnd-seq – have been introduced for the detection of transcription start sites and applied to E. coli. In this study, a comparison is made between these new methodologies and the curated transcription start site data set featured by RegulonDB. The comparison of these data sets is facilitated by deep learning techniques that cover both unsupervised and supervised learning, where we expand upon a framework that allows for interpretable deep learning in genomics. This study finds that the annotations produced by the recent techniques surpass the quality of those provided by RegulonDB. Analysis of the transformer network trained to detect transcription start sites (TSSs) in E. coli reveals that its attention scores pinpoint important promoter regions previously discussed in the literature. Additionally, the findings support the occurrence of a complex interaction between sense and antisense output probabilities, prevalent at key positions for interference with the transcription process.
Database: OpenAIRE