Quantitative sequence basis for the E. coli transcriptional regulatory network

Authors: Sizhe Qiu, Cameron Lamoureux, Amir Akbari, Bernhard O. Palsson, Daniel C. Zielinski
Year of publication: 2022
Description: The transcriptional regulatory network (TRN) of E. coli consists of thousands of interactions between regulators and DNA sequences. Inherently, the DNA sequence is the primary determinant of the TRN; however, it is well established that the presence of a DNA binding motif does not guarantee a functional binding site for a regulatory protein. Thus, the extent to which the TRN architecture can be predicted from the genome DNA sequence alone remains unclear. Here, we developed machine learning (ML) models that predict the TRN structure of E. coli based on genome sequence. Models were constructed successfully (cross-validation AUROC >= 0.8) for 84% (57/68) of valid E. coli regulons identified from top-down analysis of RNA-seq data. We found that: 1) while regulatory motif strength is the most important sequence feature for determining regulon membership, additional features such as DNA shape substantially influence membership; 2) complex regulons involving multiple interacting regulators could be unraveled by machine learning; 3) investigating regulons where initial ML models failed revealed new regulator-specific sequence features that improved model accuracy. Finally, while regulon structure can appear to vary across estimation methods and strains, we found that strong regulatory sequence features underlie both the genes that appear most consistently in regulons across estimation methods and the core regulon genes in the Fur pan-regulon. This work develops a quantitative understanding of the sequence basis of the TRN and suggests a path towards computationally guided control of transcriptional regulation for synthetic biology applications.
Database: OpenAIRE
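
Note on the success criterion: the description counts a regulon model as successful when its cross-validation AUROC is at least 0.8, and reports 57 of 68 regulons meeting that bar. The following is a minimal sketch of how such a criterion could be evaluated, not the authors' implementation. The function names, the toy fold data, and the choice to average AUROC across folds are all assumptions made for illustration; only the 0.8 threshold and the 57/68 count come from the record.

```python
from typing import List, Sequence, Tuple

# Threshold stated in the description; everything else here is illustrative.
AUROC_THRESHOLD = 0.8

Fold = Tuple[Sequence[int], Sequence[float]]  # (true labels, predicted scores)


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the probability that a random positive outscores a random
    negative (Mann-Whitney formulation); ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def mean_cv_auroc(folds: List[Fold]) -> float:
    """Average the held-out AUROC over folds (an assumed aggregation)."""
    return sum(auroc(y, s) for y, s in folds) / len(folds)


def is_successful(folds: List[Fold], threshold: float = AUROC_THRESHOLD) -> bool:
    """Apply the 'cross-validation AUROC >= 0.8' criterion to one regulon."""
    return mean_cv_auroc(folds) >= threshold


# Hypothetical held-out predictions for one regulon: label 1 means the gene
# is a regulon member, and the score is the model's membership estimate.
toy_folds: List[Fold] = [
    ([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]),
    ([1, 0, 0, 1], [0.8, 0.3, 0.2, 0.7]),
]

print(mean_cv_auroc(toy_folds))
print(is_successful(toy_folds))

# The reported success rate: 57 of 68 regulons, rounded to a whole percent.
print(round(100 * 57 / 68))
```

The pairwise AUROC is quadratic in the number of genes, which is fine for a sketch; a rank-based computation or a library routine would be the usual choice at genome scale.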