Description: |
Since the initial sequencing of the human genome in 2001, the field of systems biology has sought to understand gene regulatory mechanisms, in particular the role of transcription factor binding. This understanding is crucial for diseases such as cancer: altered regulatory mechanisms are the primary changes observed in tumors, and they are extensively investigated for treatment, drug design, and early detection. Despite the importance of uncovering these mechanisms, inferring them from data while maintaining accuracy, explainability, scalability, and flexibility remains a significant open challenge. This thesis proposes GIRAFFE, a scalable algorithm based on matrix factorization that jointly infers regulatory effects and transcription factor activities from gene expression data. GIRAFFE integrates prior knowledge about regulation to guide the optimization, yielding an interpretable model in which regulatory weights represent partial effects. Moreover, it can be tailored to the requirements of downstream applications by adjusting for variables of interest, such as confounders, and by adding sparsity constraints that make the regulatory network easier to interpret. We demonstrate the effectiveness of this approach with extensive experiments on both synthetic and real-world data. Our algorithm outperforms state-of-the-art gene regulatory network inference methods in predicting interactions between transcription factors and their target genes. In addition, it distinguishes between activating and inhibitory effects, yielding plausible results in downstream applications such as gene set enrichment analysis.