Principled computational methods for the validation and discovery of genetic regulatory networks

Author: Hartemink, Alexander J. (Alexander John), 1972
Language: English
Year of publication: 2001
Document type: Thesis
Description: Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2001.
Includes bibliographical references (p. 193-206).
As molecular biology continues to evolve in the direction of high-throughput data collection, it has become increasingly necessary to develop computational methods for analyzing observed data that are both sophisticated enough to capture essential features of biological phenomena and approachable in practice. We demonstrate how graphical models, and Bayesian networks in particular, can be used to model genetic regulatory networks. These methods are well suited to this problem because they can model relationships among more than two variables at a time, guard against over-fitting, and remain robust in the face of noisy data. Moreover, Bayesian network models can be scored in a principled manner in the presence of both genomic expression and location data. We develop methods for extending Bayesian network semantics to include edge annotations that allow us to model statistical dependencies between biological factors with greater refinement. We derive principled methods for scoring these annotated Bayesian networks. Using these models with genomic expression data requires suitable methods for the normalization and discretization of these data.
We present novel methods appropriate to this context for performing each of these operations. With these elements in place, we are able to apply our scoring framework both to validate models of regulatory networks by comparing them with one another and to discover networks using heuristic search methods. To demonstrate the utility of this framework for elucidating genetic regulatory networks, we apply these methods to the well-understood galactose regulatory system and the less well-understood pheromone response system in yeast. We demonstrate how genomic expression and location data can be combined in a principled manner to enable the induction of models not readily discovered when the data sources are considered in isolation.
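The abstract describes scoring Bayesian network models so that competing regulatory models can be compared. As an illustration only, the sketch below computes a standard Bayesian score, the BDeu log marginal likelihood, for discretized data and uses it to compare two candidate structures. It is not the thesis's own scoring method: it omits edge annotations and location data, and the equivalent sample size `ess=1.0`, the gene names `G1`/`G2`, and the toy data are all hypothetical assumptions.

```python
from collections import Counter
from math import lgamma


def bdeu_family_score(data, child, parents, arity, ess=1.0):
    """Log BDeu marginal likelihood of one node (`child`) given its parent set.

    data    list of dicts mapping variable name -> discrete state in 0..arity-1
    parents list of parent variable names (may be empty)
    arity   dict mapping variable name -> number of discrete states
    ess     equivalent sample size of the Dirichlet prior (assumed value)
    """
    r = arity[child]
    q = 1
    for p in parents:
        q *= arity[p]
    a_j = ess / q          # prior pseudo-count per parent configuration
    a_jk = ess / (q * r)   # prior pseudo-count per (configuration, child state)

    # Sufficient statistics: N_jk (config, child state) and N_j (config only).
    n_jk = Counter((tuple(row[p] for p in parents), row[child]) for row in data)
    n_j = Counter(tuple(row[p] for p in parents) for row in data)

    score = 0.0
    # Unobserved parent configurations contribute exactly zero, so only
    # observed configurations need to be visited.
    for config, n in n_j.items():
        score += lgamma(a_j) - lgamma(a_j + n)
        for k in range(r):
            score += lgamma(a_jk + n_jk[(config, k)]) - lgamma(a_jk)
    return score


def network_score(data, structure, arity, ess=1.0):
    """Decomposable score: sum of family scores. `structure` maps each node
    to its list of parents and is assumed to be acyclic (not checked)."""
    return sum(bdeu_family_score(data, c, ps, arity, ess)
               for c, ps in structure.items())


if __name__ == "__main__":
    # Hypothetical discretized expression levels for two genes, G1 and G2.
    data = [{"G1": i % 2, "G2": i % 2} for i in range(20)]
    arity = {"G1": 2, "G2": 2}
    m1 = {"G1": [], "G2": ["G1"]}   # model with edge G1 -> G2
    m0 = {"G1": [], "G2": []}       # model with no edges
    print(network_score(data, m1, arity), network_score(data, m0, arity))
```

Because the score is a sum of per-node terms, a heuristic search that adds, removes, or reverses one edge only needs to rescore the families whose parent sets changed.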
by Alexander John Hartemink.
Ph.D.
Database: Networked Digital Library of Theses & Dissertations