Objective sequence-based subfamily classifications of mouse homeodomains reflect their in vitro DNA-binding preferences

Authors:	Gwenael Badis, Timothy P. Hughes, Jennifer Tsai, Shoshana J. Wodak, Martha L. Bulyk, Michael F. Berger, Miguel A. Santos, Andrew R. Gehrke, Andrei L. Turinsky, Shaheynoor Talukder, Serene Ong
Contributors:	Harvard University--MIT Division of Health Sciences and Technology, Bulyk, Martha L., Banting and Best Department of Medical Research, University of Toronto
Language:	English
Publication year:	2010
Subjects:	Subfamily; [SDV]Life Sciences [q-bio]; Computational biology; Biology; DNA sequencing; 03 medical and health sciences; chemistry.chemical_compound; Mice; 0302 clinical medicine; Sequence Analysis, Protein; Genetics; Animals; ComputingMilieux_MISCELLANEOUS; 030304 developmental biology; Homeodomain Proteins; 0303 health sciences; Robustness (evolution); Experimental data; Computational Biology; DNA; In vitro; 3. Good health; chemistry; Homeobox; DNA microarray; 030217 neurology & neurosurgery
Source:	Nucleic Acids Research, Oxford University Press, 2010, 38 (22), pp. 7927-7942. ⟨10.1093/nar/gkq714⟩
ISSN:	0305-1048 1362-4962
DOI:	10.1093/nar/gkq714
Description:	Classifying proteins into subgroups with similar molecular function on the basis of sequence is an important step in deriving reliable functional annotations computationally. So far, however, available classification procedures have been evaluated against protein subgroups that are defined by experts using mainly qualitative descriptions of molecular function. Recently, in vitro DNA-binding preferences to all possible 8-nt DNA sequences have been measured for 178 mouse homeodomains using protein-binding microarrays, offering an unprecedented opportunity to evaluate classification methods against quantitative measures of molecular function. To this end, we automatically derive homeodomain subtypes from the DNA-binding data and independently group the same domains using sequence information alone. We test five sequence-based methods, which use different sequence-similarity measures and algorithms to group sequences. Results show that methods that optimize classification robustness closely reflect the detailed functional specificity revealed by the experimental data. In some of these classifications, 73–83% of the subfamilies exactly correspond to, or are completely contained in, the function-based subtypes. Our findings demonstrate that certain sequence-based classifications are capable of yielding very specific molecular function annotations. The availability of quantitative descriptions of molecular function, such as DNA-binding data, will be a key factor in exploiting this potential in the future.
Funding:	Canadian Institutes of Health Research (MOP#82940); Sickkids Foundation; Ontario Research Fund; National Science Foundation (U.S.); National Human Genome Research Institute (U.S.) (R01 HG003985)
Database:	OpenAIRE
External links:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::232258ec8d53868f68f257a2b40515de http://hdl.handle.net/1721.1/70985