Description: |
A major challenge in the post-genomic era is the annotation of functionally uncharacterized proteins that emerge from an ever-increasing number of sequencing projects. This task is almost always accomplished by transferring information from other, already characterized proteins. Here, we evaluate the strengths and weaknesses of established and novel bioinformatics approaches for transferring functional annotations from characterized to as yet uncharacterized proteins. Starting with the fundamentals of homology inferred via sequence similarity, we extend the concept of functional inference from homology to functional inference from protein domain structure. We introduce the term feature architecture to summarize the entirety of a protein's functional domains, secondary structure elements, and compositional properties, and show that feature architecture similarity serves as a good proxy for the degree of functional similarity between two proteins. With FACT, we provide an implementation of a feature-architecture-based search algorithm. Subsequently, we evaluate the reliability of domain detection and investigate the evolution of protein domains in a simulation framework. To this end, we introduce REvolver, a simulator that implements biologically meaningful models of protein sequence evolution by taking domain constraints into account. More precisely, REvolver extracts information from a profile hidden Markov model (pHMM) of a domain to automatically parameterize position-specific substitution models. Guided by the pHMM, it also places insertions and deletions preferentially at positions where they have been observed in other instances of the domain. In our simulations of protein domain evolution, we identified domains that lose their domain characteristics after only a few substitutions. Others preserve their characteristics over large evolutionary distances. Interestingly, some domains repeatedly lose and regain their characteristics in the course of simulated evolution.
We discuss this phenomenon in greater detail and propose a maximum likelihood approach to distinguish between domain detection errors and true evolutionary losses and gains. We then outline how to extend our approach from individual domains to entire proteins and investigate over which evolutionary distances orthologs can be expected to remain detectable. Finally, we apply methods for detecting orthologs and functional equivalents to the proteomes of microsporidia and zygomycetes. In doing so, we discuss the proposed monophyly of these two groups and investigate the evolutionary ancestry of sex determination. This analysis illustrates the versatility and complementarity of ortholog inference and feature architecture similarity searches when looking for functionally equivalent proteins. |
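The abstract does not specify which substitution model REvolver derives from a domain pHMM. The code below is a minimal sketch of the general idea of a position-specific substitution model parameterized from a pHMM, under an explicitly assumed model. It uses the F81 model, and it takes each site's stationary frequencies to be the normalized match-state emission probabilities. The names `site_frequencies`, `f81_row`, and `evolve_domain` are hypothetical, as are the pseudocount and the one-dict-per-match-state input format. This is not REvolver's implementation, and it ignores insertions and deletions.

```python
import math
import random

# Assumption: the 20 standard amino acids, in the order used for sampling.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def site_frequencies(emission, pseudocount=1e-3):
    """Turn one match state's emission probabilities into stationary frequencies.

    The (hypothetical) pseudocount keeps every residue possible, so no site
    frequency vector is a point mass and the F81 rate below stays finite.
    """
    weights = {aa: emission.get(aa, 0.0) + pseudocount for aa in AMINO_ACIDS}
    total = sum(weights.values())
    return {aa: w / total for aa, w in weights.items()}


def f81_row(current, pi, t):
    """F81 transition probabilities P(current -> aa) after branch length t.

    t is measured in expected substitutions per site; pi holds the site's
    stationary frequencies (assumed here to come from the pHMM emissions).
    """
    # Scale the rate so that one substitution per site is expected per unit t.
    beta = 1.0 / (1.0 - sum(p * p for p in pi.values()))
    decay = math.exp(-beta * t)
    return {aa: p + ((1.0 if aa == current else 0.0) - p) * decay
            for aa, p in pi.items()}


def evolve_domain(sequence, emissions, t, rng):
    """Apply position-specific substitutions to a gap-free domain sequence.

    emissions: one dict of match-state emission probabilities per position.
    Insertions and deletions are deliberately not modelled in this sketch.
    """
    if len(sequence) != len(emissions):
        raise ValueError("expected one match state per residue")
    evolved = []
    for residue, emission in zip(sequence, emissions):
        row = f81_row(residue, site_frequencies(emission), t)
        weights = [row[aa] for aa in AMINO_ACIDS]
        evolved.append(rng.choices(AMINO_ACIDS, weights=weights)[0])
    return "".join(evolved)


# Toy three-position domain: a conserved Cys, a Gly/Ala site, and a variable site.
toy_emissions = [
    {"C": 0.95},
    {"G": 0.6, "A": 0.3},
    {aa: 0.05 for aa in AMINO_ACIDS},
]
print(evolve_domain("CGL", toy_emissions, t=0.5, rng=random.Random(42)))
```

Because each site draws toward its own emission profile, strongly conserved match states tend to return to their preferred residue even after substitutions. Near-uniform states drift freely. This is one simple way in which domain constraints can shape simulated sequences.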