MODER2: first-order Markov modeling and discovery of monomeric and dimeric binding motifs

Author:	Jarkko Toivonen, Esko Ukkonen, Jussi Taipale, Pratyush Kumar Das
Contributors:	Department of Computer Science, University of Helsinki, ATG - Applied Tumor Genomics, Research Programs Unit, Jussi Taipale / Principal Investigator
Publication year:	2019
Subject:	Statistics and Probability Orientation (graph theory) Markov model SEQUENCE Biochemistry 03 medical and health sciences chemistry.chemical_compound 0302 clinical medicine EM ALGORITHM Position (vector) Expectation–maximization algorithm Order (group theory) Position-Specific Scoring Matrices TRANSCRIPTION FACTOR POSITION Nucleotide Motifs SPECIFICITY Molecular Biology 030304 developmental biology Mathematics 11832 Microbiology and virology SITES 0303 health sciences Sequence Binding Sites IDENTIFICATION RECOGNITION PROTEIN-DNA INTERACTIONS 113 Computer and information sciences Mixture model Original Papers Computer Science Applications Computational Mathematics Monomer Computational Theory and Mathematics chemistry 1182 Biochemistry cell and molecular biology Biological system Sequence Analysis 030217 neurology & neurosurgery Algorithms Software Protein Binding Transcription Factors
Source:	Bioinformatics
ISSN:	1367-4811
Description:	Motivation: Position-specific probability matrices (PPMs, also called position-specific weight matrices) have been the dominant model for transcription factor (TF)-binding motifs in DNA. However, there is increasing evidence that higher-order models perform better, such as Markov models of order one, also called adjacent dinucleotide matrices (ADMs). Unlike PPMs, ADMs can model dependencies between adjacent nucleotides. A modeling technique and software tool that estimates such models simultaneously for both monomers and their dimers has been missing. Results: We present an ADM-based mixture model for monomeric and dimeric TF-binding motifs and an expectation-maximization algorithm, MODER2, for learning such models from training data and seeds. The model is a mixture that includes monomers and dimers built from the monomers, with a description of the dimeric structure (spacing, orientation). The technique is modular, meaning that the co-operative effect of dimerization is made explicit by evaluating the difference between expected and observed models. The model is validated using HT-SELEX and generated datasets, and by comparison with some earlier PPM and ADM techniques. The ADM models explain the data slightly better than PPM models for 314 tested TFs (or their DNA-binding domains) from four families (bHLH, bZIP, ETS and Homeodomain), with the ADM mixture models learned by MODER2 being the best on average. Availability and implementation: The software implementation is available from https://github.com/jttoivon/moder2. Supplementary information: Supplementary data are available at Bioinformatics online.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::4fa5c60b115791df20ecab64d67f3986 https://pubmed.ncbi.nlm.nih.gov/31999322
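
Illustration:	The description above contrasts PPMs, which treat motif positions as independent, with ADMs, which condition each nucleotide on the one before it. The following is a minimal sketch of that difference only, assuming a tiny set of pre-aligned sites and a pseudocount of 1. It is not the MODER2 implementation: it omits the mixture model, the EM algorithm, seeds and dimer structure, and all function names (`fit_ppm`, `fit_adm`, `ppm_log_prob`, `adm_log_prob`) are hypothetical.

```python
import math
from collections import Counter

BASES = "ACGT"

def fit_ppm(sites, pseudo=1.0):
    """Estimate a PPM: one independent base distribution per position."""
    total = len(sites) + 4 * pseudo
    ppm = []
    for i in range(len(sites[0])):
        counts = Counter(s[i] for s in sites)
        ppm.append({b: (counts[b] + pseudo) / total for b in BASES})
    return ppm

def fit_adm(sites, pseudo=1.0):
    """Estimate a first-order Markov model (ADM-like, an assumption here):
    a distribution for the first base plus, for every later position,
    P(current base | previous base)."""
    init = fit_ppm(sites, pseudo)[0]
    trans = []
    for i in range(1, len(sites[0])):
        pairs = Counter((s[i - 1], s[i]) for s in sites)
        step = {}
        for a in BASES:
            row_total = sum(pairs[(a, b)] for b in BASES) + 4 * pseudo
            step[a] = {b: (pairs[(a, b)] + pseudo) / row_total for b in BASES}
        trans.append(step)
    return init, trans

def ppm_log_prob(seq, ppm):
    """Log-probability of seq when positions are independent."""
    return sum(math.log(ppm[i][b]) for i, b in enumerate(seq))

def adm_log_prob(seq, init, trans):
    """Log-probability of seq when each base depends on its predecessor."""
    lp = math.log(init[seq[0]])
    for i in range(1, len(seq)):
        lp += math.log(trans[i - 1][seq[i - 1]][seq[i]])
    return lp

# Toy aligned sites with a strong adjacent dependency: A is followed by C,
# G is followed by T. These sequences are made up for illustration.
sites = ["AC", "AC", "GT", "GT"]
ppm = fit_ppm(sites)
init, trans = fit_adm(sites)

for seq in ("AC", "AT"):
    print(seq, ppm_log_prob(seq, ppm), adm_log_prob(seq, init, trans))
```

On this toy input the PPM assigns "AC" and "AT" the same probability, because each position is scored on its own, while the first-order model prefers the observed pair "AC" over the unobserved "AT".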