Fast and accurate genome-scale identification of DNA-binding sites

Authors: Vincent Maillol, Eric Rivals, David Martin
Contributors: Méthodes et Algorithmes pour la Bioinformatique (MAB), Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier (LIRMM), Université de Montpellier (UM)-Centre National de la Recherche Scientifique (CNRS)-Université de Montpellier (UM)-Centre National de la Recherche Scientifique (CNRS), Institut de Biologie Computationnelle (IBC), Institut National de la Recherche Agronomique (INRA)-Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Montpellier (UM)-Centre National de la Recherche Scientifique (CNRS), ATGC bioinformatics platform, ANR-11-BINF-0002,IBC,Institut de Biologie Computationnelle de Montpellier(2011), ANR-06-MDCA-0014,PlasmoExplore,Fouille des données génomiques de Plasmodium falciparum pour prédire la fonction des gènes orphelins et identifier de nouvelles cibles thérapeutiques(2006), Centre National de la Recherche Scientifique (CNRS)-Université de Montpellier (UM)-Centre National de la Recherche Scientifique (CNRS)-Université de Montpellier (UM), Université de Montpellier (UM)-Institut National de la Recherche Agronomique (INRA)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS), ANR-11-BINF-0002,IBC,Institut de biologie Computationnelle(2011)
Language: English
Year of publication: 2018
Subject:
0301 basic medicine
Computer science
stringology
binding sites
Computational biology
computer.software_genre
Genome
web
ACM: H.: Information Systems/H.3: INFORMATION STORAGE AND RETRIEVAL/H.3.3: Information Search and Retrieval
03 medical and health sciences
interactive
Pattern matching
Binding site
Transcription factor
genome
transcription factor
Whole genome sequencing
search
software
motif
Search engine indexing
tool
ACM: F.: Theory of Computation/F.2: ANALYSIS OF ALGORITHMS AND PROBLEM COMPLEXITY/F.2.2: Nonnumerical Algorithms and Problems/F.2.2.3: Pattern matching
bioinformatics
DNA binding site
030104 developmental biology
ComputingMethodologies_PATTERNRECOGNITION
pattern matching
efficiency
interface
Web service
[INFO.INFO-BI]Computer Science [cs]/Bioinformatics [q-bio.QM]
computer
transcriptome
Source: 12th International Conference on Bioinformatics and Biomedicine
BIBM: Bioinformatics and Biomedicine
BIBM: Bioinformatics and Biomedicine, Dec 2018, Madrid, Spain. pp.201-205, ⟨10.1109/BIBM.2018.8621093⟩
DOI: 10.1109/BIBM.2018.8621093
Description: This is the author version of the article published in the conference proceedings. It includes supplementary information. A software tool called MOTIF is available on the ATGC bioinformatics platform. International audience. Motivation: Discovering DNA binding sites in genome sequences is crucial for understanding genomic regulation. Currently available computational tools for finding binding sites with Position Weight Matrices of known motifs are often used only in restricted genomic regions because of their long run times. The ever-increasing number of complete genome sequences points to the need for new generations of algorithms capable of processing large amounts of data. Results: Here we present MOTIF, a new algorithm that searches for transcription factor binding sites in whole genome sequences in a few seconds. We propose a web service that enables users to search for their own matrix or for multiple JASPAR matrices. Beyond its efficacy, the service properly handles undetermined positions within the genome sequence and provides output that lists, for each matching position, the matching word and its score. Availability: MOTIF is freely available for use through an interface at http://www.atgc-montpellier.fr/motif. The source code of the stand-alone search method of MOTIF is freely available at https://gite.lirmm.fr/rivals/motif.git. It is written in C++ and tested on Linux platforms.
Database: OpenAIRE
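
Illustrative note: the description above refers to scanning a genome with a Position Weight Matrix and reporting, for each hit, the position, the matching word, and its score. The sketch below shows only the naive sliding-window form of that task; it is not MOTIF's algorithm, which the record does not detail. The function name `scan_pwm`, the toy matrix `TOY_PWM`, the 0-based positions, and the policy of skipping windows that contain undetermined bases (such as `N`) are all assumptions made for illustration.

```python
def scan_pwm(sequence, pwm, threshold):
    """Naive PWM scan: return (position, word, score) for every window
    whose summed score reaches the threshold.

    pwm is a list of dicts, one per motif column, mapping A/C/G/T to a score.
    Positions are 0-based (an assumption, not taken from the record).
    """
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        word = sequence[start:start + width].upper()
        # Assumed policy: a window containing any undetermined base is skipped.
        if any(base not in "ACGT" for base in word):
            continue
        score = sum(pwm[i][base] for i, base in enumerate(word))
        if score >= threshold:
            hits.append((start, word, score))
    return hits


# Hypothetical toy matrix favouring the word TATA (+1 for a match, -1 otherwise).
TOY_PWM = [
    {base: (1.0 if base == preferred else -1.0) for base in "ACGT"}
    for preferred in "TATA"
]

if __name__ == "__main__":
    for position, word, score in scan_pwm("GGTATANNTATAC", TOY_PWM, 4.0):
        print(position, word, score)
```

This scan costs time proportional to the genome length times the motif width for each matrix, which is the kind of run time the description cites as the reason existing tools are often restricted to limited genomic regions.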