A multi-view genomic data simulator
Author: | Dario Greco, Angela Serra, Giancarlo Raiconi, Roberto Tagliaferri, Michele Fratello, Vittorio Fortino |
---|---|
Language: | English |
Year of publication: | 2015 |
Subject: | DNA Copy Number Variations; Gene regulatory network; Datasets as Topic; Feature selection; Genomics; Computational biology; Multi-view; Regulatory network; Gene-miRNA interactions; OMICs data simulation; Biology; Biochemistry; Set (abstract data type); 03 medical and health sciences; 0302 clinical medicine; Structural Biology; Humans; Computer Simulation; Gene Regulatory Networks; Copy-number variation; Molecular Biology; 030304 developmental biology; 0303 health sciences; Gene Expression Profiling; Applied Mathematics; Computational Biology; DNA Methylation; Computer Science Applications; MicroRNAs; ComputingMethodologies_PATTERNRECOGNITION; Gene Expression Regulation; A priori and a posteriori; DNA microarray; Algorithms; 030217 neurology & neurosurgery; Research Article |
Source: | BMC Bioinformatics |
Description: | Background: OMICs technologies make it possible to assay the state of a large number of different features (e.g., mRNA expression, miRNA expression, copy number variation, DNA methylation) from the same samples. The objective of these experiments is usually to find a reduced set of significant features that can be used to differentiate the conditions assayed. For the development of novel computational feature selection methods, this task is challenging because of the lack of fully annotated biological datasets to use for benchmarking. A possible way to tackle this problem is to generate appropriate synthetic datasets whose composition and behaviour are fully controlled and known a priori. Results: Here we propose a novel method centred on the generation of networks of interactions among different biological molecules, especially those involved in regulating gene expression. Synthetic datasets are obtained from models based on ordinary differential equations with known parameters. Our results show that the generated datasets closely mimic the behaviour of real data, since popular data analysis methods are able to selectively identify existing interactions. Conclusions: The proposed method can be used in conjunction with real biological datasets in the assessment of data mining techniques. The main strength of this method is full control over the simulated data while retaining coherence with real biological processes. The R package MVBioDataSim is freely available to the scientific community at http://neuronelab.unisa.it/?p=1722. Electronic supplementary material: The online version of this article (doi:10.1186/s12859-015-0577-1) contains supplementary material, which is available to authorized users. |
Database: | OpenAIRE |
External link: |
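
The abstract states that the synthetic datasets come from ordinary differential equation models with known parameters. As a rough illustration of that general idea only, the sketch below integrates a two-variable model in which a miRNA represses a target mRNA, then samples noisy steady states under two conditions to form two data views. It is not the MVBioDataSim package or its API, which this record does not describe; the function names, the Hill-type repression term, the rate constants, and the noise model are all assumptions chosen for illustration.

```python
import random


def simulate_sample(k_mirna, k_gene=2.0, d_mirna=0.5, d_gene=0.4,
                    K=1.0, hill=2, dt=0.01, steps=5000):
    """Forward-Euler integration of a hypothetical miRNA -> mRNA repression model.

    d(mirna)/dt = k_mirna - d_mirna * mirna
    d(mrna)/dt  = k_gene / (1 + (mirna / K)**hill) - d_gene * mrna
    All parameter values are illustrative assumptions, not taken from the paper.
    """
    mirna, mrna = 0.0, 0.0
    for _ in range(steps):
        d_mirna_dt = k_mirna - d_mirna * mirna
        d_mrna_dt = k_gene / (1.0 + (mirna / K) ** hill) - d_gene * mrna
        mirna += dt * d_mirna_dt
        mrna += dt * d_mrna_dt
    return mirna, mrna


def simulate_dataset(n_per_condition=10, noise_sd=0.05, seed=0):
    """Build two views (miRNA, mRNA) for two conditions with Gaussian noise."""
    rng = random.Random(seed)
    # Assumed miRNA production rate per condition; this is the known ground truth.
    rates = {"control": 0.2, "treated": 1.0}
    mirna_view, mrna_view, labels = [], [], []
    for label, k_mirna in rates.items():
        for _ in range(n_per_condition):
            mirna, mrna = simulate_sample(k_mirna)
            # Additive measurement noise, clipped at zero to keep levels non-negative.
            mirna_view.append(max(0.0, mirna + rng.gauss(0.0, noise_sd)))
            mrna_view.append(max(0.0, mrna + rng.gauss(0.0, noise_sd)))
            labels.append(label)
    return mirna_view, mrna_view, labels


if __name__ == "__main__":
    mirna_view, mrna_view, labels = simulate_dataset()
    for label, mi, mr in zip(labels, mirna_view, mrna_view):
        print(f"{label}\tmiRNA={mi:.3f}\tmRNA={mr:.3f}")
```

Because the repression link and its parameters are fixed in advance, a feature selection or network inference method run on these two views can be scored against a known answer, which is the benchmarking use the abstract describes.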