Protein Construction-Based Data Partitioning Scheme for Alignment of Protein Macromolecular Structures Through Distributed Querying in Federated Databases
Authors: | Bożena Małysiak-Mrozek, Dariusz Mrozek, Jacek Kwiendacz |
---|---|
Year of publication: | 2019 |
Subject: | Models, Molecular; SQL; Computer science; Relational database; Protein Conformation; Biomedical Engineering; Pharmaceutical Science; Medicine (miscellaneous); Bioengineering; 02 engineering and technology; Oracle; Backup; Sequence Analysis, Protein; Electrical and Electronic Engineering; Databases, Protein; Distributed Computing Environment; Information retrieval; Distributed database; Database schema; Computational Biology; Proteins; 021001 nanoscience & nanotechnology; Computer Science Applications; Data pre-processing; 0210 nano-technology; Sequence Alignment; Biotechnology |
Source: | IEEE Transactions on NanoBioscience, 19(1) |
ISSN: | 1558-2639 |
Description: | Exploration of various characteristics of 3D protein structures through querying relational databases that store the structures can be challenging due to the necessity of conforming to a particular database schema. However, this also brings several advantages, such as the ability to perform extensive database searches with the declarative SQL language, to protect data against hardware damage through regular backup mechanisms, and to secure data against unauthorized access. Since relational databases do not provide exploration methods specific to protein data and its biological semantics, such as searches based on protein structural patterns, the use of relational databases in this domain is still rare and requires the development of dedicated methods to increase the speed of data exploration. In this paper, we present a novel data partitioning scheme for distributing data across database clusters that can be used for performing sophisticated explorations of 3D protein structures. The data partitioning scheme relies on protein construction, which requires data preprocessing but results in shorter exploration times when querying federated databases. We solve the problem of finding proteins in an Oracle relational database on the basis of the similarity of 3D protein structures with the use of distributed PAR-P3D-SQL queries. Since 3D protein structure similarity searching is one of the most time-consuming exploration processes that can be performed on protein data, we make use of a distributed environment of Oracle federated databases, distributed query processing, and dedicated load balancing methods to accelerate the exploration. The results of the tests performed confirm that we are able to significantly increase the speed of the exploration process, proportionally to the number of database nodes in the federated environment. |
Database: | OpenAIRE |
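The description above mentions distributing protein data across database nodes with load balancing, so that exploration time falls roughly in proportion to the number of nodes. The record does not state the actual partitioning rule, so the following is only a generic illustration, not the authors' protein construction-based scheme or PAR-P3D-SQL. It assumes a hypothetical per-protein cost proxy (here, a residue count) and uses the standard greedy longest-processing-time heuristic: larger proteins are placed first, each on the currently least-loaded node. The estimated speedup is the total serial cost divided by the load of the busiest node. It reaches the node count only when the loads balance perfectly.

```python
import heapq
from typing import Dict, List, Tuple


def partition_proteins(sizes: Dict[str, int], n_nodes: int) -> List[List[str]]:
    """Greedy longest-processing-time assignment of proteins to nodes.

    `sizes` maps a hypothetical protein ID to an assumed cost proxy
    (e.g. residue count). This is a generic heuristic, not the paper's scheme.
    """
    if n_nodes < 1:
        raise ValueError("n_nodes must be >= 1")
    # Min-heap of (current load, node index): the least-loaded node pops first.
    heap: List[Tuple[int, int]] = [(0, i) for i in range(n_nodes)]
    heapq.heapify(heap)
    nodes: List[List[str]] = [[] for _ in range(n_nodes)]
    # Place the most expensive proteins first; tie-break on ID for determinism.
    for pid, cost in sorted(sizes.items(), key=lambda kv: (-kv[1], kv[0])):
        load, idx = heapq.heappop(heap)
        nodes[idx].append(pid)
        heapq.heappush(heap, (load + cost, idx))
    return nodes


def estimated_speedup(sizes: Dict[str, int], nodes: List[List[str]]) -> float:
    """Serial cost divided by the cost of the most heavily loaded node."""
    total = sum(sizes.values())
    slowest = max(sum(sizes[p] for p in node) for node in nodes)
    if slowest == 0:
        # Nothing to distribute, so parallelism gives no gain.
        return 1.0
    return total / slowest


if __name__ == "__main__":
    # Hypothetical protein IDs and residue counts, for illustration only.
    demo = {"P1": 500, "P2": 400, "P3": 300, "P4": 300, "P5": 200, "P6": 100}
    parts = partition_proteins(demo, 3)
    print(parts)  # [['P1', 'P6'], ['P2', 'P5'], ['P3', 'P4']]
    print(estimated_speedup(demo, parts))  # 3.0
```

In this toy input, the three nodes each receive a load of 600, so the estimate matches the node count exactly. With more uneven protein sizes, the busiest node bounds the achievable speedup.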