USMI Galaxy Demonstrator (UGD): a collection of tools to integrate microorganisms information

Authors: Paolo Romano, Daniele Pierpaolo Colobraro
Language: English
Year of publication: 2017
Subject:
DOI: 10.7287/peerj.preprints.2766
Description: Because microbial information is fragmented and microorganism applications span several branches of human activity, a comprehensive approach for merging information on microbes is needed. Although online service providers collect a range of data on microorganisms and provide services for microbial Biological Resource Centres (mBRCs), such services are still limited in both content and aims. The USMI Galaxy Demonstrator (UGD), an implementation of the Galaxy framework exploiting the XML-based Microbiological Common Language (MCL), is meant to give researchers integrated access to enriched information from microbial catalogues, and to help mBRC curators validate and enrich the contents of their catalogues. Researchers and mBRC curators may use the UGD to avoid manual, potentially long, searches on the web and to identify and select microorganisms of interest. UGD tools are written in Python, version 2.7. They enrich the basic information provided by catalogues with related taxonomy, literature, sequence and chemical compound data retrieved from some of the main databases on the basis of the strain number, i.e. the unique identifier for a given culture, and the species name. The data are retrieved by querying database Web Services using either the Simple Object Access Protocol (SOAP) or Representational State Transfer (REST). The MCL format provides a versatile way to archive and exchange data among mBRCs. Galaxy is a well-known, open, web-based platform that offers many tools to retrieve, manage and analyze different kinds of information from any life science domain. By exploiting Galaxy's flexibility, UGD implements tools and workflows that can be used to find and integrate a range of information on microorganisms.
UGD tools integrate basic information that may help mBRC staff enter all fundamental strain information in a proper format, allowing integration and interoperability with external databases. They also extend the output by adding information on source materials, including species and strain numbers. In addition, for a requested compound or enzyme, they retrieve the associated microorganisms that use it in any metabolic pathway, returning the accession number, synonyms, links to external databases, taxon name, and strain number.
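The enrichment step described above, in which a catalogue entry is extended with data looked up by strain number and species name, might be sketched roughly as follows. This is an illustration under stated assumptions, not the UGD code: the endpoint URL, the parameter names, the record keys and the helper functions are all hypothetical, the strain and species values are only sample input, and the sketch uses the Python 3 standard library rather than the Python 2.7 mentioned in the description. No network call is made; stub fetchers return the URL they would query.

```python
from urllib.parse import urlencode

# Hypothetical endpoint, used only for illustration; not a real UGD or database URL.
BASE_URL = "https://example.org/api/search"


def build_query_url(source, strain_number, species_name):
    """Build a REST-style query URL keyed on strain number and species name."""
    params = [("db", source), ("strain", strain_number), ("species", species_name)]
    return BASE_URL + "?" + urlencode(params)


def enrich_record(record, fetchers):
    """Return a copy of a catalogue record with one extra field per data source.

    `fetchers` maps a field name (e.g. "taxonomy") to a callable that takes
    (strain_number, species_name) and returns whatever was found for that strain.
    The input record is left unchanged.
    """
    enriched = dict(record)
    for field, fetch in fetchers.items():
        enriched[field] = fetch(record["strain_number"], record["species_name"])
    return enriched


def make_stub(source):
    """Create a stand-in fetcher that returns the URL it would query."""
    return lambda strain, species: build_query_url(source, strain, species)


# Sample catalogue entry (illustrative values only).
record = {"strain_number": "DSM 30083", "species_name": "Escherichia coli"}
fetchers = {name: make_stub(name) for name in ("taxonomy", "literature")}
enriched = enrich_record(record, fetchers)
print(enriched["taxonomy"])
```

In a real tool each stub would be replaced by a function that calls the corresponding SOAP or REST service and parses its response; keeping the fetchers as plain callables makes it easy to add or remove data sources without changing the enrichment loop.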
Database: OpenAIRE