FAIRification of genomic track metadata [version 1

Authors: Gundersen, Sveinung; Boddu, Sanjay; Capella Gutiérrez, Salvador (ORCID: 0000-0002-0309-604X); Drabløs, Finn; Fernández González, Jose María (ORCID: 0000-0002-4806-5140); Kompova, Radmila; Taylor, Kieron; Titov, Dmytro; Zerbino, Daniel; Hovig, Eivind
Language: English
Publication year: 2021
Subject:
Source: UPCommons. Portal del coneixement obert de la UPC
Universitat Politècnica de Catalunya (UPC)
Description: Background: Many types of data from genomic analyses can be represented as genomic tracks, i.e. features linked to the genomic coordinates of a reference genome. Examples of such data are epigenetic DNA methylation data, ChIP-seq peaks, germline or somatic DNA variants, as well as RNA-seq expression levels. Researchers often face difficulties in locating, accessing and combining relevant tracks from external sources, as well as in locating the raw data, which reduces the value of the generated information. Description of work: We propose to advance the application of FAIR data principles (Findable, Accessible, Interoperable, and Reusable) to produce searchable metadata for genomic tracks. Findability and Accessibility of metadata can then be ensured by a track search service that integrates globally identifiable metadata from various track hubs in the Track Hub Registry and other relevant repositories. Interoperability and Reusability need to be ensured by the specification and implementation of a basic set of recommendations for metadata. We have tested this concept by developing such a specification in a JSON Schema, called FAIRtracks, and have integrated it into a novel track search service, called TrackFind. We demonstrate practical usage by importing datasets through TrackFind into existing examples of relevant analytical tools for genomic tracks: EPICO and the GSuite HyperBrowser. Conclusion: We here provide a first iteration of a draft standard for genomic track metadata, as well as the accompanying software ecosystem. It can easily be adapted or extended to future needs of the research community regarding data, methods and tools, balancing the requirements of both data submitters and analytical end-users. The work was funded by ELIXIR through the ELIXIR Implementation Study: FAIRification of genomic tracks, and through ELIXIR Norway, ELIXIR Spain and EMBL-EBI core funding.
SC-G and JMF received funding through the INB Grant (PT17/0009/0001 - ISCIII-SGEFI / ERDF). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Database: OpenAIRE