Building a large gene expression-cancer knowledge base with limited human annotations.

Authors: Marchesin S; Department of Information Engineering, University of Padova, Via G. Gradenigo 6b, Padova 35131, Italy., Menotti L; Department of Information Engineering, University of Padova, Via G. Gradenigo 6b, Padova 35131, Italy., Giachelle F; Department of Information Engineering, University of Padova, Via G. Gradenigo 6b, Padova 35131, Italy., Silvello G; Department of Information Engineering, University of Padova, Via G. Gradenigo 6b, Padova 35131, Italy., Alonso O; Applied Science, Amazon, 3075 Olcott St., Santa Clara, California 95054, USA.
Language: English
Source: Database : the journal of biological databases and curation [Database (Oxford)] 2023 Sep 27; Vol. 2023.
DOI: 10.1093/database/baad061
Abstract: Cancer prevention is one of the most pressing challenges facing public health. In this regard, data-driven research is central to supporting medical solutions that target cancer. To fully harness the power of data-driven research, facts must be organized in machine-readable form in a knowledge base (KB). Motivated by this need, we introduce the Collaborative Oriented Relation Extraction (CORE) system for building KBs with limited manual annotations. CORE combines the distant supervision and active learning paradigms and offers a seamless, transparent, modular architecture equipped for large-scale processing. We focus on precision medicine and build the largest KB on 'fine-grained' gene expression-cancer associations, a key resource for complementing and validating experimental data in cancer research. We show the robustness of CORE and discuss the usefulness of the provided KB. Database URL: https://zenodo.org/record/7577127.
(© The Author(s) 2023. Published by Oxford University Press.)
Database: MEDLINE