DrugProt Large-Scale Text Mining corpus: Biocreative VII Track 1 - Text mining drug and chemical-protein interactions

Author: Miranda-Escalada, Antonio; Luoma, Jouni; Mehryary, Farrokh; Pyysalo, Sampo; Krallinger, Martin
Language: English
Year of publication: 2021
Subject:
DOI: 10.5281/zenodo.5119878
Description: This Zenodo record contains the BioCreative VII Large Scale DrugProt Additional Subtrack abstracts and entity annotations. Abstracts (large_scale_abstracts.tsv): This file contains plain-text, UTF-8-encoded, NFC-normalized DrugProt PubMed records in a tab-separated format. In total, 2,366,081 records are provided; each line in the file contains a single PMID, title and abstract separated by tabs. Due to PubMed inconsistencies, a small percentage of records are duplicated: we have identified 222 records with different PMIDs but the same abstract title and body. Entity mention annotations (large_scale_entities.tsv): This file contains the automatically labeled mention annotations of chemical compounds and genes/proteins (so-called gene and protein-related objects, as defined during BioCreative V) generated for the Large Scale records. There are 53,993,602 entity annotations. Related resources: Web; DrugProt corpus; Evaluation library; Online evaluation (CodaLab); Relation annotation guidelines; Gene and protein annotation guidelines; Chemicals and drugs annotation guidelines; DrugProt Silver Standard Knowledge Graph; FAQ; DrugProt Large Scale Additional SubTrack; DrugProt Large Scale document collection protocol
Database: OpenAIRE
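
The record layout given for large_scale_abstracts.tsv (one PMID, title and abstract per line, separated by tabs, with some records duplicated under different PMIDs) can be illustrated with a minimal Python sketch. It is not part of the dataset or its evaluation library. The function names `parse_records` and `find_duplicates` are hypothetical, the sample lines are invented, and the sketch assumes exactly three columns in the order PMID, title, abstract with no header row, which should be checked against the actual file.

```python
import unicodedata
from collections import defaultdict


def parse_records(lines):
    """Yield (pmid, title, abstract) tuples from tab-separated lines.

    Assumption: three columns per line and no header row.
    """
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        # Split on the first two tabs only, so any stray tab stays in the abstract.
        pmid, title, abstract = line.split("\t", 2)
        yield pmid, title, abstract


def find_duplicates(records):
    """Group PMIDs that share the same title and abstract text."""
    groups = defaultdict(list)
    for pmid, title, abstract in records:
        # The file is described as NFC-normalized; normalizing again is a
        # harmless safeguard if other text is mixed in.
        key = (
            unicodedata.normalize("NFC", title),
            unicodedata.normalize("NFC", abstract),
        )
        groups[key].append(pmid)
    return [pmids for pmids in groups.values() if len(pmids) > 1]


# Invented sample lines standing in for the real file.
sample = [
    "1\tTitle A\tAbstract about aspirin.\n",
    "2\tTitle B\tAbstract about kinases.\n",
    "3\tTitle A\tAbstract about aspirin.\n",
]

duplicates = find_duplicates(parse_records(sample))
print(duplicates)  # [['1', '3']]
```

To run this against the real file, the sample list could be replaced by an open file handle, for example `open("large_scale_abstracts.tsv", encoding="utf-8")`. With over two million records, keeping full title and abstract strings as dictionary keys may use considerable memory; hashing each key first is one possible way to reduce it.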