A SNPshot of PubMed to associate genetic variants with drugs, diseases, and adverse reactions

Authors: Dmitry Voronov, Robert Leaman, Shanshan Liang, Barry Lumpkin, Chitta Baral, Võ Ha Nguyên, Luis Tari, S. Anwar, Jörg Hakenberg
Year of publication: 2012
Subject:
Source: Journal of Biomedical Informatics
ISSN: 1532-0464
DOI: 10.1016/j.jbi.2012.04.006
Description:
Graphical abstract: Excerpt from the data sheet for the human epidermal growth factor receptor, EGFR, showing a summary of information on the gene, lists of predicted related entities, and examples for a genetic variant and disease association. View the entire entry at http://bioai4core.fulton.asu.edu/snpshot/FactSheet?id=1956&type=GENE.
Highlights:
• SNPshot is an automatically populated repository on genetic variants, relations to drugs, and associations with disease.
• Our methods yield a precision of 90-92% for the major entity types, and 76-84% for relations.
• SNPshot data covers PharmGKB and other repositories to more than 90%, based on 180,000 PubMed abstracts.
• SNPshot is publicly available for search and download at http://bioai4core.fulton.asu.edu/snpshot.
Motivation: Genetic factors determine differences in pharmacokinetics, drug efficacy, and drug responses between individuals and sub-populations. Wrong dosages of drugs can lead to severe adverse drug reactions in individuals whose drug metabolism drastically differs from the "assumed average". Databases such as PharmGKB are excellent sources of pharmacogenetic information on enzymes, genetic variants, and drug response affected by changes in enzymatic activity. Here, we seek to aid researchers, database curators, and clinicians in their search for relevant information by automatically extracting these data from the literature.
Approach: We automatically populate a repository of information on genetic variants, relations to drugs, occurrence in sub-populations, and associations with disease. We mine textual data from PubMed abstracts to discover such genotype-phenotype associations, focusing on SNPs that can be associated with variations in drug response. The overall repository covers relations found between genes, variants, alleles, drugs, diseases, adverse drug reactions, populations, and allele frequencies. We cross-reference these data to EntrezGene, PharmGKB, PubChem, and others.
Results: Entity recognition and relation extraction achieve a precision of 90-92% for the major entity types (gene, drug, disease), and 76-84% for relations involving these types. Comparison of our repository to PharmGKB reveals a coverage of 93% of gene-drug associations in PharmGKB and 97% of the gene-variant mappings, based on 180,000 PubMed abstracts.
Availability: http://bioai4core.fulton.asu.edu/snpshot
Database: OpenAIRE