The Russian Drug Reaction Corpus and Neural Models for Drug Reactions and Effectiveness Detection in User Reviews
Author: | Sergey I. Nikolenko, Elena Tutubalina, Andrey Sakhovskiy, Ilseyar Alimova, Valentin Malykh, Zulfat Miftahutdinov |
---|---|
Language: | English |
Year of publication: | 2020 |
Subject: |
Statistics and Probability
Drug; FOS: Computer and information sciences; Drug-Related Side Effects and Adverse Reactions; Computer science; media_common.quotation_subject; MEDLINE; computer.software_genre; Biochemistry; Task (project management); Russia; 03 medical and health sciences; 0302 clinical medicine; Pharmacotherapy; Named-entity recognition; Data Mining; Humans; Social media; 030212 general & internal medicine; Drug reaction; Molecular Biology; 030304 developmental biology; media_common; Language; 0303 health sciences; Computer Science - Computation and Language; business.industry; Computer Science Applications; Computational Mathematics; Information extraction; Identification (information); Computational Theory and Mathematics; Pharmaceutical Preparations; The Internet; Artificial intelligence; business; computer; Computation and Language (cs.CL); Natural language processing; Sentence |
Description: | The Russian Drug Reaction Corpus (RuDReC) is a new, partially annotated corpus of Russian-language consumer reviews of pharmaceutical products, built for detecting health-related named entities and the effectiveness of pharmaceutical products. The corpus consists of two parts: a raw part and a labelled part. The raw part includes 1.4 million health-related user-generated texts collected from various Internet sources, including social media. The labelled part contains 500 consumer reviews about drug therapy with drug- and disease-related information. Each sentence is labelled for the presence or absence of health-related issues. Sentences that contain a health-related issue are additionally labelled at the expression level to identify fine-grained subtypes such as drug classes and drug forms, drug indications, and drug reactions. Further, we present a baseline model for the named entity recognition (NER) and multi-label sentence classification tasks on this corpus. Our RuDR-BERT model achieves a macro F1 score of 74.85% on the NER task. On the sentence classification task, our model achieves a macro F1 score of 68.82%, a gain of 7.47% over a BERT model trained on Russian data. We make the RuDReC corpus and the pretrained weights of the domain-specific BERT models freely available at https://github.com/cimm-kzn/RuDReC. 9 pages, 9 tables, 4 figures |
Database: | OpenAIRE |
External link: |
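
The description reports macro F1 scores for multi-label sentence classification. As a rough illustration of how such a score is usually computed, the sketch below averages per-label F1 over a set of labels. It is a minimal example under stated assumptions, not the authors' evaluation code: the function name `macro_f1` and the label names `reaction` and `indication` are hypothetical, and the record does not say how labels with no gold or predicted instances were handled (here they score 0).

```python
from typing import List, Set


def macro_f1(gold: List[Set[str]], pred: List[Set[str]], labels: List[str]) -> float:
    """Unweighted mean of per-label F1 for multi-label sentence classification.

    gold and pred hold one set of labels per sentence; an empty set means
    the sentence carries none of the labels.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same number of sentences")
    if not labels:
        raise ValueError("labels must not be empty")

    scores = []
    for label in labels:
        pairs = list(zip(gold, pred))
        tp = sum(1 for g, p in pairs if label in g and label in p)
        fp = sum(1 for g, p in pairs if label not in g and label in p)
        fn = sum(1 for g, p in pairs if label in g and label not in p)
        denom = 2 * tp + fp + fn
        # Assumption: a label with no gold and no predicted instances scores 0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical label names and toy data, for illustration only.
LABELS = ["reaction", "indication"]
gold = [{"reaction"}, {"indication"}, set(), {"reaction", "indication"}]
pred = [{"reaction"}, set(), {"reaction"}, {"reaction", "indication"}]

print(round(macro_f1(gold, pred, LABELS), 4))  # 0.7333
```

In the toy data, `reaction` has F1 = 0.8 (two true positives, one false positive) and `indication` has F1 ≈ 0.667 (one true positive, one false negative), so the macro average is about 0.733. Because every label counts equally, rare labels affect the macro score as much as frequent ones.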