BERT based Transformers lead the way in Extraction of Health Information from Social Media
Author: | Nishesh Singh, Ujjwal Verma, Parthivi Choubey, Abhiraj Tiwari, Sidharth Ramesh, Sahil Khose, Saisha Kashyap, Kumud Lakara |
---|---|
Year of publication: | 2021 |
Subject: |
Normalization (statistics); Social and Information Networks (cs.SI); Computation and Language (cs.CL); FOS: Computer and information sciences; Computer science; Pipeline (software); Task (project management); Binary classification; Social media mining; Social media; Artificial intelligence; F1 score; Natural language processing; Transformer (machine learning model) |
DOI: | 10.48550/arxiv.2104.07367 |
Description: | This paper describes our submissions for the Social Media Mining for Health (SMM4H) 2021 shared tasks. We participated in two tasks: (1) classification, extraction, and normalization of adverse drug effect (ADE) mentions in English tweets (Task-1) and (2) classification of COVID-19 tweets containing symptoms (Task-6). Our approach for the first task uses the language representation model RoBERTa with a binary classification head. For the second task, we use BERTweet, which is based on RoBERTa. Fine-tuning is performed on the pre-trained models for both tasks. The models are placed on top of a custom domain-specific processing pipeline. Our system ranked first among all submissions for subtask-1(a) with an F1-score of 61%. For subtask-1(b), our system obtained an F1-score of 50%, an improvement of up to +8% F1 over the score averaged across all submissions. The BERTweet model achieved an F1-score of 94% on SMM4H 2021 Task-6. Comment: 6 pages, 1 figure |
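The abstract describes fine-tuning RoBERTa with a binary classification head for ADE mention detection. A minimal sketch of that setup using the HuggingFace `transformers` library is below; the `roberta-base` checkpoint, example tweets, and inference-only usage are illustrative assumptions, not the authors' exact configuration (the paper's pipeline also includes domain-specific preprocessing and training, omitted here).

```python
# Sketch only: RoBERTa with a binary classification head, as in Task-1a.
# Model name and inputs are illustrative; the head is randomly initialized
# until fine-tuned, so predictions here are not meaningful.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "roberta-base", num_labels=2  # binary head: ADE mention / no mention
)
model.eval()

tweets = [
    "this medication gave me a terrible headache",
    "lovely weather in town today",
]
batch = tokenizer(tweets, padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    logits = model(**batch).logits  # shape: (num_tweets, 2)
preds = logits.argmax(dim=-1)       # one 0/1 label per tweet
print(preds.shape)
```

For Task-6 the same recipe would apply with the BERTweet checkpoint swapped in, since BERTweet follows the RoBERTa pre-training procedure.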
Database: | OpenAIRE |
External link: |