LynyrdSkynyrd at WNUT-2020 Task 2: Semi-Supervised Learning for Identification of Informative COVID-19 English Tweets
Authors: | Kushal Chawla, Abhilasha Sancheti, Gaurav Verma |
---|---|
Language: | English |
Year of publication: | 2020 |
Subject: |
FOS: Computer and information sciences
Coronavirus disease 2019 (COVID-19); Computer science; Semi-supervised learning; 010501 environmental sciences; computer.software_genre; 01 natural sciences; 050105 experimental psychology; Task (project management); Set (abstract data type); Text mining; Feature (machine learning); 0501 psychology and cognitive sciences; 0105 earth and related environmental sciences; Social and Information Networks (cs.SI); Computer Science - Computation and Language; business.industry; I.2.7; 05 social sciences; Computer Science - Social and Information Networks; Identification (information); Language model; Artificial intelligence; business; Computation and Language (cs.CL); computer; Natural language processing |
Source: | W-NUT@EMNLP |
Description: | In this work, we describe our system for the WNUT-2020 shared task on the identification of informative COVID-19 English tweets. Our system is an ensemble of machine learning methods that combines traditional feature-based classifiers with recent pre-trained language models, which help capture the syntactic, semantic, and contextual features of the tweets. We further employ pseudo-labelling to incorporate the unlabelled Twitter data released about the pandemic. Our best-performing model achieves an F1-score of 0.9179 on the provided validation set and 0.8805 on the blind test set. |
Database: | OpenAIRE |
External link: |
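The description mentions two techniques, an ensemble and pseudo-labelling of unlabelled tweets. The record does not say how the ensemble combines its members or how pseudo-labels are selected, so the following is only a minimal, generic sketch of one common variant. It averages each member's probability of the INFORMATIVE class (soft voting) and keeps only unlabelled tweets on which the ensemble is confident. The `threshold` value of 0.9, the helper names `ensemble_prob` and `pseudo_label`, and the toy `keyword_model` and `length_model` scorers are all hypothetical illustrations, not the authors' system.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical model type: maps a tweet to P(INFORMATIVE) in [0, 1].
Model = Callable[[str], float]


def ensemble_prob(models: Sequence[Model], text: str) -> float:
    """Soft voting: average the INFORMATIVE probability over all ensemble members."""
    return sum(m(text) for m in models) / len(models)


def pseudo_label(
    models: Sequence[Model], unlabelled: Sequence[str], threshold: float = 0.9
) -> List[Tuple[str, int]]:
    """Return (tweet, label) pairs the ensemble is confident about.

    Label 1 = INFORMATIVE, 0 = UNINFORMATIVE. Tweets whose averaged probability
    falls strictly between 1 - threshold and threshold are left unlabelled.
    """
    if not 0.5 < threshold <= 1.0:
        raise ValueError("threshold must be in (0.5, 1.0] so the two classes cannot overlap")
    selected: List[Tuple[str, int]] = []
    for text in unlabelled:
        p = ensemble_prob(models, text)
        if p >= threshold:
            selected.append((text, 1))
        elif p <= 1.0 - threshold:
            selected.append((text, 0))
    return selected


# Toy stand-ins for real classifiers (hypothetical, for illustration only).
def keyword_model(text: str) -> float:
    return 0.95 if "cases" in text else 0.05


def length_model(text: str) -> float:
    return 0.9 if len(text.split()) > 5 else 0.1


tweets = [
    "Officials report 120 new confirmed cases today in Ohio",
    "stay safe everyone",
    "so many cases lol",
]

if __name__ == "__main__":
    # The confident tweets would then be added to the labelled training set.
    for text, label in pseudo_label([keyword_model, length_model], tweets):
        print(label, text)
```

In this toy run, the first tweet is pseudo-labelled INFORMATIVE and the second UNINFORMATIVE. The third is discarded because the two members disagree (an average of about 0.525). A high threshold trades the number of pseudo-labelled tweets for fewer noisy labels.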