Drink2Vec: Improving the classification of alcohol-related tweets using distributional semantics and external contextual enrichment
Author: | Renata Galante, Karin Becker, Marcos A. Grzeça |
---|---|
Publication year: | 2020 |
Subject: | Vocabulary; Word embedding; Computer science; Generalization; Substance abuse; Alcohol abuse; Engineering and technology; Library and Information Sciences; Management Science and Operations Research; Convolutional neural network; Medical and health sciences; Electrical engineering, electronic engineering, information engineering; Media Technology; Semantic Web; Computer Science Applications; Statistical classification; Artificial intelligence & image processing; Artificial intelligence; Distributional semantics; Other medical science; Natural language processing; Information Systems |
Source: | Information Processing & Management. 57:102369 |
ISSN: | 0306-4573 |
Description: | The hazardous and harmful use of alcohol has become a public health issue worldwide. Social media has emerged as a reliable source to extract information on alcohol consumption at low cost and with low latency. The automatic classification of tweets related to alcohol consumption can help to understand the factors related to alcohol abuse. In this paper, we propose Drink2Vec, a method aimed at improving the classification of alcohol-related tweets by exploring two forms of contextual information: distributional semantics and external contextual enrichment. The core of Drink2Vec is a convolutional neural network that learns domain-specific word embedding representations that capture vocabulary related to alcohol consumption. Drink2Vec builds on Drink2Symbol, a method that finds relevant symbolic features in external sources (e.g., the Semantic Web) to provide meaning and generalization to the terms present in tweets. Based on five datasets and three classification algorithms, our experiments show that external enrichment improves recall by adding generalization features, while distributional semantics improves precision mainly by characterizing terms according to their usage. A stacking ensemble of these classifiers establishes a proper balance between the advantages of each contextual enrichment technique. Our experiments also suggest that the task-specific embeddings produced by Drink2Vec capture more nuances of the informal vocabulary related to alcohol consumption (e.g., slang, events, misspelled words) and yield better results compared to other strategies (e.g., pre-trained embeddings and generic algorithms). |
Database: | OpenAIRE |
External link: |
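
The description contrasts a recall gain from external enrichment with a precision gain from distributional semantics. As a reading aid only, the sketch below shows how those two metrics are computed for a binary tweet classifier. The function name `precision_recall` and the toy label lists are assumptions invented for illustration; they are not data, code, or results from the paper.

```python
def precision_recall(y_true, y_pred):
    """Return (precision, recall) for binary labels, where 1 = alcohol-related."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy gold labels for eight tweets (made up for this sketch).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]

# Hypothetical classifier that generalizes broadly: it finds every
# positive tweet but also flags two negatives.
broad_pred = [1, 1, 1, 1, 1, 1, 0, 0]

# Hypothetical classifier that is strict: it never flags a negative
# tweet but misses two positives.
strict_pred = [1, 1, 0, 0, 0, 0, 0, 0]

broad = precision_recall(y_true, broad_pred)
strict = precision_recall(y_true, strict_pred)
print("broad  (precision, recall):", broad)
print("strict (precision, recall):", strict)
```

In this toy setting the broad classifier reaches full recall at lower precision and the strict one does the reverse, which is the kind of trade-off a stacking ensemble is meant to balance.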