Exploring Amharic Sentiment Analysis from Social Media Texts: Building Annotation Tools and Classification Models
Authors: | Hizkiel Mitiku Alemayehu, Abinew Ali Ayele, Seid Muhie Yimam, Chris Biemann |
---|---|
Year of publication: | 2020 |
Subject: | Sarcasm; Computer science; Deep learning; Sentiment analysis; Text annotation; Engineering and technology; Crowdsourcing; Annotation; Amharic; Information systems; Classifier (linguistics); Electrical engineering, electronic engineering, information engineering; Artificial intelligence & image processing; Social media; Artificial intelligence; Natural language processing |
Source: | COLING |
DOI: | 10.18653/v1/2020.coling-main.91 |
Description: | This paper presents a study of sentiment analysis for Amharic social media texts. As the number of social media users keeps growing, social media platforms would like to understand the latent meaning and sentiment of texts to improve decision-making. However, low-resource languages such as Amharic have received less attention for several reasons, including the lack of well-annotated datasets, the unavailability of computing resources, and few or no expert researchers in the area. This research addresses three main research questions. We first explore the suitability of existing tools for the sentiment analysis task. Annotation tools that support large-scale annotation in Amharic are scarce, and the existing crowdsourcing platforms do not support Amharic text annotation. Hence, we build a social-network-friendly annotation tool called ‘ASAB’ using a Telegram bot. We collect 9.4k tweets, each annotated by three Telegram users. Moreover, we explore the suitability of machine learning approaches for Amharic sentiment analysis. The FLAIR deep learning text classifier, based on network embeddings computed from a distributional thesaurus, outperforms the other supervised classifiers. We further investigate the challenges in building a sentiment analysis system for Amharic and find that the widespread use of sarcasm and figurative speech is the main difficulty. To advance sentiment analysis research in Amharic and other related low-resource languages, we release the dataset, the annotation tool, source code, and models publicly under a permissive license. |
Database: | OpenAIRE |