Developing Resources for Sentiment Analysis of Informal Arabic Text in Social Media

Authors:	Samir Al-Khayatt, Maher Itani, Chris Roast
Year of publication:	2017
Subject:	020203 distributed computing; Machine translation; Arabic; Process (engineering); business.industry; Computer science; media_common.quotation_subject; Sentiment analysis; 02 engineering and technology; computer.software_genre; language.human_language; Spelling; 0202 electrical engineering, electronic engineering, information engineering; language; General Earth and Planetary Sciences; 020201 artificial intelligence & image processing; Quality (business); Social media; Artificial intelligence; business; computer; Natural language processing; General Environmental Science; media_common
Source:	ACLING
ISSN:	1877-0509
DOI:	10.1016/j.procs.2017.10.101
Description:	Natural Language Processing (NLP) applications such as text categorization, machine translation, and sentiment analysis need annotated corpora and lexicons to check quality and performance. This paper describes the development of resources for sentiment analysis specifically for Arabic text in social media. A distinctive feature of the corpora and lexicons developed is that they are derived from informal Arabic that does not conform to grammatical or spelling standards. We refer to Arabic social media content of this sort as Dialectal Arabic (DA): informal Arabic originating from, and potentially mixing, a range of different individual dialects. The paper describes the process adopted for developing corpora and sentiment lexicons for sentiment analysis across different social media, together with the characteristics of the resulting resources. In addition to providing useful NLP data sets for Dialectal Arabic, the work also contributes to understanding how to approach the development of corpora and lexicons.
Database:	OpenAIRE
External links:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::e95a9567a0a1474b63eec82bafdec964 https://doi.org/10.1016/j.procs.2017.10.101