ArWordVec: efficient word embedding models for Arabic tweets

Authors:	Naif Radi Aljohani, Mohammed M. Fouad, Saeed-Ul Hassan, Rabeeh Ayaz Abbasi, Ahmed Mahany
Year of publication:	2019
Subject:	0209 industrial biotechnology; Word embedding; Arabic; Computer science; Computational intelligence; 02 engineering and technology; computer.software_genre; Theoretical Computer Science; Set (abstract data type); 020901 industrial engineering & automation; Similarity (psychology); 0202 electrical engineering, electronic engineering, information engineering; business.industry; Deep learning; Sentiment analysis; language.human_language; language; 020201 artificial intelligence & image processing; Geometry and Topology; Artificial intelligence; business; computer; Software; Natural language; Natural language processing; Word (computer architecture)
Source:	Soft Computing. 24:8061-8068
ISSN:	1433-7479 1432-7643
DOI:	10.1007/s00500-019-04153-6
Description:	Understanding, processing, and using human natural language is one of the major recent advances in artificial intelligence. This has been achieved by combining natural language processing (NLP) techniques with deep learning approaches and architectures. In recent years, distributed word representations have been used very effectively in place of the traditional bag-of-words approach for many NLP tasks. In this paper, we present the detailed steps of building a set of efficient word embedding models, called ArWordVec, generated from a huge repository of Arabic tweets. In addition, we introduce a new method for measuring Arabic word similarity and use it to evaluate the performance of the generated ArWordVec models. The experimental results show that the ArWordVec models outperform recently available models on Arabic Twitter data in the word similarity task. In addition, two large Arabic tweet datasets are used to examine the performance of the proposed models in the multi-class sentiment analysis task. The results show that the proposed models are very efficient and help achieve a classification accuracy exceeding 73.86% with a high average F1 value of 74.15.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_________::4ef6f26a580b234719cceaaeeddf0e41 https://doi.org/10.1007/s00500-019-04153-6