DiaLex: A Benchmark for Evaluating Multidialectal Arabic Word Embeddings

Authors: Abdul-Mageed, Muhammad; Elbassuoni, Shady; Doughman, Jad; Elmadany, AbdelRahim; Nagoudi, El Moatez Billah; Zoughby, Yorgo; Shaher, Ahmad; Gaba, Iskander; Helal, Ahmed; El-Razzaz, Mohammed
Year of publication: 2020
Document type: Working Paper
Description: Word embeddings are a core component of modern natural language processing systems, so the ability to evaluate them thoroughly is vital. We describe DiaLex, a benchmark for the intrinsic evaluation of dialectal Arabic word embeddings. DiaLex covers five important Arabic dialects: Algerian, Egyptian, Lebanese, Syrian, and Tunisian. Across these dialects, DiaLex provides a testbank for six syntactic and semantic relations, namely male to female, singular to dual, singular to plural, antonym, comparative, and genitive to past tense. DiaLex thus consists of a collection of word pairs representing each of the six relations in each of the five dialects. To demonstrate the utility of DiaLex, we use it to evaluate a set of existing Arabic word embeddings as well as new ones that we developed. Our benchmark, evaluation code, and new word embedding models will be made publicly available.
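The record does not spell out how the word pairs are scored. One common way to use relation word-pair lists for intrinsic evaluation is analogy completion with the vector-offset (3CosAdd) method: for two pairs (a, b) and (c, d) of the same relation, predict d as the vocabulary word whose vector is most cosine-similar to b - a + c, excluding a, b, and c. The sketch below assumes that protocol; it is not DiaLex's published evaluation code, and the function names, placeholder tokens, and two-dimensional toy vectors are all hypothetical.

```python
import math
from itertools import permutations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def solve_analogy(emb, a, b, c):
    """3CosAdd (assumed protocol): return the word closest to b - a + c, excluding a, b, c."""
    target = [vb - va + vc for va, vb, vc in zip(emb[a], emb[b], emb[c])]
    candidates = (w for w in emb if w not in {a, b, c})
    return max(candidates, key=lambda w: cosine(emb[w], target), default=None)


def analogy_accuracy(emb, pairs):
    """Top-1 accuracy over analogy questions built from every ordered pair of relation pairs.

    Questions containing out-of-vocabulary words are skipped (a hypothetical choice).
    """
    questions = [
        (a, b, c, d)
        for (a, b), (c, d) in permutations(pairs, 2)
        if all(w in emb for w in (a, b, c, d))
    ]
    if not questions:
        return 0.0
    correct = sum(solve_analogy(emb, a, b, c) == d for a, b, c, d in questions)
    return correct / len(questions)


# Hypothetical toy embeddings: placeholder tokens, not real DiaLex entries.
# The "male to female" offset is (0, 1); x and y are distractors.
emb = {
    "m1": (1.0, 0.0), "f1": (1.0, 1.0),
    "m2": (2.0, 0.0), "f2": (2.0, 1.0),
    "x": (1.0, -1.0), "y": (0.0, 1.0),
}
pairs = [("m1", "f1"), ("m2", "f2")]
print(analogy_accuracy(emb, pairs))  # prints 1.0
```

Scoring each relation and dialect separately with a function like `analogy_accuracy` would give one accuracy per relation-dialect cell; real embeddings would replace the toy dictionary, and a library such as gensim offers its own nearest-neighbour search for the same offset query.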
Comment: WANLP2021
Database: arXiv