DiaLex: A Benchmark for Evaluating Multidialectal Arabic Word Embeddings

by Muhammad Abdul-Mageed, Shady Elbassuoni, Jad Doughman, AbdelRahim Elmadany, El Moatez Billah Nagoudi, Yorgo Zoughby, Ahmad Shaher Iskander Gaba, Ahmed Helal, Mohammed El-Razzaz

Released as an article.

2020  

Abstract

Word embeddings are a core component of modern natural language processing systems, making the ability to thoroughly evaluate them a vital task. We describe DiaLex, a benchmark for intrinsic evaluation of dialectal Arabic word embeddings. DiaLex covers five important Arabic dialects: Algerian, Egyptian, Lebanese, Syrian, and Tunisian. Across these dialects, DiaLex provides a testbank for six syntactic and semantic relations, namely male to female, singular to dual, singular to plural, antonym, comparative, and genitive to past tense. DiaLex thus consists of a collection of word pairs representing each of the six relations in each of the five dialects. To demonstrate the utility of DiaLex, we use it to evaluate a set of existing Arabic word embeddings as well as new ones that we developed. Our benchmark, evaluation code, and new word embedding models will be made publicly available.
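The abstract does not state how the word pairs are scored against an embedding model. A common intrinsic test for pair-based benchmarks is the word-analogy task: two pairs (a, b) and (c, d) that share a relation form the question "a is to b as c is to ?", and the model answers with the vocabulary word closest to b - a + c (the 3CosAdd method). The sketch below assumes that setup, excluding the three question words from the candidates and skipping questions that contain out-of-vocabulary words. The function names, the placeholder words (`masc_1`, `fem_1`, ...), and the 2-d toy vectors are hypothetical; they are not DiaLex data or the authors' evaluation code.

```python
import math
from itertools import permutations
from typing import Dict, List, Tuple

Embedding = Dict[str, List[float]]


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def analogy_questions(pairs: List[Tuple[str, str]]) -> List[Tuple[str, str, str, str]]:
    """Build 'a : b :: c : d' questions from every ordered pair of distinct word pairs."""
    return [(a, b, c, d) for (a, b), (c, d) in permutations(pairs, 2)]


def solve_3cosadd(emb: Embedding, a: str, b: str, c: str) -> str:
    """Return the word whose vector is most cosine-similar to b - a + c,
    excluding the three question words (a common convention, assumed here)."""
    target = [vb - va + vc for va, vb, vc in zip(emb[a], emb[b], emb[c])]
    candidates = (w for w in emb if w not in (a, b, c))
    return max(candidates, key=lambda w: cosine(emb[w], target))


def analogy_accuracy(emb: Embedding, pairs: List[Tuple[str, str]]) -> float:
    """Fraction of answerable questions solved; questions with OOV words are skipped."""
    questions = [q for q in analogy_questions(pairs) if all(w in emb for w in q)]
    if not questions:
        return 0.0
    correct = sum(solve_3cosadd(emb, a, b, c) == d for a, b, c, d in questions)
    return correct / len(questions)


# Hypothetical toy data: each "female" vector is its "male" vector plus (0, 1).
toy_emb: Embedding = {
    "masc_1": [1.0, 0.0],
    "fem_1": [1.0, 1.0],
    "masc_2": [0.0, 1.0],
    "fem_2": [0.0, 2.0],
    "other": [3.0, -1.0],  # distractor word
}
toy_pairs = [("masc_1", "fem_1"), ("masc_2", "fem_2")]

print(analogy_accuracy(toy_emb, toy_pairs))  # prints 1.0
```

In a per-dialect, per-relation setup like the one the abstract describes, one would call a function such as `analogy_accuracy` once for each of the 30 (dialect, relation) pair lists and report the scores separately; whether DiaLex aggregates them this way is not stated here.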

Archived Files and Locations

application/pdf  587.3 kB
arxiv.org (repository)
web.archive.org (webarchive)
Type: article
Stage: submitted
Date: 2020-11-22
Version: v1
Language: en
arXiv: 2011.10970v1