DiaLex: A Benchmark for Evaluating Multidialectal Arabic Word Embeddings
by
Muhammad Abdul-Mageed, Shady Elbassuoni, Jad Doughman, AbdelRahim Elmadany, El Moatez Billah Nagoudi, Yorgo Zoughby, Ahmad Shaher Iskander Gaba, Ahmed Helal, Mohammed El-Razzaz
2020
Abstract
Word embeddings are a core component of modern natural language processing
systems, making the ability to thoroughly evaluate them a vital task. We
describe DiaLex, a benchmark for intrinsic evaluation of dialectal Arabic word
embeddings. DiaLex covers five important Arabic dialects: Algerian, Egyptian,
Lebanese, Syrian, and Tunisian. Across these dialects, DiaLex provides a
testbank for six syntactic and semantic relations, namely male to female,
singular to dual, singular to plural, antonym, comparative, and genitive to
past tense. DiaLex thus consists of a collection of word pairs representing
each of the six relations in each of the five dialects. To demonstrate the
utility of DiaLex, we use it to evaluate a set of existing Arabic word
embeddings alongside new ones that we developed. Our benchmark, evaluation code, and new word
embedding models will be publicly available.
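The abstract does not say how the word pairs are scored, so the following is only a minimal sketch of one common way such a testbank can be used: analogy-style evaluation, where two pairs from the same relation form a question "a is to b as c is to ?" and the embedding is credited when the expected word appears among the top-k nearest neighbours of b - a + c. The function names, the top-k criterion, and the toy vectors are all assumptions made for illustration. The placeholder tokens (m1, f1, ...) stand in for real dialectal word pairs and are not taken from DiaLex.

```python
import math
from itertools import permutations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def analogy_questions(pairs):
    """Combine every ordered pair of distinct word pairs into (a, b, c, d)."""
    return [(a, b, c, d) for (a, b), (c, d) in permutations(pairs, 2)]


def top_k_accuracy(questions, vectors, k=1):
    """Share of answerable questions whose expected word d is in the top k."""
    correct = 0
    answered = 0
    for a, b, c, d in questions:
        if any(w not in vectors for w in (a, b, c, d)):
            continue  # skip questions containing out-of-vocabulary words
        answered += 1
        # Vector offset: b - a + c should land near d.
        target = [vb - va + vc
                  for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
        # The three question words are excluded from the candidate set.
        candidates = [w for w in vectors if w not in (a, b, c)]
        ranked = sorted(candidates,
                        key=lambda w: cosine(vectors[w], target),
                        reverse=True)
        if d in ranked[:k]:
            correct += 1
    return correct / answered if answered else 0.0


# Hypothetical toy data: placeholder tokens for a male-to-female relation.
toy_pairs = [("m1", "f1"), ("m2", "f2")]
toy_vectors = {
    "m1": [1.0, 0.0, 0.0],
    "f1": [1.0, 0.0, 1.0],
    "m2": [0.0, 1.0, 0.0],
    "f2": [0.0, 1.0, 1.0],
    "other": [1.0, 1.0, 0.0],
}

toy_questions = analogy_questions(toy_pairs)
toy_score = top_k_accuracy(toy_questions, toy_vectors, k=1)
print(f"{len(toy_questions)} questions, top-1 accuracy = {toy_score:.2f}")
```

In this sketch each relation in each dialect would be scored separately, so that an embedding model can be compared across the six relations and five dialects rather than by a single aggregate number.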
arXiv: 2011.10970v1