KLUE: Korean Language Understanding Evaluation
release_2mpos453djepnedlfgxwlrrunm
by
Sungjoon Park, Jihyung Moon, Sungdong Kim, Won Ik Cho, Jiyoon Han, Jangwon Park, Chisung Song, Junseong Kim, Yongsook Song, Taehwan Oh, Joohong Lee, Juhyun Oh (+19 others)
2021
Abstract
We introduce Korean Language Understanding Evaluation (KLUE) benchmark. KLUE
is a collection of 8 Korean natural language understanding (NLU) tasks,
including Topic Classification, SemanticTextual Similarity, Natural Language
Inference, Named Entity Recognition, Relation Extraction, Dependency Parsing,
Machine Reading Comprehension, and Dialogue State Tracking. We build all of the
tasks from scratch from diverse source corpora while respecting copyrights, to
ensure accessibility for anyone without any restrictions. With ethical
considerations in mind, we carefully design annotation protocols. Along with
the benchmark tasks and data, we provide suitable evaluation metrics and
fine-tuning recipes for pretrained language models for each task. We
furthermore release the pretrained language models (PLM), KLUE-BERT and
KLUE-RoBERTa, to help reproducing baseline models on KLUE and thereby
facilitate future research. We make a few interesting observations from the
preliminary experiments using the proposed KLUE benchmark suite, already
demonstrating the usefulness of this new benchmark suite. First, we find
KLUE-RoBERTa-large outperforms other baselines, including multilingual PLMs and
existing open-source Korean PLMs. Second, we see minimal degradation in
performance even when we replace personally identifiable information from the
pretraining corpus, suggesting that privacy and NLU capability are not at odds
with each other. Lastly, we find that using BPE tokenization in combination
with morpheme-level pre-tokenization is effective in tasks involving
morpheme-level tagging, detection and generation. In addition to accelerating
Korean NLP research, our comprehensive documentation on creating KLUE will
facilitate creating similar resources for other languages in the future. KLUE
is available at <a class="link-external link-https"
href="https://klue-benchmark.com/">this URL</a>.
In text/plain
format
Archived Files and Locations
application/pdf 3.9 MB
file_juydsweppncr3pwiam4joklshq
|
arxiv.org (repository) web.archive.org (webarchive) |
2105.09680v2
access all versions, variants, and formats of this works (eg, pre-prints)