Toward Fast and Accurate Neural Chinese Word Segmentation with Multi-Criteria Learning
by Weipeng Huang, Xingyi Cheng, Kunlong Chen, Taifeng Wang, Wei Chu
2020
Abstract
Ambiguous annotation criteria cause Chinese Word Segmentation (CWS) datasets to
diverge in segmentation granularity. Multi-criteria Chinese word segmentation
aims to capture the various annotation criteria across datasets while leveraging
their common underlying knowledge. In this paper, we propose a
domain-adaptive segmenter that exploits the diverse criteria of these datasets. Our
model is based on Bidirectional Encoder Representations from Transformers
(BERT), which is responsible for introducing open-domain knowledge. Private and
shared projection layers are proposed to capture domain-specific knowledge and
common knowledge, respectively. We also optimize computational efficiency via
distillation, quantization, and compiler optimization. Experiments show that
our segmenter outperforms previous state-of-the-art (SOTA) models on 10 CWS
datasets with superior efficiency.
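The private/shared projection idea can be illustrated with a toy sketch. This is not the authors' implementation. The hidden vectors are one-hot stand-ins for BERT outputs. The criterion names "coarse" and "fine" and every helper name are hypothetical. The sketch assumes that the shared and private tag scores are summed, and it decodes with plain greedy argmax over the standard BMES character tags rather than whatever decoder the paper actually uses.

```python
from typing import Dict, List, Sequence

TAGS = ["B", "M", "E", "S"]  # standard BMES character tags for CWS
Matrix = List[List[float]]   # hidden_dim rows x len(TAGS) columns


def project(h: Sequence[float], w: Matrix) -> List[float]:
    """Linear projection of one hidden vector h (length d) to per-tag scores."""
    return [sum(h[i] * w[i][j] for i in range(len(h))) for j in range(len(TAGS))]


def tag_scores(hidden: Sequence[Sequence[float]], shared: Matrix,
               private: Dict[str, Matrix], criterion: str) -> List[List[float]]:
    """Assumption: shared and criterion-specific (private) scores are summed."""
    w_private = private[criterion]
    return [[s + p for s, p in zip(project(h, shared), project(h, w_private))]
            for h in hidden]


def greedy_tags(scores: Sequence[Sequence[float]]) -> List[str]:
    """Pick the highest-scoring tag per character (no CRF or transition constraints)."""
    return [TAGS[max(range(len(TAGS)), key=lambda j: row[j])] for row in scores]


def tags_to_words(chars: str, tags: Sequence[str]) -> List[str]:
    """Turn a BMES tag sequence into words; a B or S also closes any open word."""
    words: List[str] = []
    current = ""
    for ch, tag in zip(chars, tags):
        if tag in ("B", "S") and current:
            words.append(current)  # repair ill-formed sequences such as B B
            current = ""
        current += ch
        if tag in ("E", "S"):
            words.append(current)
            current = ""
    if current:
        words.append(current)
    return words


# Hypothetical toy inputs: one-hot "hidden states" replace real BERT outputs (d = 4).
TEXT = "北京大学"
HIDDEN = [[1.0 if i == j else 0.0 for i in range(4)] for j in range(4)]
ZERO = [0.0, 0.0, 0.0, 0.0]
# Shared layer: knowledge common to all criteria (first char begins, last char ends a word).
SHARED = [[1.0, 0.0, 0.0, 0.0], ZERO, ZERO, [0.0, 0.0, 1.0, 0.0]]
# Private layers for two hypothetical criteria that disagree on the middle characters.
PRIVATE = {
    "coarse": [ZERO, [0.0, 1.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], ZERO],
    "fine": [ZERO, [0.0, 0.0, 1.0, 0.0], [1.0, 0.0, 0.0, 0.0], ZERO],
}


def segment(text: str, criterion: str) -> List[str]:
    """Segment text under one criterion using the shared plus that criterion's private layer."""
    return tags_to_words(text, greedy_tags(tag_scores(HIDDEN, SHARED, PRIVATE, criterion)))


for crit in ("coarse", "fine"):
    print(crit, segment(TEXT, crit))
```

With the same input, the "coarse" criterion keeps 北京大学 as one word, while the "fine" criterion splits it into 北京 and 大学. The shared layer supplies only the boundary knowledge both criteria agree on, and each private layer decides the part on which they differ.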
arXiv: 1903.04190v2