Toward Fast and Accurate Neural Chinese Word Segmentation with Multi-Criteria Learning
release_g5hhis3vajdyznv4wphhrqgaju

by Weipeng Huang, Xingyi Cheng, Kunlong Chen, Taifeng Wang, Wei Chu

Released as an article.

2020  

Abstract

Ambiguous annotation criteria cause Chinese Word Segmentation (CWS) datasets to diverge in granularity. Multi-criteria Chinese word segmentation aims to capture the various annotation criteria across datasets and leverage their common underlying knowledge. In this paper, we propose a domain-adaptive segmenter that exploits the diverse criteria of various datasets. Our model is based on Bidirectional Encoder Representations from Transformers (BERT), which introduces open-domain knowledge. Private and shared projection layers are proposed to capture domain-specific knowledge and common knowledge, respectively. We also optimize computational efficiency via distillation, quantization, and compiler optimization. Experiments show that our segmenter outperforms the previous state-of-the-art (SOTA) models on 10 CWS datasets with superior efficiency.

Archived Files and Locations

application/pdf  1.1 MB
file_o5tpuhsvjbcfdnk6rie3r7pveq
arxiv.org (repository)
web.archive.org (webarchive)
Preserved and Accessible
Type  article
Stage   submitted
Date   2020-10-09
Version   v2
Language   en
arXiv  1903.04190v2
Catalog Record
Revision: 40364855-d782-4699-ad55-60fde4814d3b