Improving Multilingual Named Entity Recognition with Wikipedia Entity
Type Mapping
by
Jian Ni, Radu Florian
2017
Abstract
State-of-the-art named entity recognition (NER) systems are statistical
machine learning models that have strong generalization capability (i.e., can
recognize unseen entities that do not appear in training data) based on lexical
and contextual information. However, such a model could still make mistakes if
its features favor a wrong entity type. In this paper, we utilize Wikipedia as
an open knowledge base to improve multilingual NER systems. Central to our
approach is the construction of high-accuracy, high-coverage multilingual
Wikipedia entity type mappings. These mappings are built from weakly annotated
data and can be extended to new languages with no human annotation or
language-dependent knowledge involved. Based on these mappings, we develop
several approaches to improve an NER system. We evaluate the performance of the
approaches via experiments on NER systems trained for 6 languages. Experimental
results show that the proposed approaches are effective in improving the
accuracy of such systems on unseen entities, especially when a system is
applied to a new domain or is trained with little training data (an
improvement of up to 18.3 in F1 score).
arXiv: 1707.02459v1