Improving Multilingual Named Entity Recognition with Wikipedia Entity Type Mapping

by Jian Ni, Radu Florian

Released as an article.

2017  

Abstract

The state-of-the-art named entity recognition (NER) systems are statistical machine learning models that have strong generalization capability (i.e., can recognize unseen entities that do not appear in training data) based on lexical and contextual information. However, such a model could still make mistakes if its features favor a wrong entity type. In this paper, we utilize Wikipedia as an open knowledge base to improve multilingual NER systems. Central to our approach is the construction of high-accuracy, high-coverage multilingual Wikipedia entity type mappings. These mappings are built from weakly annotated data and can be extended to new languages with no human annotation or language-dependent knowledge involved. Based on these mappings, we develop several approaches to improve an NER system. We evaluate the performance of the approaches via experiments on NER systems trained for 6 languages. Experimental results show that the proposed approaches are effective in improving the accuracy of such systems on unseen entities, especially when a system is applied to a new domain or it is trained with little training data (up to 18.3 F1 score improvement).

Archived Files and Locations

application/pdf  129.5 kB
file_iissxpphuvhcnp42kkk2rvudki
arxiv.org (repository)
web.archive.org (webarchive)
Type: article
Stage: submitted
Date: 2017-07-08
Version: v1
Language: en
arXiv: 1707.02459v1
Catalog Record
Revision: f8941dd9-1783-4a9d-9ec1-2a9889f9f34d