Simple Questions Generate Named Entity Recognition Datasets release_2lmu2tgkufdtjhivtykmbft2j4

by Hyunjae Kim, Jaehyo Yoo, Seunghyun Yoon, Jinhyuk Lee, Jaewoo Kang

Released as a article .

2021  

Abstract

Named entity recognition (NER) is a task of extracting named entities of specific types from text. Current NER models often rely on human-annotated datasets requiring the vast engagement of professional knowledge on the target domain and entities. This work introduces an ask-to-generate approach, which automatically generates NER datasets by asking simple natural language questions that reflect the needs for entity types (e.g., Which disease?) to an open-domain question answering system. Without using any in-domain resources (i.e., training sentences, labels, or in-domain dictionaries), our models solely trained on our generated datasets largely outperform previous weakly supervised models on six NER benchmarks across four different domains. Surprisingly, on NCBI-disease, our model achieves 75.5 F1 score and even outperforms the previous best weakly supervised model by 4.1 F1 score, which utilizes a rich in-domain dictionary provided by domain experts. Formulating the needs of NER with natural language also allows us to build NER models for fine-grained entity types such as Award, where our model even outperforms fully supervised models. On three few-shot NER benchmarks, our model achieves new state-of-the-art performance.
In text/plain format

Archived Content

There are no accessible files associated with this release. You could check other releases for this work for an accessible version.

"Dark" Preservation Only
Save Paper Now!

Know of a fulltext copy of on the public web? Submit a URL and we will archive it

Type  article
Stage   submitted
Date   2021-12-16
Version   v1
Language   en ?
arXiv  2112.08808v1
Work Entity
access all versions, variants, and formats of this works (eg, pre-prints)
Catalog Record
Revision: b3251e70-1de7-4f49-8164-ba64b282572b
API URL: JSON