Knowledge-Empowered Representation Learning for Chinese Medical Reading Comprehension: Task, Model and Resources
by
Taolin Zhang, Chengyu Wang, Minghui Qiu, Bite Yang, Xiaofeng He, Jun Huang
2020
Abstract
Machine Reading Comprehension (MRC) aims to extract answers to questions
given a passage. It has been widely studied recently, especially in open
domains. However, few efforts have been made on closed-domain MRC, mainly due
to the lack of large-scale training data. In this paper, we introduce a
multi-target MRC task for the medical domain, whose goal is to predict answers
to medical questions and the corresponding support sentences from medical
information sources simultaneously, in order to ensure the high reliability of
medical knowledge serving. A high-quality dataset, the Multi-task Chinese
Medical MRC dataset (CMedMRC), is manually constructed for this purpose and
analyzed in detail. We further propose a Chinese medical BERT model for the
task (CMedBERT), which fuses medical knowledge into pre-trained language
models via a dynamic fusion mechanism over heterogeneous features and a
multi-task learning strategy. Experiments show that CMedBERT consistently
outperforms strong baselines by fusing context-aware and knowledge-aware
token representations.
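As a minimal sketch of the kind of fusion the abstract describes, the snippet below mixes context-aware token states from a pre-trained encoder with knowledge-aware (e.g. entity) embeddings through a learned sigmoid gate. This is an illustrative, hedged reconstruction of a common knowledge-fusion pattern, not the paper's exact CMedBERT architecture; all module names and dimensions are assumptions.

```python
import torch
import torch.nn as nn

class GatedKnowledgeFusion(nn.Module):
    """Illustrative sketch (not CMedBERT's exact design): fuse
    context-aware token states with knowledge-aware embeddings
    via a per-token sigmoid gate."""

    def __init__(self, hidden_size: int, kg_size: int):
        super().__init__()
        # Project knowledge embeddings into the encoder's hidden space.
        self.project = nn.Linear(kg_size, hidden_size)
        # Gate decides, per token and dimension, how much knowledge to mix in.
        self.gate = nn.Linear(2 * hidden_size, hidden_size)

    def forward(self, context: torch.Tensor, knowledge: torch.Tensor) -> torch.Tensor:
        # context:   (batch, seq_len, hidden_size) from the language model
        # knowledge: (batch, seq_len, kg_size) from, e.g., medical entity embeddings
        k = self.project(knowledge)
        g = torch.sigmoid(self.gate(torch.cat([context, k], dim=-1)))
        # Dynamic convex mixture of the two heterogeneous feature sources.
        return g * context + (1 - g) * k

fusion = GatedKnowledgeFusion(hidden_size=8, kg_size=4)
fused = fusion(torch.randn(2, 5, 8), torch.randn(2, 5, 4))
```

The fused representations would then feed two task heads (answer-span prediction and support-sentence selection), whose losses are summed under the multi-task strategy the abstract mentions.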
Archived Files and Locations
application/pdf, 1.1 MB
arxiv.org (repository); web.archive.org (webarchive)
arXiv: 2008.10327v1