K-Adapter: Infusing Knowledge into Pre-Trained Models with Adapters
by
Ruize Wang, Duyu Tang, Nan Duan, Zhongyu Wei, Xuanjing Huang, Jianshu Ji, Guihong Cao, Daxin Jiang, Ming Zhou
2020
Abstract
We study the problem of injecting knowledge into large pre-trained models
like BERT and RoBERTa. Existing methods typically update the original
parameters of pre-trained models when injecting knowledge. However, when
multiple kinds of knowledge are injected, they may suffer from the problem of
catastrophic forgetting. To address this, we propose K-Adapter, which keeps the
original parameters of the pre-trained model fixed and supports continual
knowledge infusion. Taking RoBERTa as the pre-trained model, K-Adapter has a
neural adapter for each kind of infused knowledge, like a plug-in connected to
RoBERTa. There is no information flow between different adapters, so
different adapters can be trained efficiently in a distributed manner. We inject two
kinds of knowledge, including factual knowledge obtained from automatically
aligned text-triplets on Wikipedia and Wikidata, and linguistic knowledge
obtained from dependency parsing. Results on three knowledge-driven tasks
(six datasets in total), including relation classification, entity typing, and
question answering, demonstrate that each adapter improves performance, and
the combination of both adapters brings further improvements. Probing
experiments further show that K-Adapter captures richer factual and commonsense
knowledge than RoBERTa.
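To make the adapter idea in the abstract concrete, the sketch below shows a minimal, hypothetical PyTorch layout: the pre-trained backbone is frozen, each knowledge-specific adapter is a small independently trained module, and adapter outputs are concatenated with the backbone representation for a downstream head. This is not the authors' implementation (the paper's adapters additionally tap intermediate transformer layers and use RoBERTa-large); the class names, sizes, and HuggingFace-style backbone interface here are assumptions for illustration.

```python
import torch
import torch.nn as nn


class Adapter(nn.Module):
    """One knowledge-specific adapter: down-projection, a small transformer
    layer, and an up-projection back to the backbone's hidden size."""

    def __init__(self, hidden_size=1024, adapter_size=128, num_heads=4):
        super().__init__()
        self.down = nn.Linear(hidden_size, adapter_size)
        self.encoder = nn.TransformerEncoderLayer(
            d_model=adapter_size, nhead=num_heads, batch_first=True)
        self.up = nn.Linear(adapter_size, hidden_size)

    def forward(self, hidden_states):
        return self.up(self.encoder(self.down(hidden_states)))


class KAdapterSketch(nn.Module):
    """Frozen backbone plus independent adapters (e.g. one factual, one
    linguistic). Each adapter sees only the backbone output, so there is no
    information flow between adapters and they can be trained separately."""

    def __init__(self, backbone, num_adapters=2, hidden_size=1024):
        super().__init__()
        self.backbone = backbone
        for p in self.backbone.parameters():
            p.requires_grad = False  # original pre-trained parameters stay fixed
        self.adapters = nn.ModuleList(
            Adapter(hidden_size) for _ in range(num_adapters))

    def forward(self, input_ids, attention_mask=None):
        # Assumes a HuggingFace-style backbone returning .last_hidden_state.
        out = self.backbone(input_ids, attention_mask=attention_mask)
        h = out.last_hidden_state
        adapter_outs = [adapter(h) for adapter in self.adapters]
        # Concatenate backbone and adapter features for a task-specific head.
        return torch.cat([h] + adapter_outs, dim=-1)
```

Because the backbone is frozen and adapters do not exchange information, each adapter can be pre-trained on its own knowledge source (aligned triplets or dependency parses) in parallel and combined afterwards, which is what enables the continual, non-forgetting infusion the abstract describes.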
Archived Files and Locations
application/pdf, 1.2 MB
arxiv.org (repository), web.archive.org (webarchive)
arXiv: 2002.01808v3