TWAG: A Topic-Guided Wikipedia Abstract Generator
release_gr75lnsp7jhhtfvxw3dzidoqcu
by
Fangwei Zhu, Shangqing Tu, Jiaxin Shi, Juanzi Li, Lei Hou, Tong Cui
2021
Abstract
Wikipedia abstract generation aims to distill a Wikipedia abstract from web
sources and has achieved significant success by adopting multi-document
summarization techniques. However, previous works generally view the abstract
as plain text, ignoring the fact that it is a description of a certain entity
and can be decomposed into different topics. In this paper, we propose a
two-stage model TWAG that guides the abstract generation with topical
information. First, we detect the topic of each input paragraph with a
classifier trained on existing Wikipedia articles to divide input documents
into different topics. Then, we predict the topic distribution of each abstract
sentence, and decode the sentence from topic-aware representations with a
Pointer-Generator network. We evaluate our model on the WikiCatSum dataset, and
the results show that it outperforms various existing baselines and is
capable of generating comprehensive abstracts. Our code and dataset can be
accessed at <https://github.com/THU-KEG/TWAG>
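
The two stages described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the keyword-based `detect_topic` is a hypothetical stand-in for the classifier trained on Wikipedia articles, the topic names and keyword lists are invented for illustration, and `mix_topic_vectors` only shows the general idea of weighting per-topic representations by a predicted topic distribution before decoding. The Pointer-Generator decoder itself is omitted.

```python
from collections import defaultdict
import math

# Hypothetical topics and keywords, for illustration only.
# TWAG trains a classifier on existing Wikipedia articles instead.
TOPIC_KEYWORDS = {
    "history": {"founded", "established", "history"},
    "products": {"product", "sells", "software"},
}


def detect_topic(paragraph):
    """Toy stand-in for stage one: pick the topic with the most keyword hits."""
    words = set(paragraph.lower().split())
    scores = {t: len(words & kws) for t, kws in TOPIC_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "other"


def group_by_topic(paragraphs):
    """Divide input paragraphs into per-topic groups."""
    groups = defaultdict(list)
    for p in paragraphs:
        groups[detect_topic(p)].append(p)
    return dict(groups)


def softmax(scores):
    """Turn raw per-topic scores into a probability distribution."""
    m = max(scores.values())
    exps = {k: math.exp(v - m) for k, v in scores.items()}
    total = sum(exps.values())
    return {k: v / total for k, v in exps.items()}


def mix_topic_vectors(topic_vectors, topic_scores):
    """Stage-two idea: weight each topic representation by the predicted
    topic distribution of the sentence about to be generated."""
    weights = softmax({t: topic_scores[t] for t in topic_vectors})
    dim = len(next(iter(topic_vectors.values())))
    mixed = [0.0] * dim
    for t, vec in topic_vectors.items():
        for i, x in enumerate(vec):
            mixed[i] += weights[t] * x
    return mixed


if __name__ == "__main__":
    docs = ["The company was founded in 1990", "It sells software worldwide"]
    print(group_by_topic(docs))
    vectors = {"history": [1.0, 0.0], "products": [0.0, 1.0]}
    print(mix_topic_vectors(vectors, {"history": 2.0, "products": 0.5}))
```

In a real system the topic vectors would come from a neural encoder over each topic's paragraphs, and the mixed representation would condition the decoder for one abstract sentence at a time.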
Archived Files and Locations
application/pdf 826.6 kB
file_uptwrf22h5hjrbtw4gxxpnkx4u
arxiv.org (repository), web.archive.org (webarchive)
2106.15135v1