TWAG: A Topic-Guided Wikipedia Abstract Generator

by Fangwei Zhu, Shangqing Tu, Jiaxin Shi, Juanzi Li, Lei Hou, Tong Cui

Released as an article.

2021  

Abstract

Wikipedia abstract generation aims to distill a Wikipedia abstract from web sources and has met significant success by adopting multi-document summarization techniques. However, previous works generally view the abstract as plain text, ignoring the fact that it is a description of a certain entity and can be decomposed into different topics. In this paper, we propose a two-stage model, TWAG, that guides abstract generation with topical information. First, we detect the topic of each input paragraph with a classifier trained on existing Wikipedia articles, and use the predicted topics to divide the input documents into topic groups. Then, we predict the topic distribution of each abstract sentence and decode the sentence from topic-aware representations with a Pointer-Generator network. We evaluate our model on the WikiCatSum dataset, and the results show that it outperforms various existing baselines and is capable of generating comprehensive abstracts. Our code and dataset can be accessed at <https://github.com/THU-KEG/TWAG>.
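The first stage described in the abstract (assigning a topic to each input paragraph, then grouping paragraphs by topic) can be illustrated with a minimal sketch. This is not the authors' implementation: the keyword-overlap scorer below is a toy stand-in for the trained classifier, and the topic names, keyword lists, and function names (TOPIC_KEYWORDS, classify_paragraph, group_by_topic) are hypothetical, chosen only for illustration. The second stage (topic-aware decoding with a Pointer-Generator network) is not sketched here.

```python
from collections import defaultdict

# Hypothetical topics and keywords (assumption, not taken from the paper).
TOPIC_KEYWORDS = {
    "history": {"founded", "established", "history"},
    "products": {"product", "sells", "manufactures"},
}


def classify_paragraph(paragraph):
    """Toy stand-in for a trained topic classifier.

    Returns the topic whose keyword set overlaps most with the paragraph's
    words, or "other" when no keyword matches.
    """
    words = set(paragraph.lower().split())
    best_topic, best_hits = "other", 0
    for topic, keywords in TOPIC_KEYWORDS.items():
        hits = len(words & keywords)
        if hits > best_hits:
            best_topic, best_hits = topic, hits
    return best_topic


def group_by_topic(paragraphs):
    """Divide input paragraphs into groups keyed by predicted topic."""
    groups = defaultdict(list)
    for paragraph in paragraphs:
        groups[classify_paragraph(paragraph)].append(paragraph)
    return dict(groups)


if __name__ == "__main__":
    docs = [
        "The company was founded in 1990",
        "It manufactures engines",
        "Nice weather today",
    ]
    for topic, members in group_by_topic(docs).items():
        print(topic, members)
```

In a real system the keyword scorer would be replaced by a learned classifier; the grouping step itself is independent of how the topic labels are produced.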

Archived Files and Locations

application/pdf  826.6 kB
file_uptwrf22h5hjrbtw4gxxpnkx4u
arxiv.org (repository)
web.archive.org (webarchive)
Type  article
Stage   submitted
Date   2021-06-29
Version   v1
Language   en
arXiv  2106.15135v1
Catalog Record
Revision: d882e35d-0f08-4568-99db-2a900234a185