Faking Fake News for Real Fake News Detection: Propaganda-loaded Training Data Generation
by
Kung-Hsiang Huang, Kathleen McKeown, Preslav Nakov, Yejin Choi, Heng Ji
2022
Abstract
While there has been a lot of research and many recent advances in neural
fake news detection, defending against human-written disinformation remains
underexplored. Comparing the output of current fake news generation approaches
with human-crafted articles, we found a gap between the two, which can explain
why detectors trained on automatically generated data perform poorly at
detecting human-written fake news. To address this issue, we propose a
novel framework for generating articles closer to human-written ones.
Specifically, we perform self-critical sequence training with natural language
inference to ensure the validity of the generated articles. We then explicitly
incorporate propaganda techniques into the generated articles to mimic how
humans craft fake news. Finally, we create a fake news detection training
dataset, PropaNews, which includes 2,256 examples. Our experimental results
show that detectors trained on PropaNews are 7.3% to 12.0% more accurate at
detecting human-written disinformation than counterparts trained on data
generated by state-of-the-art approaches.
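
For readers unfamiliar with self-critical sequence training, the general idea can be sketched as follows. This is a minimal illustration of the standard self-critical loss, not the authors' implementation: the function name scst_loss and its parameters are hypothetical, and the two rewards are assumed to be scalar scores supplied by some external scorer (for example an NLI model), whose exact definition the abstract does not give.

```python
import math


def scst_loss(sample_token_probs, sample_reward, greedy_reward):
    """Self-critical loss for one sampled sequence (illustrative sketch).

    sample_token_probs: probabilities the generator assigned to each token
        of the sampled sequence (hypothetical input format).
    sample_reward: reward of the sampled sequence (assumed scalar).
    greedy_reward: reward of the greedy-decoded sequence, used as baseline.
    """
    # Log-probability of the whole sampled sequence.
    log_prob = sum(math.log(p) for p in sample_token_probs)
    # Advantage of the sample over the model's own greedy output.
    advantage = sample_reward - greedy_reward
    # Minimizing this raises the likelihood of samples that beat the baseline
    # and lowers the likelihood of samples that fall short of it.
    return -advantage * log_prob


# Usage: a two-token sample that scores higher than the greedy baseline.
loss = scst_loss([0.5, 0.5], sample_reward=0.9, greedy_reward=0.4)
```

Using the greedy output as the baseline means no separate value estimator is needed, and a sample that scores the same as the baseline contributes no gradient.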
arXiv: 2203.05386v1