Faking Fake News for Real Fake News Detection: Propaganda-loaded Training Data Generation

by Kung-Hsiang Huang, Kathleen McKeown, Preslav Nakov, Yejin Choi, Heng Ji

Released as an article.

2022  

Abstract

While there has been much research and many recent advances in neural fake news detection, defending against human-written disinformation remains underexplored. Analyzing current approaches to fake news generation alongside human-crafted articles, we found a gap between the two, which can explain why detectors trained on automatically generated data perform poorly on human-written fake news. To address this issue, we propose a novel framework for generating articles closer to human-written ones. Specifically, we perform self-critical sequence training with natural language inference to ensure the validity of the generated articles. We then explicitly incorporate propaganda techniques into the generated articles to mimic how humans craft fake news. Finally, we create a fake news detection training dataset, PropaNews, which includes 2,256 examples. Our experimental results show that detectors trained on PropaNews are 7.3% to 12.0% more accurate at detecting human-written disinformation than counterparts trained on data generated by state-of-the-art approaches.
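The abstract names self-critical sequence training (SCST) without describing it. As a rough illustration only, the sketch below shows the generic SCST policy-gradient loss: the reward of the model's own greedy decode serves as the baseline, so a sampled sequence is reinforced only when it scores higher than greedy. This is not the authors' implementation. The function name `scst_loss` is hypothetical, and the rewards are assumed to be plain floats produced by some natural language inference (NLI) scorer whose exact form the abstract does not specify.

```python
import math
from typing import Sequence


def scst_loss(
    sampled_logprobs: Sequence[float],
    sampled_reward: float,
    greedy_reward: float,
) -> float:
    """Generic self-critical policy-gradient loss for one sampled sequence.

    sampled_logprobs: per-token log-probabilities of the sampled sequence.
    sampled_reward / greedy_reward: scores for the sampled and greedy outputs;
    here assumed (hypothetically) to come from an NLI-based scorer.
    """
    # The greedy decode's reward is the baseline; the advantage is positive
    # only when the sampled sequence beats the model's own greedy output.
    advantage = sampled_reward - greedy_reward
    # REINFORCE objective: minimising -advantage * log p(sequence) raises the
    # sequence's probability when advantage > 0 and lowers it when < 0.
    return -advantage * sum(sampled_logprobs)


if __name__ == "__main__":
    # Toy two-token sample with p = 0.5 and p = 0.25 (made-up numbers).
    logps = [math.log(0.5), math.log(0.25)]
    print(scst_loss(logps, sampled_reward=0.9, greedy_reward=0.6))
```

In a real training loop the log-probabilities would be differentiable tensors and the loss would be averaged over a batch; plain floats are used here only to keep the sketch self-contained.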

Archived Files and Locations

application/pdf  532.6 kB
arxiv.org (repository)
web.archive.org (webarchive)
Type  article
Stage   submitted
Date   2022-03-10
Version   v1
Language   en
arXiv  2203.05386v1