Rethinking the Backdoor Attacks' Triggers: A Frequency Perspective
by Yi Zeng, Won Park, Z. Morley Mao, Ruoxi Jia
2022
Abstract
Backdoor attacks have been considered a severe security threat to deep
learning. Such attacks can make models behave abnormally on inputs containing
predefined triggers while still retaining state-of-the-art performance on clean
data. Although backdoor attacks have been thoroughly investigated in the image
domain from both the attacker's and the defender's side, an analysis in the
frequency domain has been missing thus far.
This paper first revisits existing backdoor triggers from a frequency
perspective and performs a comprehensive analysis. Our results show that many
current backdoor attacks exhibit severe high-frequency artifacts, which persist
across different datasets and resolutions. We further demonstrate that these
high-frequency artifacts enable a simple way to detect existing backdoor
triggers, at a detection rate of 98.50%, without prior knowledge of the attack
details or the target model. Acknowledging previous attacks' weaknesses, we
propose a practical way to create smooth backdoor triggers without
high-frequency artifacts and study their detectability. We show that existing
defenses can benefit from incorporating these smooth triggers into their
design considerations. Moreover, we show that a detector tuned on stronger
smooth triggers generalizes well to unseen weak smooth triggers. In short,
our work emphasizes the importance of considering frequency analysis when
designing both backdoor attacks and defenses in deep learning.
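The frequency view described in the abstract can be illustrated with a short sketch. It assumes a square grayscale trigger image, computes an orthonormal 2-D DCT-II written out directly in NumPy, and measures the share of spectral energy in coefficients whose index sum u + v reaches a cutoff. The helper names (`dct_matrix`, `dct2`, `idct2`, `high_freq_ratio`, `low_pass`), the 4x4 checkerboard patch, and the cutoff of 16 are hypothetical choices for illustration. This is not the paper's detector or its smooth-trigger procedure; zeroing high-frequency DCT coefficients is simply one easy way to obtain a smooth pattern.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II matrix C: C @ x is the 1-D DCT of a length-n vector x."""
    k = np.arange(n)[:, None]  # frequency index (rows)
    i = np.arange(n)[None, :]  # spatial index (columns)
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)  # the DC row uses a different scale
    return c


def dct2(img: np.ndarray) -> np.ndarray:
    """2-D DCT-II of a square image: transform columns, then rows."""
    c = dct_matrix(img.shape[0])
    return c @ img @ c.T


def idct2(coef: np.ndarray) -> np.ndarray:
    """Inverse of dct2; C is orthonormal, so its inverse is its transpose."""
    c = dct_matrix(coef.shape[0])
    return c.T @ coef @ c


def high_freq_ratio(img: np.ndarray, cutoff: int) -> float:
    """Share of DCT energy in coefficients (u, v) with u + v >= cutoff (hypothetical metric)."""
    coef = dct2(img)
    u, v = np.indices(coef.shape)
    energy = coef ** 2
    return float(energy[u + v >= cutoff].sum() / energy.sum())


def low_pass(img: np.ndarray, cutoff: int) -> np.ndarray:
    """Zero every DCT coefficient with u + v >= cutoff, then transform back."""
    coef = dct2(img)
    u, v = np.indices(coef.shape)
    coef[u + v >= cutoff] = 0.0
    return idct2(coef)


n, cutoff = 32, 16
# Hypothetical BadNets-style trigger: a 4x4 +/-1 checkerboard in the bottom-right corner.
trigger = np.zeros((n, n))
trigger[-4:, -4:] = np.indices((4, 4)).sum(axis=0) % 2 * 2.0 - 1.0

smooth = low_pass(trigger, cutoff)  # smooth counterpart with no energy at or above the cutoff

print(f"patch trigger high-frequency ratio:  {high_freq_ratio(trigger, cutoff):.3f}")
print(f"smooth trigger high-frequency ratio: {high_freq_ratio(smooth, cutoff):.3f}")
```

In this toy setting the checkerboard patch places nearly all of its energy above the cutoff, while the low-passed version has essentially none there. A gap of this kind is what makes a frequency-based check attractive; how well it transfers to real triggered images depends on the data and the threshold chosen.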
arXiv: 2104.03413v4