Enhancing Transformation-based Defenses using a Distribution Classifier
by
Connie Kou, Hwee Kuan Lee, Ee-Chien Chang, Teck Khim Ng
2020
Abstract
Adversarial attacks on convolutional neural networks (CNNs) have gained
significant attention, and there have been active research efforts on defense
mechanisms. Stochastic input transformation methods have been proposed, where
the idea is to recover the image from an adversarial attack by random
transformation and to take the majority vote as the consensus among the random
samples. However, the transformation improves the accuracy on adversarial
images at the expense of the accuracy on clean images. While it is intuitive
that the accuracy on clean images would deteriorate, the exact mechanism by
which this occurs is unclear. In this paper, we study the distribution of
softmax induced by stochastic transformations. We observe that with random
transformations on the clean images, although the mass of the softmax
distribution could shift to the wrong class, the resulting distribution of
softmax could be used to correct the prediction. Furthermore, for the
adversarial counterparts, the image transformation yields softmax
distributions whose shapes are similar to those obtained from the clean
images. With these observations, we propose a method to improve existing
transformation-based defenses. We train a separate lightweight distribution
classifier to recognize distinct features in the distributions of softmax
outputs of transformed images. Our empirical studies show that our distribution
classifier, by training on distributions obtained from clean images only,
outperforms majority voting for both clean and adversarial images. Our method
is generic and can be integrated with existing transformation-based defenses.
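
The pipeline the abstract describes (sample random transformations, collect the softmax outputs, then either take a majority vote or pass the distribution of softmax outputs to a separate classifier) can be illustrated with a minimal sketch. Everything named here is a hypothetical stand-in rather than the authors' implementation: `toy_transform` (additive Gaussian noise) and `toy_classify` (a two-class toy model) replace a real transformation and CNN, and the per-class histograms in `marginal_histograms` are only one assumed way to summarize the softmax distribution, since the abstract does not specify the representation.

```python
import random
from collections import Counter
from math import exp


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def toy_transform(image, rng):
    """Hypothetical random transformation: additive Gaussian noise."""
    return [x + rng.gauss(0.0, 0.5) for x in image]


def toy_classify(image):
    """Hypothetical two-class model standing in for a CNN."""
    s = sum(image)
    return softmax([s, -s])


def sample_softmaxes(classify, transform, image, n_samples, rng):
    """Softmax outputs for n_samples random transformations of one image."""
    return [classify(transform(image, rng)) for _ in range(n_samples)]


def majority_vote(softmaxes):
    """Baseline: the class predicted most often across the samples."""
    votes = Counter(max(range(len(p)), key=p.__getitem__) for p in softmaxes)
    return votes.most_common(1)[0][0]


def marginal_histograms(softmaxes, n_bins=10):
    """Assumed feature: one normalized histogram of softmax values per class.

    A separately trained distribution classifier would take these
    histograms as input instead of reducing the samples to a single vote.
    """
    n_classes = len(softmaxes[0])
    hists = [[0.0] * n_bins for _ in range(n_classes)]
    weight = 1.0 / len(softmaxes)
    for p in softmaxes:
        for c, value in enumerate(p):
            b = min(int(value * n_bins), n_bins - 1)  # value == 1.0 goes in the last bin
            hists[c][b] += weight
    return hists


rng = random.Random(0)
samples = sample_softmaxes(toy_classify, toy_transform, [0.2, 0.1, -0.1], 50, rng)
vote = majority_vote(samples)            # single label from the vote
features = marginal_histograms(samples)  # input for a distribution classifier
```

The majority vote discards everything except the argmax of each sample, whereas the histograms retain the shape of the softmax distribution, which is the information the abstract says can be used to correct the prediction.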
arXiv: 1906.00258v2