Minimax Defense against Gradient-based Adversarial Attacks
by
Blerta Lindqvist, Rauf Izmailov
2020
Abstract
State-of-the-art adversarial attacks are aimed at neural network classifiers.
By default, neural networks use gradient descent to minimize their loss
function. The gradient of a classifier's loss function is used by
gradient-based adversarial attacks to generate adversarially perturbed images.
We pose the question of whether another type of optimization could give neural
network classifiers an edge. Here, we introduce a novel approach that uses
minimax optimization to foil gradient-based adversarial attacks. Our minimax
classifier is the discriminator of a generative adversarial network (GAN) that
plays a minimax game with the GAN generator. In addition, our GAN generator projects all points onto a manifold different from the original data manifold, since the original manifold itself might be what enables adversarial attacks.
To measure the performance of our minimax defense, we use three adversarial attacks - Carlini-Wagner (CW), DeepFool and the Fast Gradient Sign Method (FGSM) - on three datasets: MNIST, CIFAR-10 and German Traffic Sign (TRAFFIC). Against CW attacks, our minimax defense achieves 98.07% accuracy on MNIST (default classifier: 98.93%), 73.90% on CIFAR-10 (default: 83.14%) and 94.54% on TRAFFIC (default: 96.97%). Against DeepFool
attacks, our minimax defense achieves 98.87% (MNIST), 76.61% (CIFAR-10) and
94.57% (TRAFFIC). Against FGSM attacks, we achieve 97.01% (MNIST), 76.79%
(CIFAR-10) and 81.41% (TRAFFIC). Our minimax adversarial approach represents a significant shift in defense strategy for neural network classifiers.
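For readers unfamiliar with the attack family the abstract describes, the sketch below shows the simplest of the three evaluated attacks, FGSM (Goodfellow et al., 2015), which perturbs an input along the sign of the classifier's loss gradient. This is the standard textbook formulation, not the authors' code; model, images, labels and epsilon are illustrative placeholders.

    import torch
    import torch.nn.functional as F

    def fgsm_perturb(model, images, labels, epsilon=0.03):
        # Fast Gradient Sign Method: step along the sign of the loss
        # gradient to increase the classifier's loss. A sketch; the
        # function name and epsilon value are illustrative assumptions.
        images = images.clone().detach().requires_grad_(True)
        loss = F.cross_entropy(model(images), labels)
        loss.backward()
        adv = images + epsilon * images.grad.sign()
        # Assumes pixel values are normalized to [0, 1].
        return adv.clamp(0.0, 1.0).detach()

Because the perturbation is computed from the gradient of the classifier's own loss, changing how that loss surface is shaped during training changes what a gradient-based attacker can extract from it.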
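The minimax game the abstract refers to is, in its standard form, the GAN objective of Goodfellow et al. (2014), with the classifier playing the role of the discriminator D against the generator G; the paper's exact formulation may differ from this generic version:

    \min_G \max_D \; \mathbb{E}_{x \sim p_{\text{data}}}[\log D(x)] + \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z)))]

Under this objective, D is trained through an adversarial game rather than by plain gradient descent on a fixed classification loss, which is the alternative form of optimization the abstract asks about.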
Archived Files and Locations
application/pdf, 771.9 kB (arxiv.org repository; web.archive.org webarchive)
arXiv: 2002.01256v1