Axial-DeepLab: Stand-Alone Axial-Attention for Panoptic Segmentation
by Huiyu Wang, Yukun Zhu, Bradley Green, Hartwig Adam, Alan Yuille, Liang-Chieh Chen
2020
Abstract
Convolution exploits locality for efficiency, at the cost of missing long-range
context. Self-attention has been adopted to augment CNNs with non-local
interactions. Recent works show that it is possible to stack self-attention layers to
obtain a fully attentional network by restricting the attention to a local
region. In this paper, we attempt to remove this constraint by factorizing 2D
self-attention into two 1D self-attentions. This reduces computational
complexity and allows attention to be performed over a larger, or even global, region. In
addition, we propose a position-sensitive self-attention design.
Combining the two yields our position-sensitive axial-attention layer, a novel
building block that can be stacked to form axial-attention models for image
classification and dense prediction. We demonstrate the effectiveness of our
model on four large-scale datasets. In particular, our model outperforms all
existing stand-alone self-attention models on ImageNet. Our Axial-DeepLab
improves PQ by 2.8% over the bottom-up state of the art on COCO test-dev. That
previous state of the art is already matched by our small variant, which is 3.8x
more parameter-efficient and 27x more computation-efficient. Axial-DeepLab also achieves
state-of-the-art results on Mapillary Vistas and Cityscapes.
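
To make the axial factorization concrete, the sketch below implements one axial-attention layer in PyTorch. This is an illustrative assumption, not the authors' released code: the class name AxialAttention, the axis argument, and the use of nn.MultiheadAttention are hypothetical stand-ins, and the paper's position-sensitive terms are omitted for brevity.

import torch
import torch.nn as nn

class AxialAttention(nn.Module):
    # 1D self-attention along one spatial axis of an (N, C, H, W) tensor.
    # Sketch only: the paper's position-sensitive terms are omitted.
    def __init__(self, dim, heads=8, axis="height"):
        super().__init__()
        self.axis = axis
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):  # x: (N, C, H, W)
        n, c, h, w = x.shape
        if self.axis == "height":
            # Treat each column as an independent length-H sequence.
            seq = x.permute(0, 3, 2, 1).reshape(n * w, h, c)
        else:
            # Treat each row as an independent length-W sequence.
            seq = x.permute(0, 2, 3, 1).reshape(n * h, w, c)
        out, _ = self.attn(seq, seq, seq)  # global 1D self-attention
        if self.axis == "height":
            return out.reshape(n, w, h, c).permute(0, 3, 2, 1)
        return out.reshape(n, h, w, c).permute(0, 3, 1, 2)

# Stacking a height-axis layer and a width-axis layer gives every output
# position a global receptive field at O(H*W*(H+W)) cost, versus
# O((H*W)^2) for full 2D self-attention.
x = torch.randn(2, 64, 32, 32)
y = AxialAttention(64, axis="width")(AxialAttention(64, axis="height")(x))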
Archived Files and Locations
application/pdf, 6.0 MB (arXiv:2003.07853v1)
arxiv.org (repository), web.archive.org (webarchive)