Armour: Generalizable Compact Self-Attention for Vision Transformers
by Lingchuan Meng (2021)
Abstract
Attention-based transformer networks have demonstrated promising potential as
their applications extend from natural language processing to vision. However,
despite recent improvements such as sub-quadratic attention approximation and
various training enhancements, compact vision transformers that use the regular
attention still fall short of their convnet counterparts in terms of accuracy,
model size, and throughput. This paper introduces a compact self-attention
mechanism that is fundamental and highly generalizable. The proposed method
reduces redundancy and improves efficiency on top of existing attention
optimizations. We show its drop-in applicability to both the regular attention
mechanism and some of the most recent variants in vision transformers. As a
result, we produce smaller and faster models with the same or better accuracy.
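For reference, the sketch below shows the regular (quadratic) multi-head self-attention block that the abstract cites as the baseline in vision transformers. It is not the Armour mechanism itself, whose details are not given in this abstract; the class name, dimensions, and token counts are illustrative assumptions only.

```python
# Minimal sketch of regular multi-head self-attention over patch tokens,
# the quadratic-cost baseline referenced in the abstract (not Armour itself).
import torch
import torch.nn as nn


class RegularSelfAttention(nn.Module):
    """Standard multi-head self-attention over a sequence of patch tokens."""

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        assert dim % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.scale = self.head_dim ** -0.5
        self.qkv = nn.Linear(dim, dim * 3)   # fused Q, K, V projection
        self.proj = nn.Linear(dim, dim)      # output projection

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, N, C = x.shape                    # batch, tokens, channels
        qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, self.head_dim)
        q, k, v = qkv.permute(2, 0, 3, 1, 4) # each: (B, heads, N, head_dim)
        # The attention matrix is N x N per head -> O(N^2) time and memory,
        # the cost that sub-quadratic approximations aim to reduce.
        attn = (q @ k.transpose(-2, -1)) * self.scale
        attn = attn.softmax(dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(B, N, C)
        return self.proj(out)


# Example: a 224x224 image split into 16x16 patches yields 196 tokens.
tokens = torch.randn(2, 196, 384)
print(RegularSelfAttention(dim=384, num_heads=6)(tokens).shape)  # (2, 196, 384)
```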
Archived Files and Locations
application/pdf 394.0 kB
arxiv.org (repository), web.archive.org (webarchive)
arXiv: 2108.01778v1