Armour: Generalizable Compact Self-Attention for Vision Transformers

by Lingchuan Meng

Released as an article.

2021  

Abstract

Attention-based transformer networks have demonstrated promising potential as their applications extend from natural language processing to vision. However, despite recent improvements such as sub-quadratic attention approximation and various training enhancements, compact vision transformers using the regular attention still fall short of their convnet counterparts in accuracy, model size, and throughput. This paper introduces a compact self-attention mechanism that is fundamental and highly generalizable. The proposed method reduces redundancy and improves efficiency on top of existing attention optimizations. We show its drop-in applicability for both the regular attention mechanism and some of the most recent variants in vision transformers. As a result, we produce smaller and faster models with the same or better accuracy.
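For reference, the sketch below shows the regular (quadratic) multi-head scaled dot-product self-attention that the abstract uses as its baseline, written in PyTorch. It is an illustrative baseline only, not the paper's compact Armour mechanism; the class name, shapes, and hyperparameters are assumptions for the example.

# Minimal sketch of the regular (quadratic) self-attention referenced in the
# abstract as the baseline; NOT the paper's compact Armour mechanism.
# Names, shapes, and hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn

class RegularSelfAttention(nn.Module):
    """Standard multi-head scaled dot-product self-attention ("regular attention")."""
    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        assert dim % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.scale = self.head_dim ** -0.5
        # Joint Q/K/V projection; compact variants typically reduce redundancy
        # in these projections and/or in the attention map itself.
        self.qkv = nn.Linear(dim, dim * 3, bias=False)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, N, C = x.shape  # batch, tokens (e.g. image patches), channels
        qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, self.head_dim)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)           # each: (B, heads, N, head_dim)
        attn = (q @ k.transpose(-2, -1)) * self.scale  # (B, heads, N, N): O(N^2) cost
        attn = attn.softmax(dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(B, N, C)
        return self.proj(out)

# Example: 196 patch tokens of dimension 384, as in a small vision transformer.
tokens = torch.randn(2, 196, 384)
print(RegularSelfAttention(dim=384, num_heads=6)(tokens).shape)  # torch.Size([2, 196, 384])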

Archived Files and Locations

application/pdf  394.0 kB
file_hy2ljslsgvbsvku3fezdnovzqq
arxiv.org (repository)
web.archive.org (webarchive)
Type  article
Stage   submitted
Date   2021-08-03
Version   v1
Language   en
arXiv  2108.01778v1