Dynamic Spatial Sparsification for Efficient Vision Transformers and Convolutional Neural Networks
by
Yongming Rao, Zuyan Liu, Wenliang Zhao, Jie Zhou, Jiwen Lu
2022
Abstract
In this paper, we present a new approach for model acceleration by exploiting
spatial sparsity in visual data. We observe that the final prediction in vision
Transformers depends only on a subset of the most informative tokens, which is
sufficient for accurate image recognition. Based on this observation, we
propose a dynamic token sparsification framework to prune redundant tokens
progressively and dynamically based on the input to accelerate vision
Transformers. Specifically, we devise a lightweight prediction module to
estimate the importance score of each token given the current features. The
module is added to different layers to prune redundant tokens hierarchically.
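The score-then-prune step and its hierarchical repetition can be illustrated with a small plain-Python sketch. This is not the authors' implementation. The learned prediction module is replaced by a fixed, hypothetical linear scorer (`weights` followed by a sigmoid). The helper names `score_tokens`, `prune_tokens` and `hierarchical_prune`, the per-stage `keep_ratio` of 0.7 and the three stages are also illustrative assumptions. The sketch only shows how keeping the top-scoring fraction of tokens at several stages shrinks the token set progressively.

```python
import math
from typing import List, Sequence, Tuple

Token = List[float]


def score_tokens(tokens: Sequence[Token], weights: Sequence[float]) -> List[float]:
    """Hypothetical lightweight scorer: linear projection followed by a sigmoid."""
    scores = []
    for tok in tokens:
        logit = sum(x * w for x, w in zip(tok, weights))
        scores.append(1.0 / (1.0 + math.exp(-logit)))
    return scores


def prune_tokens(tokens: Sequence[Token], weights: Sequence[float],
                 keep_ratio: float) -> List[Token]:
    """Keep the round(keep_ratio * N) highest-scoring tokens in their original order."""
    n_keep = max(1, round(keep_ratio * len(tokens)))
    scores = score_tokens(tokens, weights)
    ranked = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    kept = sorted(ranked[:n_keep])  # restore the original order of the survivors
    return [tokens[i] for i in kept]


def hierarchical_prune(tokens: Sequence[Token], weights: Sequence[float],
                       keep_ratio: float, num_stages: int) -> Tuple[List[Token], List[int]]:
    """Apply the pruning step at several stages; return survivors and per-stage counts."""
    current = list(tokens)
    counts = [len(current)]
    for _ in range(num_stages):
        # In a real model, Transformer blocks would update `current` between stages.
        current = prune_tokens(current, weights, keep_ratio)
        counts.append(len(current))
    return current, counts


if __name__ == "__main__":
    # 196 tokens (e.g. a 14x14 patch grid) with made-up 4-d features.
    tokens = [[math.sin(i), math.cos(i), i / 196, 1.0] for i in range(196)]
    weights = [0.5, -0.3, 2.0, 0.1]  # hypothetical scorer parameters
    kept, counts = hierarchical_prune(tokens, weights, keep_ratio=0.7, num_stages=3)
    print(counts)  # [196, 137, 96, 67]
```

With these assumed settings, later layers operate on 137, 96 and finally 67 of the original 196 tokens. This shows where the savings of progressive sparsification come from.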
While the framework is inspired by our observation of sparse attention in
vision Transformers, we find that the idea of adaptive and asymmetric computation
can serve as a general solution for accelerating various architectures. We extend our
method to hierarchical models including CNNs and hierarchical vision
Transformers as well as more complex dense prediction tasks that require
structured feature maps by formulating a more generic dynamic spatial
sparsification framework with progressive sparsification and asymmetric
computation for different spatial locations. By applying lightweight fast paths
to less informative features and more expressive slow paths to more
important locations, we can maintain the structure of feature maps while
significantly reducing the overall computations. Extensive experiments
demonstrate the effectiveness of our framework on various modern architectures
and different visual recognition tasks. Our results clearly demonstrate that
dynamic spatial sparsification offers a new and more effective dimension for
model acceleration. Code is available at
https://github.com/raoyongming/DynamicViT
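The asymmetric computation described in the abstract sends informative locations through an expressive slow path and the rest through a lightweight fast path. The following plain-Python sketch illustrates it under stated assumptions and is not the authors' implementation. The importance score (absolute activation), the slow path (a 3x3 neighbourhood mean), the fast path (pointwise scaling), the cost constants `SLOW_COST`/`FAST_COST` and the `slow_ratio` parameter are all hypothetical stand-ins. In the framework described above, the score would come from a prediction module and the paths would be network layers.

```python
from typing import List, Tuple

Grid = List[List[float]]

SLOW_COST = 9  # hypothetical cost units per location on the slow path (3x3 window)
FAST_COST = 1  # hypothetical cost units per location on the fast path (pointwise)


def slow_path(x: Grid, i: int, j: int) -> float:
    """Hypothetical expressive path: 3x3 neighbourhood mean with zero padding."""
    h, w = len(x), len(x[0])
    total = 0.0
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            ii, jj = i + di, j + dj
            if 0 <= ii < h and 0 <= jj < w:
                total += x[ii][jj]
    return total / 9.0


def fast_path(x: Grid, i: int, j: int) -> float:
    """Hypothetical lightweight path: pointwise scaling."""
    return 0.5 * x[i][j]


def asymmetric_forward(x: Grid, slow_ratio: float) -> Tuple[Grid, float]:
    """Route the top `slow_ratio` locations to the slow path, the rest to the fast path.

    Returns an output grid with the same H x W shape as the input and the cost
    relative to running the slow path at every location.
    """
    h, w = len(x), len(x[0])
    coords = [(i, j) for i in range(h) for j in range(w)]
    # Hypothetical importance score: absolute activation at each location.
    ranked = sorted(coords, key=lambda c: abs(x[c[0]][c[1]]), reverse=True)
    slow_set = set(ranked[:round(slow_ratio * len(coords))])
    out = [[0.0] * w for _ in range(h)]
    cost = 0
    for i, j in coords:
        if (i, j) in slow_set:
            out[i][j] = slow_path(x, i, j)
            cost += SLOW_COST
        else:
            out[i][j] = fast_path(x, i, j)
            cost += FAST_COST
    return out, cost / (SLOW_COST * len(coords))


if __name__ == "__main__":
    x = [[float((4 * i + j) % 5 - 2) for j in range(4)] for i in range(4)]
    y, rel_cost = asymmetric_forward(x, slow_ratio=0.25)
    print(len(y), len(y[0]), round(rel_cost, 3))  # 4 4 0.333
```

Unlike token pruning, no location is dropped here. The output remains a dense H x W map that structured tasks such as dense prediction can consume. Under the assumed cost model, routing a quarter of the locations through the slow path costs one third of the all-slow baseline.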
arXiv: 2207.01580v1