Comprehensive Online Network Pruning via Learnable Scaling Factors release_xecqx2coxnfa3jd2vwnm5vcofu

by Muhammad Umair Haider, Murtaza Taj

Released as a article .

2020  

Abstract

One of the major challenges in deploying deep neural network architectures is their size which has an adverse effect on their inference time and memory requirements. Deep CNNs can either be pruned width-wise by removing filters based on their importance or depth-wise by removing layers and blocks. Width wise pruning (filter pruning) is commonly performed via learnable gates or switches and sparsity regularizers whereas pruning of layers has so far been performed arbitrarily by manually designing a smaller network usually referred to as a student network. We propose a comprehensive pruning strategy that can perform both width-wise as well as depth-wise pruning. This is achieved by introducing gates at different granularities (neuron, filter, layer, block) which are then controlled via an objective function that simultaneously performs pruning at different granularity during each forward pass. Our approach is applicable to wide-variety of architectures without any constraints on spatial dimensions or connection type (sequential, residual, parallel or inception). Our method has resulted in a compression ratio of 70% to 90% without noticeable loss in accuracy when evaluated on benchmark datasets.
In text/plain format

Archived Files and Locations

application/pdf  344.2 kB
file_5jbnyokvu5b73easvlrxvt2r54
arxiv.org (repository)
web.archive.org (webarchive)
Read Archived PDF
Preserved and Accessible
Type  article
Stage   submitted
Date   2020-10-06
Version   v1
Language   en ?
arXiv  2010.02623v1
Work Entity
access all versions, variants, and formats of this works (eg, pre-prints)
Catalog Record
Revision: ce8c3527-d452-46f8-abd0-077369c13c00
API URL: JSON