Concept Whitening for Interpretable Image Recognition

by Zhi Chen, Yijie Bei, Cynthia Rudin

Released as an article.

2020  

Abstract

What does a neural network encode about a concept as we traverse through the layers? Interpretability in machine learning is undoubtedly important, but the calculations of neural networks are very challenging to understand. Attempts to see inside their hidden layers can be misleading, unusable, or rely on the latent space to possess properties that it may not have. In this work, rather than attempting to analyze a neural network post hoc, we introduce a mechanism, called concept whitening (CW), to alter a given layer of the network to allow us to better understand the computation leading up to that layer. When a concept whitening module is added to a CNN, the axes of the latent space are aligned with known concepts of interest. By experiment, we show that CW can provide us with a much clearer understanding of how the network gradually learns concepts over layers. CW is an alternative to a batch normalization layer in that it normalizes, and also decorrelates (whitens), the latent space. CW can be used in any layer of the network without hurting predictive performance.
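The abstract describes CW as a substitute for batch normalization that both normalizes and decorrelates (whitens) the latent space before aligning its axes with known concepts. The following is a minimal, hypothetical PyTorch sketch of only the whitening half (ZCA whitening of a feature map, computed per batch); it is not the authors' released implementation, the class name WhiteningSketch is invented for illustration, and the concept-alignment rotation and running statistics are omitted.

import torch
import torch.nn as nn

class WhiteningSketch(nn.Module):
    """Mean-center and ZCA-whiten CNN activations so the latent
    dimensions are uncorrelated with unit variance (the whitening
    step described in the abstract; concept alignment not shown)."""

    def __init__(self, num_features: int, eps: float = 1e-5):
        super().__init__()
        self.num_features = num_features
        self.eps = eps  # small ridge term keeping the covariance invertible

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, height, width) feature map from a CNN layer
        b, c, h, w = x.shape
        flat = x.permute(1, 0, 2, 3).reshape(c, -1)        # (c, b*h*w)
        centered = flat - flat.mean(dim=1, keepdim=True)   # remove per-channel mean
        cov = centered @ centered.t() / centered.shape[1]  # (c, c) channel covariance
        eye = torch.eye(c, device=x.device, dtype=x.dtype)
        eigvals, eigvecs = torch.linalg.eigh(cov + self.eps * eye)
        # ZCA whitening matrix: U diag(1/sqrt(lambda)) U^T
        whiten = eigvecs @ torch.diag(eigvals.clamp_min(self.eps).rsqrt()) @ eigvecs.t()
        out = whiten @ centered                            # decorrelated activations
        return out.reshape(c, b, h, w).permute(1, 0, 2, 3)


# Example usage: drop-in where a BatchNorm2d layer would normally sit.
layer = WhiteningSketch(num_features=64)
feats = torch.randn(8, 64, 16, 16)
white = layer(feats)  # same shape; channels are now approximately decorrelated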

Archived Files and Locations

application/pdf  21.3 MB
file_ub4vez3r6rhvlkrrr5r7vhfc7i
arxiv.org (repository)
web.archive.org (webarchive)
Type  article
Stage   submitted
Date   2020-10-19
Version   v4
Language   en
arXiv  2002.01650v4