Do Concept Bottleneck Models Learn as Intended?

by Andrei Margeloiu, Matthew Ashman, Umang Bhatt, Yanzhi Chen, Mateja Jamnik, Adrian Weller

Released as an article.

2021  

Abstract

Concept bottleneck models map from raw inputs to concepts, and then from concepts to targets. Such models aim to incorporate pre-specified, high-level concepts into the learning procedure, and have been motivated to meet three desiderata: interpretability, predictability, and intervenability. However, we find that concept bottleneck models struggle to meet these goals. Using post hoc interpretability methods, we demonstrate that concepts do not correspond to anything semantically meaningful in input space, thus calling into question the usefulness of concept bottleneck models in their current form.
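To make the two-stage structure described in the abstract concrete, here is a minimal, illustrative PyTorch sketch (not the authors' implementation): one network maps raw inputs to predicted concept activations, and a separate head maps those concepts alone to the target, which is what makes test-time intervention on concepts possible. All layer sizes and names are hypothetical.

import torch
import torch.nn as nn

class ConceptBottleneckModel(nn.Module):
    """Illustrative two-stage model: inputs -> concepts -> target."""

    def __init__(self, input_dim: int, n_concepts: int, n_classes: int):
        super().__init__()
        # g: maps raw inputs to concept logits (sizes are arbitrary here)
        self.input_to_concepts = nn.Sequential(
            nn.Linear(input_dim, 128),
            nn.ReLU(),
            nn.Linear(128, n_concepts),
        )
        # f: maps concept activations to the target; it never sees x directly
        self.concepts_to_target = nn.Linear(n_concepts, n_classes)

    def forward(self, x):
        concept_logits = self.input_to_concepts(x)
        # Per-concept probabilities form the bottleneck; replacing them with
        # ground-truth concept values at test time is an "intervention".
        concepts = torch.sigmoid(concept_logits)
        target_logits = self.concepts_to_target(concepts)
        return concept_logits, target_logits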

Archived Files and Locations

application/pdf  5.2 MB
file_a7ndyi2wozcpxaeq2hrdj2t3km
arxiv.org (repository)
web.archive.org (webarchive)
Type  article
Stage   submitted
Date   2021-05-10
Version   v1
Language   en
arXiv  2105.04289v1
Work Entity
Access all versions, variants, and formats of this work (e.g., pre-prints).
Catalog Record
Revision: d1d265fe-c489-4750-ba1f-93b5fb62ef38
API URL: JSON