Do Concept Bottleneck Models Learn as Intended?
by Andrei Margeloiu, Matthew Ashman, Umang Bhatt, Yanzhi Chen, Mateja Jamnik, Adrian Weller
2021
Abstract
Concept bottleneck models map from raw inputs to concepts, and then from
concepts to targets. Such models aim to incorporate pre-specified, high-level
concepts into the learning procedure, and have been motivated to meet three
desiderata: interpretability, predictability, and intervenability. However, we
find that concept bottleneck models struggle to meet these goals. Using post
hoc interpretability methods, we demonstrate that concepts do not correspond to
anything semantically meaningful in input space, thus calling into question the
usefulness of concept bottleneck models in their current form.
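
To make the pipeline concrete, below is a minimal PyTorch sketch of the input -> concepts -> target architecture the abstract describes. The class name ConceptBottleneckModel, the layer sizes, and the joint loss weighting are illustrative assumptions, not the authors' implementation.

# Minimal sketch of a concept bottleneck model (illustrative assumptions,
# not the authors' code). Raw input x -> predicted concepts -> target.
import torch
import torch.nn as nn

class ConceptBottleneckModel(nn.Module):  # hypothetical name
    def __init__(self, in_dim: int, n_concepts: int, n_classes: int):
        super().__init__()
        # g: raw inputs -> concept logits (the "bottleneck")
        self.input_to_concepts = nn.Sequential(
            nn.Linear(in_dim, 128), nn.ReLU(), nn.Linear(128, n_concepts)
        )
        # f: concepts -> target logits
        self.concepts_to_target = nn.Linear(n_concepts, n_classes)

    def forward(self, x):
        concept_logits = self.input_to_concepts(x)
        # Predicted pre-specified concepts; "intervening" means overwriting
        # these activations with ground-truth concept values at test time.
        concepts = torch.sigmoid(concept_logits)
        target_logits = self.concepts_to_target(concepts)
        return concept_logits, target_logits

# Joint training: supervise both the concepts and the target
# (the 0.5 weighting is an assumption for illustration).
model = ConceptBottleneckModel(in_dim=64, n_concepts=10, n_classes=5)
x = torch.randn(8, 64)
c_true = torch.randint(0, 2, (8, 10)).float()  # binary concept labels
y_true = torch.randint(0, 5, (8,))             # class labels
concept_logits, target_logits = model(x)
loss = (
    nn.functional.cross_entropy(target_logits, y_true)
    + 0.5 * nn.functional.binary_cross_entropy_with_logits(concept_logits, c_true)
)
loss.backward()

The paper's critique targets exactly the concept layer in such a sketch: post hoc interpretability methods suggest its units need not align with anything semantically meaningful in input space.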
Archived Files and Locations
application/pdf 5.2 MB
arxiv.org (repository), web.archive.org (webarchive)
arXiv: 2105.04289v1