Abstraction, Validation, and Generalization for Explainable Artificial Intelligence
by
Scott Cheng-Hsin Yang, Tomas Folke, Patrick Shafto
2021
Abstract
Neural network architectures are achieving superhuman performance on an
expanding range of tasks. To effectively and safely deploy these systems, their
decision-making must be understandable to a wide range of stakeholders. Methods
to explain AI have been proposed to meet this challenge, but a lack of theory
impedes the development of the systematic abstractions that are necessary for
cumulative knowledge gains. We propose Bayesian Teaching as a framework for
unifying explainable AI (XAI) by integrating machine learning and human
learning. Bayesian Teaching formalizes explanation as a communication act of an
explainer to shift the beliefs of an explainee. This formalization decomposes
any XAI method into four components: (1) the inference to be explained, (2) the
explanatory medium, (3) the explainee model, and (4) the explainer model. The
abstraction Bayesian Teaching affords by decomposing any XAI method
elucidates the invariances among existing methods. The decomposition of XAI systems enables
modular validation, as each of the first three components listed can be tested
semi-independently. This decomposition also promotes generalization through
recombination of components from different XAI systems, which facilitates the
generation of novel variants. These new variants need not be evaluated one by
one provided that each component has been validated, leading to an exponential
decrease in development time. Finally, by making the goal of explanation
explicit, Bayesian Teaching helps developers to assess how suitable an XAI
system is for its intended real-world use case. Thus, Bayesian Teaching
provides a theoretical framework that encourages systematic, scientific
investigation of XAI.
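
To make the formalization concrete: in the Bayesian Teaching literature, the explainer selects the explanation that maximizes the explainee's posterior belief in the target inference, P_explainer(e | theta*) proportional to P_explainee(theta* | e). Below is a minimal Python sketch of that selection rule under a toy setup; the probability tables and the function names (explainee_posterior, teach) are illustrative assumptions, not the paper's implementation.

    import numpy as np

    # Toy setup: the explainer wants the explainee to infer a target
    # hypothesis theta* from a small set of candidate explanations.
    # Rows: candidate explanations e; columns: hypotheses theta.
    # likelihood[e, theta] = P_explainee(e | theta), an assumed toy table.
    likelihood = np.array([
        [0.70, 0.20, 0.10],   # explanation e0
        [0.10, 0.60, 0.30],   # explanation e1
        [0.20, 0.20, 0.60],   # explanation e2
    ])
    prior = np.array([1 / 3, 1 / 3, 1 / 3])  # explainee's prior over hypotheses

    def explainee_posterior(e):
        """Explainee model: Bayesian belief update after seeing explanation e."""
        unnorm = likelihood[e] * prior
        return unnorm / unnorm.sum()

    def teach(target_theta):
        """Explainer model: choose the explanation that most shifts the
        explainee's belief toward the target inference, i.e.
        argmax_e P_explainee(target_theta | e)."""
        posteriors = np.array([explainee_posterior(e)[target_theta]
                               for e in range(likelihood.shape[0])])
        teaching_probs = posteriors / posteriors.sum()  # soft selection rule
        return teaching_probs.argmax(), teaching_probs

    best_e, probs = teach(target_theta=2)
    print(f"best explanation: e{best_e}, teaching distribution: {probs}")

Under this decomposition, the recombination claim in the abstract becomes simple arithmetic: if each of the four components has, say, k validated variants, component-level validation requires on the order of 4k tests, yet licenses k^4 recombined XAI systems, which is the source of the claimed exponential savings in development time.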
Preprint: arXiv:2105.07508v1, available from arxiv.org (archived at web.archive.org).