<div id="links">
Banff International Research Station <br/>
Trustworthy Machine Learning<br/>
Slides: <a href="https://go.wisc.edu/5h522y">https://go.wisc.edu/5h522y</a><br/>
Code: <a href="https://go.wisc.edu/2m20f9">https://go.wisc.edu/2m20f9</a>
</div>
<br/>
<br/>
<div id="title">
Interpretable Machine Learning <br/>
<span style="font-size: 30px"> What's Possible? What's Next? </span>
</div>
<br/>
<div id="subtitle">
Kris Sankaran <br/>
<a href="https://go.wisc.edu/pgb8n">https://go.wisc.edu/pgb8n</a> <br/>
12 | February | 2023 <br/>
</div>

---

### The Context

Models are woven into the fabric of modern life,

* **Decisions**: They can automate or assist with judgments that previously would have been made entirely by people.
* **Discovery**: They can orient us within large data catalogs and can guide us towards promising hypotheses.
* **Creativity**: They can make it easier for those without technical training to explore ideas and express themselves.

For these models to be understood by and beneficial to people from many backgrounds, we need interpretability.

---

### What can go wrong?

.center[
<img src="figures/asthma.png" width=1000/>

Example from [1].
]

---

### What can go wrong?

.center[
<img src="figures/adversarial-stopsign.webp" width=600/>

Example from [2].
]

---

### What can go wrong?

.center[
<img src="figures/bard-hallucination.webp" width=900/>
]

---

# What Makes a Model Interpretable?

<br/>

.center[
<img src="figures/computer.png" width=350 style="position: absolute; left: 500px"/>
]

---

# What Makes a Model Interpretable?

<br/>

.center[
<img src="figures/computer.png" width=350 style="position: absolute; left: 500px"/>
]

<p style="font-size: 30px; position: absolute; left: 20px; top: 200px; width: 450px">
This is a difficult question... let's start with an easier one.
</p>

---

# What Makes a Visualization Good?

<br/>

.center[
<img src="figures/visualization.png" width=350 style="position: absolute; left: 450px"/>
]

---

### Key Properties

.pull-left[
A good visualization is:

1. **Legible**: It omits extraneous, distracting elements.
1. **Annotated**: It shows data within the problem context.
1. **Information Dense**: It shows relevant variation efficiently.
]

.pull-right[
<img src="figures/tufte.png" width=330/>
]

---

### Key Properties

A good visualization is:

1. **Legible**: It omits extraneous, distracting elements.
1. **Annotated**: It shows data within the problem context.
1. **Information Dense**: It shows relevant variation efficiently.

.center[
<img src="figures/tufte-2.png"/>
]

---

### Below-the-Surface

More subtly, it should pay attention to:

1. **Data Provenance**: If we don't know the data sources, we should be skeptical of anything that's shown, no matter how compelling.
1. **Audience**: The effectiveness of a visualization depends on the visual vocabulary of its audience.
1. **Prioritization**: Every design emphasizes some comparisons over others. Are the "important" patterns visible?
1. **Interactivity**: Does it engage the reader's problem-solving capacity?

We should think about model interpretability with the same nuance that we think about data visualization.

---

## Methods

---

### Vocabulary

1. **Interpretable Model**: A model that, by virtue of its design, is easy for its stakeholders to accurately describe and alter.
1. **Explainability Technique**: A method that shapes our mental models about black box systems.

.center[
<img src="figures/black_box_flashlight.png" width=720/>
]

---

### Vocabulary

1. **Local Explanation**: An artifact for reasoning about individual predictions.
1. **Global Explanation**: An artifact for reasoning about an entire model.

.center[
<img src="figures/explanation_types.png" width=800/>
]

---

### Running Example

Problem: Imagine sampling longitudinal microbiome profiles from 500 study participants, some of whom eventually developed a disease. Can we discover any microbiome-related risk factors?

This simulation is motivated by microbiome studies of HIV risk [3].

.center[
<img src="figures/simulated-data.svg" width=830/>
]

---

### Data Organization

We can frame this as a regression problem where all 50 timepoints and 144 species are stacked horizontally.

.center[
<img src="figures/taxon_regression.png" width=650/>
]

---

### Sparse Logistic Regression

.pull-left[
1. We can reach ~ 77% accuracy using only 38 of the original 7200 features.
1. Each coefficient has a simple species `\(\times\)` time interpretation.

```
# A tibble: 7,201 x 2
  term      estimate
  <chr>        <dbl>
1 tax13_24     0.593
2 tax114_26    0.555
3 tax66_50     0.457
4 tax105_36    0.289
5 tax46_30     0.261
6 tax46_19     0.232
```
]

.pull-right[
<img src="figures/lasso_estimates.svg" width=320/>
]
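---

### Sparse Logistic Regression

For a rough sketch of how a fit like this could be produced, suppose the stacked features are in a numeric matrix `x` (one column per species `\(\times\)` timepoint, named like `tax13_24`) and `y` holds the 0/1 disease outcomes; both names are hypothetical. The lasso path could then be computed with `glmnet`:

```r
library(glmnet)

# Cross-validated lasso-penalized logistic regression on the stacked features
fit <- cv.glmnet(x, y, family = "binomial", alpha = 1)

# Nonzero coefficients at the selected penalty, sorted by magnitude
beta <- coef(fit, s = "lambda.min")
selected <- data.frame(
  term = rownames(beta)[beta[, 1] != 0],
  estimate = beta[beta[, 1] != 0, 1]
)
selected[order(-abs(selected$estimate)), ]
```

Here the penalty is chosen by cross-validation; the fit reported on the previous slide may have selected it differently.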
---

### Directly Interpretable Models

Sparse logistic regression is one example of a directly interpretable model.

1. Parsimony: Predictions can be traced to a few input features, low-order interactions, or latent factors.
2. Simulatability: Given a new input and a description of the model, a model user can make a prediction with relatively little effort.

.center[
<img src="figures/parsimony_types.png" width=800/>
]

---

### Instability

Interpretability is a function of the problem context, not just the model.

.pull-left[
1. Troublingly, the output is unstable. We should be skeptical of any interpretations, regardless of how "interpretable" the model class is.
2. In the simulation, this is a consequence of correlated features -- adjacent timepoints have similar values.
]

.pull-right[
<img src="figures/lasso_instability.svg"/>
]

---

### Feature Engineering

To address this, we decide to reduce dimensionality by handcrafting some features: overall slope and curvature for each taxon.

.center[
<img src="figures/featurizations.png" width=800/>
]

---

### Feature Engineering

.pull-left[
1. The best performing lasso model achieves a performance of ~ 86% using 55 of the 289 derived features.
1. Lesson: Interpretability and accuracy are not necessarily at odds with one another.
]

.pull-right[
<img src="figures/lasso_derived_estimates.svg" width=390/>
]
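---

### Feature Engineering

As a sketch of this featurization, suppose the simulated profiles sit in a hypothetical long-format table `profiles` with columns `subject`, `taxon`, `time`, and `value`. One slope and one curvature summary per taxon could then be derived with:

```r
library(dplyr)
library(tidyr)

derived <- profiles |>
  group_by(subject, taxon) |>
  summarise(
    # quadratic trend fit to each subject's series for each taxon
    slope = coef(lm(value ~ poly(time, 2, raw = TRUE)))[2],
    curvature = coef(lm(value ~ poly(time, 2, raw = TRUE)))[3],
    .groups = "drop"
  ) |>
  pivot_wider(names_from = taxon, values_from = c(slope, curvature))
```

The lasso from before can then be refit on these derived columns in place of the raw per-timepoint features.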
---

### Transformers

.pull-left[
1. A principle of deep learning is that end-to-end optimization is more general than expert design.
1. We can apply the GPT2 architecture to our problem, viewing a sequence of microbiome profiles like a sequence of words.
]

.pull-right[
<img src="figures/transformers-analogy-2.png"/>
]

---

### Transformers

.pull-left[
1. A principle of deep learning is that end-to-end optimization is more general than expert design.
1. We can apply the GPT2 architecture to our problem, viewing a sequence of microbiome profiles like a sequence of words.
]

.pull-right[
<img src="figures/transformer_analogy.png"/>
]

---

### Transformers

.pull-left[
Applying a transformer model to the raw series, we reach a hold-out performance of ~ 84%, which is nearly as good as the lasso with handcrafted features.
]

.pull-right[
<img src="figures/transformer_probs.png"/>
]

---

### Embeddings

In text data, we can understand context-dependent meaning by looking for clusters in the PCA of embeddings [4]. These represent a type of interaction.

.center[
<img src="figures/bert_context.png" width=670/>
]

---

### Embeddings

We can build the analogous visualization for our microbiome problem. Samples that are nearby in the embedding space are similar w.r.t. predictive features.

.center[
<img src="figures/pca_comparison.svg" width=1400/>
]

---

### Interpolations

Another common technique is to analyze linear interpolations in this space [5]. This figure traces out the microbiome profiles between two samples.

.center[
<img src="figures/species_21_interpolation.svg" width=940/>
]

---

### Perturbation

To explain a generic model's decision on an instance, we can perturb it and see how the prediction changes.

<img src="figures/perturbation_types.png"/>

---

### Integrated Gradients

For example, we can compute the gradient of each class as we perturb a reference towards a sample of interest.

`\begin{align*} \left(x_{i} - x_{i}'\right) \int_{\alpha \in \left[0, 1\right]} \frac{\partial f\left(x' + \alpha\left(x - x'\right)\right)}{\partial x_{i}} d\alpha \end{align*}`

.center[
<img src="figures/integrated_gradients_animation.gif" width=1200/>
]

---

### Integrated Gradients

In our microbiome example, this can highlight the species and timepoints that are most responsible for the disease vs. healthy classification of each example.

.center[
<img src="figures/microbiome_integrated_gradients.svg"/>
]
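---

### Integrated Gradients

A minimal numeric sketch of the integrated gradients formula, assuming only black-box access to a prediction function and using finite differences in place of automatic differentiation; `integrated_gradients`, `x_ref`, `n_steps`, and the toy logistic model standing in for the transformer are all hypothetical:

```r
integrated_gradients <- function(f, x, x_ref, n_steps = 50, eps = 1e-4) {
  alphas <- seq(0, 1, length.out = n_steps)
  # gradient of f at each point on the straight path from x_ref to x,
  # approximated coordinate-by-coordinate with finite differences
  grads <- sapply(alphas, function(a) {
    z <- x_ref + a * (x - x_ref)
    sapply(seq_along(z), function(i) {
      z_step <- z
      z_step[i] <- z_step[i] + eps
      (f(z_step) - f(z)) / eps
    })
  })
  # Riemann approximation of the path integral, scaled by (x - x_ref)
  (x - x_ref) * rowMeans(grads)
}

# toy example: attributions should roughly sum to f(x) - f(x_ref)
beta_toy <- c(0.6, -0.2, 0.1)
f_toy <- function(z) plogis(sum(z * beta_toy))
integrated_gradients(f_toy, x = c(2, 1, 0), x_ref = c(0, 0, 0))
```

For the exact integral, the attributions sum to `\(f(x) - f(x')\)`; the Riemann sum recovers this up to discretization error.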
---

### Sanity Checks

Evaluating local explanations is notoriously subjective. Some researchers have proposed automatic "sanity checks" [6].

.center[
<img src="figures/sanity_checks.png" width=680/>
]

There have also been theoretical results that identify situations where feature attributions are unidentifiable [7].

---

### Concept Bottlenecks

Alternatively, we can explain a decision by reducing the arbitrary feature space to a set of human-interpretable concepts [8]. This is part of a larger body of work that attempts to establish shared language/representations for interacting with models.

.center[
<img src="figures/koh_concept.png" width=750 style="position: absolute; top: 340px; left: 300px"/>
]

---

### Concept Bottlenecks

In the microbiome example, we could define interpretable "concepts" by looking at the taxa trends for commonly co-varying groups of species.

.center[
<img src="figures/concept_1.svg"/>
]

---

### Concept Bottlenecks

We reconfigure our transformer model to first predict the concept label before making a final classification.

.center[
<img src="figures/concept_architecture.png"/>
]

---

### Concept Bottlenecks

.pull-left[
Performance is in fact slightly better than before (85%), and we also obtain concept labels to help us explain each instance's prediction.
]

.pull-right[
<img src="figures/concept_probs.png"/>
]
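---

### Concept Bottlenecks

To make the bottleneck structure concrete without a transformer, here is a simplified two-stage stand-in: predict the concept label from the inputs, then predict the outcome from the predicted concepts alone. The objects `x_derived`, `concept`, and `y` are hypothetical training data, and this sequential scheme only illustrates the architecture, not the model behind the 85% result.

```r
library(nnet)

train <- data.frame(x_derived, concept = concept)

# Stage 1: map the input features to concept probabilities
concept_model <- multinom(concept ~ ., data = train, trace = FALSE, MaxNWts = 5000)
concept_hat <- predict(concept_model, type = "probs")

# Stage 2: the final classifier sees only the predicted concepts
outcome_model <- glm(y ~ ., family = binomial(), data = data.frame(concept_hat, y = y))
```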
---

## Challenges

---

### Gauging Progress

1. Interpretability depends on criteria that are difficult to encode in standard quantitative ML benchmarks.
1. This is one area where statisticians can excel:
    - We critically interrogate the situations where methods can be applied.
    - We study methods within their problem contexts.

.center[
<img src="figures/future_da.png" width=700/>
]

---

### Human Studies

.pull-left[
1. These studies can quantify how explanations influence human judgment.
1. Common tasks include editing inputs to influence predictions and guessing model results from explanations.
1. Good explanations don't necessarily improve human-AI collaboration.
]

.pull-right[
<img src="figures/polyjuice.png"/>

An example task from an interpretability study [9].
]

---

### Human Studies

.pull-left[
1. These studies can quantify how explanations influence human judgment.
1. Common tasks include editing inputs to influence predictions and guessing model results from explanations.
1. Good explanations don't necessarily improve human-AI collaboration.
]

.pull-right[
<img src="figures/complementarity.png"/>

An example of unpredictable effects during deployment [10].
]

---

### Generalist Models

.pull-left[
1. **Modern machine learning models are being designed to solve many problems simultaneously.**
1. Multimodal datasets are becoming the norm, and we need methods to learn from them in an interpretable way.
1. We are also seeing increasingly rich ways to interact with them.
]

.pull-right[
<img src="figures/generalist_models.png"/>
]

---

### Generalist Models

.pull-left[
1. Modern machine learning models are being designed to solve many problems simultaneously.
1. **Multimodal datasets are becoming the norm, and we need methods to learn from them in an interpretable way.**
1. We are also seeing increasingly rich ways to interact with them.
]

.pull-right[
<img src="figures/open_vocabulary.gif"/>
]

---

### Generalist Models

.pull-left[
1. Modern machine learning models are being designed to solve many problems simultaneously.
1. Multimodal datasets are becoming the norm, and we need methods to learn from them in an interpretable way.
1. **We are also seeing increasingly rich ways to interact with them.**
]

.pull-right[
<img src="figures/image_editing.gif"/>
]

---

### Session Preview

Today's sessions will give us more nuanced language for making progress in ML interpretability,

1. [Cynthia Rudin, Hongtu Zhu, Yuan Ji] Making it easier to specify accurate, directly interpretable models in challenging scientific, medical, and social problems.
1. [Hubert Baniecki, Debashis Mondal] Enriching our language for auditing machine learning models and workflows, from input data to downstream decisions.

<img src="figures/logo.png"/>

---

### References

[1] R. Caruana, Y. Lou, J. Gehrke, et al. "Intelligible Models for HealthCare: Predicting Pneumonia Risk and Hospital 30-day Readmission". In: _Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining_ (2015).

[2] T. Gu, B. Dolan-Gavitt, and S. Garg. "BadNets: Identifying Vulnerabilities in the Machine Learning Model Supply Chain". In: _ArXiv_ abs/1708.06733 (2017).

[3] C. Gosmann, M. N. Anahtar, S. A. Handley, et al. "Lactobacillus-Deficient Cervicovaginal Bacterial Communities Are Associated with Increased HIV Acquisition in Young South African Women". In: _Immunity_ 46 (2017), pp. 29-37.

[4] A. Coenen, E. Reif, A. Yuan, et al. "Visualizing and Measuring the Geometry of BERT". In: _ArXiv_ abs/1906.02715 (2019).

---

### References

[5] Y. Liu, E. Jun, Q. Li, et al. "Latent Space Cartography: Visual Analysis of Vector Space Embeddings". In: _Computer Graphics Forum_ 38 (2019).

[6] J. Adebayo, J. Gilmer, M. Muelly, et al. "Sanity Checks for Saliency Maps". In: _Neural Information Processing Systems_. 2018.

[7] B. Bilodeau, N. Jaques, P. W. Koh, et al. "Impossibility Theorems for Feature Attribution". In: _Proceedings of the National Academy of Sciences of the United States of America_ 121.2 (2024), p. e2304406120.

---

### References

[8] P. W. Koh, T. Nguyen, Y. S. Tang, et al. "Concept Bottleneck Models". In: _ArXiv_ abs/2007.04612 (2020).

[9] T. S. Wu, M. T. Ribeiro, J. Heer, et al. "Polyjuice: Generating Counterfactuals for Explaining, Evaluating, and Improving Models". In: _Annual Meeting of the Association for Computational Linguistics_. 2021.

[10] G. Bansal, T. S. Wu, J. Zhou, et al. "Does the Whole Exceed its Parts? The Effect of AI Explanations on Complementary Team Performance". In: _Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems_ (2021).

---

### Attributions

explainable reinforcement learning by iconpro86 from <a href="https://thenounproject.com/browse/icons/term/explainable-reinforcement-learning/" target="_blank" title="explainable reinforcement learning Icons">Noun Project</a> (CC BY 3.0)

data visualization by Iconiqu from <a href="https://thenounproject.com/browse/icons/term/data-visualization/" target="_blank" title="data visualization Icons">Noun Project</a> (CC BY 3.0)

Ruler by Dhipwise Store from <a href="https://thenounproject.com/browse/icons/term/ruler/" target="_blank" title="Ruler Icons">Noun Project</a> (CC BY 3.0)

bacillus by Cécile Lanza Parker from <a href="https://thenounproject.com/browse/icons/term/bacillus/" target="_blank" title="bacillus Icons">Noun Project</a> (CC BY 3.0)

Microbe by Prettycons from <a href="https://thenounproject.com/browse/icons/term/microbe/" target="_blank" title="Microbe Icons">Noun Project</a> (CC BY 3.0)

bacterium by HideMaru from <a href="https://thenounproject.com/browse/icons/term/bacterium/" target="_blank" title="bacterium Icons">Noun Project</a> (CC BY 3.0)

bacterium by Maria Zamchy from <a href="https://thenounproject.com/browse/icons/term/bacterium/" target="_blank" title="bacterium Icons">Noun Project</a> (CC BY 3.0)

---

### Simulation Mechanism

---

### Historical Context

1. **Initial Wave**: Early ML systems required expert-crafted features. Deep learning removed this requirement, creating a new need for post-hoc explanations.
1. **Critical Self-Reflection**: Experiments highlight issues with common assumptions, and commentaries attempt to establish a shared vocabulary [11; 6; 12; 13].
1. **Systematic Evaluation**: Systematic progress depends on shared tasks, objective evaluation, and substantive theory -- these are beginning to emerge.

---

### Roadmap

These techniques are representative of larger classes of methods for model interpretability and explainability.

1. Direct interpretability `\(\to\)` Sparse Regression, Featurization
1. Latent representations `\(\to\)` Visualizing Embeddings
1. Local explainability `\(\to\)` Integrated Gradients
1. Shared representations `\(\to\)` Concept Bottleneck

---

### Visualization Metaphor

.pull-left[
1. People from many backgrounds are comfortable reading and creating data visualizations.
1. Visualization software provides shared representations between computer hardware and human thought.
1. How will interpretable ML appear in future scientific reports, newspaper articles, and undergraduate classrooms?
]

.pull-right[
<img src="figures/nyt-covid.png"/>
]

---