class: title <div id="title"> Interactivity for Interpretable Machine Learning<br/> </div> <div id="under_title"> Computational Genomics Summer Institute 2024<br/> </div> <div id="subtitle"> Kris Sankaran <br/> UW - Madison <br/> 11 | July | 2024 <br/> Lab: <a href="https://go.wisc.edu/pgb8nl">go.wisc.edu/pgb8nl</a> <br/> </div> <div id="subtitle_right"> <a href="https://go.wisc.edu/pl9a65">Slides</a><br/> <!-- <img src="figures/cropped-CGSI_logo_final-2.jpg" width=100/><br/> --> <img src="figures/slides-link.png" width=100/> </div> --- ### Learning Objectives By the end of this tutorial, you will be able to: 1. Compare and contrast outputs from different interpretability techniques. 1. Apply interactive computing ideas to the research code that you develop. --- ### Motivation: Outpatient Care for Pneumonia .center[ <img src="figures/asthma.png" width=1000/> <span style="font-size: 18px;"> Example from [1]. </span> ] <!-- Q&A This is pretty counterintuitive. What do you think is going on? --> --- ### Motivation 1. Computers let us solve problems that would be impossible to manage any other way, but we need some way of **checking our work**, especially when the stakes are high. 1. We can often **improve our models** by looking more closely at what they learn and intervening as necessary. 1. In the long-run, we'll be able to **get more out of our data and models** if we look more critically at them. --- class: middle .center[ ## Interpretability ] --- ### What is Interpretability? Models with these properties tend to be more interpretable [2; 3; 4; 5]: <img src="figures/simplicity.png" width=50/> **Parsimony**: The model has relatively few components. <br/><br/> <img src="figures/crystal-ball.png" width=40/> **Simulatability**: Users can predict model behavior on new samples. <br/><br/> <img src="figures/Lego_Brick_4.svg" width=50/> **Modularity**: The model can be broken into simpler components. <!-- First Q&A: Would you say that a linear regression model is interpretable? --> --- ### Distinctions 1. **Interpretable Model**: A model that, by virtue of its design, is easy to accurately describe and edit. 1. **Explainability Technique**: A method that summarizes some aspect of a black box system. .center[ <img src="figures/black_box_flashlight.png" width=720/> ] --- ### Illustrative Example Imagine sampling microbiome profiles over time and seeing that some of the hosts develop a disease. Can we discover risk factors from these data? .center[ <img src="figures/simulated-data.svg" width=830/> ] <span style="font-size: 18px;"> This simulation is motivated by microbiome studies of HIV risk over time [6]. </span> --- ### Transformers .pull-left[ 1. We apply the GPT2 architecture to our problem, viewing a sequence of microbiome profiles like a sequence of words. 1. The hope is that this model automatically discovers temporal and taxonomic features that distinguish the classes. ] .pull-right[ <img src="figures/transformers-analogy-2.png"/> ] --- ### Transformers .pull-left[ 1. We apply the GPT2 architecture to our problem, viewing a sequence of microbiome profiles like a sequence of words. 1. The hope is that this model automatically discovers temporal and taxonomic features that distinguish the classes. Transformer: 83.2%<br/> Lasso: 78%<br/> Lasso (True Features): 87.8% ] .pull-right[ <img src="figures/transformer_analogy.png"/> ] --- ### Representation Learning How can we try to understand the transformer embeddings `\(\tilde{\mathbf{z}}_{i}\)`? 
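One concrete way to get at `\(\tilde{\mathbf{z}}_{i}\)` is to extract the hidden states directly and project them. Below is a minimal sketch, assuming a Hugging Face `GPT2Model`; the "gpt2" checkpoint and the random `input_ids` are stand-ins, since the model in these slides embeds microbiome profiles rather than word tokens.

```python
import torch
from sklearn.decomposition import PCA
from transformers import GPT2Model

model = GPT2Model.from_pretrained("gpt2", output_hidden_states=True)
model.eval()

input_ids = torch.randint(0, 50257, (8, 16))  # stand-in for 8 tokenized "sentences" of length 16
with torch.no_grad():
    out = model(input_ids=input_ids)

z = out.hidden_states[-1].mean(dim=1)                   # final layer, averaged over the sequence
coords = PCA(n_components=2).fit_transform(z.numpy())   # 2-D view of the embedding space
```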
<br/>
<br/>
.center[
<img src="figures/transformer_layers.png" width=900/>
]

---

### Representation Learning

We can use PCA to explore the embedding space [7; 8].
<br/>
<br/>

.center[
<img src="figures/transformer_layers_pca.png" width=650/>
]

---

### Embeddings

We can interpolate between samples to understand the resulting embedding space.

.center[
<img src="figures/species_21_interpolation.svg" width=940/>
]

---

### Perturbation

To understand predictions at individual samples, we can use *local* explanations. These apply a perturbation and quantify the resulting change in predictions.
<br/>
<br/>

.center[
<img src="figures/perturbation.png" width=500/>
]

Examples: LIME [9], integrated gradients [10], GradCAM [11], local variable importance [12].

<!-- Q&A: How can we operationalize this intuition? -->

---

### Integrated Gradients

Integrated gradients are better than ordinary gradients because they are less sensitive to saturation in the usual logistic loss.

`\begin{align*}
\left(x_{i} - x_{i}'\right) \int_{\alpha \in \left[0, 1\right]} \frac{\partial f\left(x_{i}' + \alpha\left(x_{i} - x_{i}'\right)\right)}{\partial x_{i}} d\alpha
\end{align*}`

.center[
<img src="figures/integrated_gradients_animation.gif" width=750/><br/>
<span style="font-size: 18px;">
Interactive visualization from [13].
</span>
]

---

### Integrated Gradients

In our microbiome example, this highlights the species and timepoints that are most responsible for the disease vs. healthy classification for each sample.

.center[
<img src="figures/microbiome_integrated_gradients.svg"/>
]
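---

### Integrated Gradients in Code

Attribution libraries make this computation routine. Here is a minimal sketch using Captum (see the Software slide later). The stand-in classifier, the all-zeros baseline, and the class index are illustrative assumptions, not the exact settings behind the figure above.

```python
import torch
from captum.attr import IntegratedGradients

# Stand-ins: a toy classifier and a batch of 8 profiles (50 timepoints x 20 species).
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(50 * 20, 2))
x = torch.rand(8, 50, 20)

ig = IntegratedGradients(model)
baseline = torch.zeros_like(x)     # the reference point x' in the integral

attributions, delta = ig.attribute(
    x,
    baselines=baseline,
    target=1,                      # assumed index of the "disease" class
    n_steps=50,                    # resolution of the integral over alpha
    return_convergence_delta=True,
)
# attributions has the same shape as x: one score per species x timepoint
```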
---

### Concept Bottlenecks

Alternatively, we can explain a decision by reducing the arbitrary feature space to a set of human-interpretable concepts [14]. This is part of a larger body of work that attempts to establish shared language/representations for interacting with models [15].

.center[
<span style="font-size: 18px;">
<img src="figures/koh_concept.png" width=750/> <br/>
Figure from [14].
</span>
]

---

### Concept Bottlenecks

Alternatively, we can explain a decision by reducing the arbitrary feature space to a set of human-interpretable concepts [14]. This is part of a larger body of work that attempts to establish shared language/representations for interacting with models [15].

.center[
<span style="font-size: 18px;">
<img src="figures/concept_architecture-2.png" width=600/>
</span>
]

---

### Concept Bottlenecks

In the microbiome example, we could define concepts like blooms or trends. These would have to be manually annotated in the original training data. The test set performance is comparable to before (84%).

.center[
<img src="figures/trends.png" width=1000/>
]

---
class: middle

.center[
## Interactivity
]

---

### What is Interactivity?

Interactivity allows us to specify what computation we want done without having to write code.

.pull-left[
<img src="figures/autocomplete.gif" width=300/><br/>
<span style="font-size: 18px;">
Text entry is a type of interaction.
</span>
]

.pull-right[
<img src="figures/delegate-calc-1.gif"/>
<span style="font-size: 18px;">
An interactive delegate calculator created by the NYT [16].
</span>
]

---

### <span style="color: #D93611;">Focus-plus-Context</span>

We can let readers zoom into patterns of interest without losing relevant context. This has a flavor of "analyzing the residuals."
<br/>
<br/>

.pull-left[
> Overview first, zoom and filter, then details on demand. -- Shneiderman's "Visual Information Seeking Mantra" [17].
]

.pull-right[
.center[<img src="figures/structuration.png" width=250/>]
<span style="font-size: 18px;">
Tukey's iterative structuration, as interpreted by [18].
</span>
]

---

### Example: Tree Navigation

Large trees can be difficult to explore. Focus-plus-context gives a natural way of navigating them, updating the view according to user interactions [19; 20].

.center[
<iframe src="https://krisrs1128.github.io/treelapse/pages/antibiotic.html#htmlwidget-dd8d9e7ec77f2a8cc333" width=900 height=380></iframe>
]

---

### Example: Dimensionality Reduction

We can use focus-plus-context to compare topics across models with different complexity [21; 22]. Small `\(K\)` gives context, large `\(K\)` gives focus.

.center[
<img src="figures/vaginal_microbiome_alto.jpg" width=900/>
]

---

### Linked Views

.pull-left[
1. We can navigate higher dimensions by linking low-dimensional views [23; 24].
1. Notice that we're defining queries graphically, not just through selection menus.

A minimal code sketch of this idea appears just before the closing slide.
]

.pull-right[
<iframe src="https://connect.doit.wisc.edu/content/6df2063a-1c7d-4f01-b98c-3aebed82d190/" allowfullscreen="" data-external="1" height=500 width=600></iframe>
]

<!-- Q&A: What are some interesting questions you'd like to answer on this flight delay data? -->

---

### Example: Model Evaluation

We can use these ideas to evaluate the quality of a single-cell simulator.

<iframe src="http://localhost:5000/" width=950 height=450></iframe>

---

### Software

.pull-left[
**Interpretability**
1. Captum (Python)
1. DALEX (R)
1. imodels (Python)
1. interpretml (Python)

Review Paper<br/>
- [Code Repository](https://go.wisc.edu/3k1ewe)
]

.pull-right[
**Interactivity**
1. Shiny (R/Python)
1. Vega/Vega-lite (JS, Python, R)
1. D3 (JS)
1. p5 (JS)
1. Jupyter Widgets (Python)

Visualization Course<br/>
- Notes [I](https://krisrs1128.github.io/stat679_notes/), [II](https://krisrs1128.github.io/stat436_s24/)<br/>
- Recordings [I](https://www.youtube.com/watch?v=cc__b5R6OzA&list=PLhax_7Mawcfk1GEl_vOg7cE_vtRTsqMWw&pp=gAQBiAQB), [II](https://mediaspace.wisc.edu/channel/STAT+479%3A+Statistical+Data+Visualization/197911113)
]

---
class: middle

.center[
## Augmentation and AI
]

---

### AI and IA - Design Space

Computers are good at accurately reaching well-defined goals, but people are good at criticism and planning. How can we get the best of both worlds?

**Artificial Intelligence (AI)**: Solve problems without human intervention.<br/>
**Intelligence Augmentation (IA)**: Enhance human problem-solving ability.
<br/>
<br/>

.center[
<img src="figures/dm_vs_ai.png" width=800/>
]

---

### Guide-Decide Loop

1. In an interactive setting, AI can make suggestions for the next step in an analysis [25; 26].
1. This depends on there being a shared representation that links the frontend (for human interpretation) with the backend (for model prediction).

.center[
<img src="figures/guide-decide.png" width=700/>
]

---

### Interactive Translation

A well-designed interface helps professional translators achieve better results than simply editing the output of an automated translation system [27; 28; 29].

.center[
<img src="figures/phrasal.png" width=800/>
]

Predictions can be updated in response to interactions. The shared representation here is natural language.

---

### Conclusion

1. Interpretability and interactivity can simplify the process of specifying and validating models.
1. Good research questions arise from carefully looking at your data and models, and these tools can help.
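---

### Linked Views: A Minimal Sketch

To make the linked-views idea concrete, here is a small, hypothetical sketch using Altair 5, the Python interface to Vega-Lite listed on the Software slide. The dataset and encodings are placeholders rather than the flight delay app shown earlier: brushing the scatterplot defines a graphical query that filters the bar chart.

```python
import altair as alt
from vega_datasets import data

cars = data.cars()                # placeholder dataset for illustration
brush = alt.selection_interval()  # drag on the scatterplot to define a query

points = alt.Chart(cars).mark_point().encode(
    x="Horsepower:Q",
    y="Miles_per_Gallon:Q",
    color=alt.condition(brush, "Origin:N", alt.value("lightgray")),
).add_params(brush)

bars = alt.Chart(cars).mark_bar().encode(
    y="Origin:N",
    x="count()",
    color="Origin:N",
).transform_filter(brush)         # the bar chart only counts brushed points

(points & bars).save("linked_views.html")
```

---

### Thank you!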
* Lab Members: Margaret Thairu, Hanying Jiang, Shuchen Yan, Yuliang Peng, Kaiyan Ma, Kai Cui, and Kobe Uko. * Funding: NIGMS R01GM152744. --- class: reference <h3 style="font-size: 22px;">References</h3> [1] R. Caruana, Y. Lou, J. Gehrke, P. Koch, M. Sturm, and N. Elhadad. "Intelligible Models for HealthCare: Predicting Pneumonia Risk and Hospital 30-day Readmission". In: _Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining_ (2015). [2] Z. C. Lipton. "The Mythos of Model Interpretability". En. In: _ACM Queue_ 16.3 (Jun. 2018), pp. 31-57. [3] W. J. Murdoch, C. Singh, K. Kumbier, R. Abbasi-Asl, and B. Yu. "Definitions, methods, and applications in interpretable machine learning". En. In: _Proc. Natl. Acad. Sci. U. S. A._ 116.44 (Oct. 2019), pp. 22071-22080. [4] F. Doshi-Velez and B. Kim. "Towards A rigorous science of interpretable machine learning". In: _arXiv_ (2017). [5] K. Sankaran. "Data Science Principles for Interpretable and Explainable AI". In: _arXiv_ (2024). [6] C. Gosmann, M. N. Anahtar, S. A. Handley, M. Farcasanu, G. Abu-Ali, B. A. Bowman, N. Padavattan, C. Desai, et al. "Lactobacillus-deficient cervicovaginal bacterial communities are associated with increased HIV acquisition in young south African women". En. In: _Immunity_ 46.1 (Jan. 2017), pp. 29-37. [7] A. Coenen, E. Reif, A. Yuan, B. Kim, A. Pearce, F. Viégas, and M. Wattenberg. "Visualizing and measuring the geometry of BERT". In: _Proceedings of the 33rd International Conference on Neural Information Processing Systems_. Red Hook, NY, USA: Curran Associates Inc., 2019. [8] D. Erhan, Y. Bengio, A. Courville, P. Manzagol, P. Vincent, and S. Bengio. "Why Does Unsupervised Pre-training Help Deep Learning?" In: _Journal of Machine Learning Research_ 11.19 (2010), pp. 625-660. URL: [http://jmlr.org/papers/v11/erhan10a.html](http://jmlr.org/papers/v11/erhan10a.html). [9] M. T. Ribeiro, S. Singh, and C. Guestrin. "``Why Should I Trust You?'': Explaining the Predictions of Any Classifier". In: _Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining_. KDD '16 (2016). DOI: [10.1145/2939672.2939778](https://doi.org/10.1145%2F2939672.2939778). URL: [https://doi.org/10.1145/2939672.2939778](https://doi.org/10.1145/2939672.2939778). [10] M. Sundararajan, A. Taly, and Q. Yan. "Axiomatic attribution for deep networks". In: _Proceedings of the 34th International Conference on Machine Learning - Volume 70_. ICML'17. Sydney, NSW, Australia: JMLR.org, 2017, p. 3319–3328. [11] R. R. Selvaraju, M. Cogswell, A. Das, R. Vedantam, D. Parikh, and D. Batra. "Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization". In: _2017 IEEE International Conference on Computer Vision (ICCV)_. 2017, pp. 618-626. DOI: [10.1109/ICCV.2017.74](https://doi.org/10.1109%2FICCV.2017.74). [12] A. Agarwal, A. M. Kenney, Y. S. Tan, T. M. Tang, and B. Yu. "MDI+: A flexible random forest-based feature importance framework". In: _arXiv_ (2023). [13] P. Sturmfels, S. Lundberg, and S. Lee. "Visualizing the Impact of Feature Attribution Baselines". In: _Distill_ 5.1 (Jan. 2020). ISSN: 2476-0757. DOI: [10.23915/distill.00022](https://doi.org/10.23915%2Fdistill.00022). URL: [http://dx.doi.org/10.23915/distill.00022](http://dx.doi.org/10.23915/distill.00022). [14] P. W. Koh, T. Nguyen, Y. S. Tang, S. Mussmann, E. Pierson, B. Kim, and P. Liang. "Concept Bottleneck Models". 
In: _Proceedings of the 33rd International Conference on Neural Information Processing Systems_. Red Hook, NY, USA: Curran Associates Inc., 2020. --- class: reference [15] M. Yuksekgonul, M. Wang, and J. Zou. "Post-hoc Concept Bottleneck Models". In: _The Eleventh International Conference on Learning Representations _. 2023. URL: [https://openreview.net/forum?id=nA5AZ8CEyow](https://openreview.net/forum?id=nA5AZ8CEyow). [16] G. Aisch. _In Defense of Interactive Graphics - vis4.net_. <https://www.vis4.net/blog/in-defense-of-interactive-graphics/>. [Accessed 08-07-2024]. [17] B. Shneiderman. "The eyes have it: a task by data type taxonomy for information visualizations". In: _Proceedings 1996 IEEE Symposium on Visual Languages_. Boulder, CO, USA: IEEE Comput. Soc. Press, 2002. [18] S. Holmes. "Comment on 'A model for studying display methods of statistical graphics'". In: _J. Comput. Graph. Stat._ 2.4 (Dec. 1993), pp. 349-353. [19] J. Heer and S. K. Card. "DOITrees Revisited: Scalable, Space-Constrained Visualization of Hierarchical Data". In: _Advanced Visual Interfaces_ (2004), pp. 421-424. URL: [http://vis.stanford.edu/papers/doitrees-revisited](http://vis.stanford.edu/papers/doitrees-revisited). [20] K. Sankaran and S. Holmes. "Interactive Visualization of Hierarchically Structured Data". En. In: _Journal of Computational and Graphical Statistics_ 27.3 (Jul. 2018), pp. 553-563. ISSN: 1061-8600, 1537-2715. DOI: [10.1080/10618600.2017.1392866](https://doi.org/10.1080%2F10618600.2017.1392866). [21] L. Symul, P. Jeganathan, E. K. Costello, M. France, S. M. Bloom, D. S. Kwon, J. Ravel, D. A. Relman, et al. "Sub-communities of the vaginal microbiota in pregnant and non-pregnant women". En. In: _Proc. Biol. Sci._ 290.2011 (Nov. 2023), p. 20231461. [22] J. Fukuyama, K. Sankaran, and L. Symul. "Multiscale analysis of count data through topic alignment". En. In: _Biostatistics_ 24.4 (Oct. 2023), pp. 1045-1065. [23] R. A. Becker and W. S. Cleveland. "Brushing Scatterplots". In: _Technometrics_ 29.2 (May. 1987), p. 127. [24] A. Buja, D. Cook, and D. F. Swayne. "Interactive high-dimensional data visualization". En. In: _J. Comput. Graph. Stat._ 5.1 (Mar. 1996), pp. 78-99. [25] J. Heer, J. Hellerstein, and S. Kandel. "Predictive Interaction for Data Transformation". In: _Conference on Innovative Data Systems Research (CIDR)_ (2015). URL: [https://idl.uw.edu/papers/predictive-interaction](https://idl.uw.edu/papers/predictive-interaction). [26] J. Heer. "Agency plus automation: Designing artificial intelligence into interactive systems". En. In: _Proc. Natl. Acad. Sci. U. S. A._ 116.6 (Feb. 2019), pp. 1844-1850. [27] S. Green, J. Chuang, J. Heer, and C. D. Manning. "Predictive translation memory: a mixed-initiative system for human language translation". In: _Proceedings of the 27th Annual ACM Symposium on User Interface Software and Technology_. UIST '14. Honolulu, Hawaii, USA: Association for Computing Machinery, 2014, p. 177–187. ISBN: 9781450330695. DOI: [10.1145/2642918.2647408](https://doi.org/10.1145%2F2642918.2647408). [28] S. Green, S. I. Wang, J. Chuang, J. Heer, S. Schuster, and C. D. Manning. "Human Effort and Machine Learnability in Computer Aided Translation". In: _Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)_. Ed. by A. Moschitti, B. Pang and W. Daelemans. Doha, Qatar, Oct. 2014, pp. 1225-1236. DOI: [10.3115/v1/D14-1130](https://doi.org/10.3115%2Fv1%2FD14-1130). [29] S. Green, J. Heer, and C. D. Manning. 
"The efficacy of human post-editing for language translation". In: _Proceedings of the SIGCHI Conference on Human Factors in Computing Systems_. CHI '13. Paris, France: Association for Computing Machinery, 2013, p. 439–448. ISBN: 9781450318990. DOI: [10.1145/2470654.2470718](https://doi.org/10.1145%2F2470654.2470718). URL: [https://doi.org/10.1145/2470654.2470718](https://doi.org/10.1145/2470654.2470718). --- class: reference [30] D. Kundaliya. "Computing - Incisive Media: Google AI chatbot Bard gives wrong answer in its first demo". In: _Computing_ (Sep. 2023). URL: [https://ezproxy.library.wisc.edu/login?url=https://www.proquest.com/trade-journals/computing-incisive-media-google-ai-chatbot-bard/docview/2774438186/se-2](https://ezproxy.library.wisc.edu/login?url=https://www.proquest.com/trade-journals/computing-incisive-media-google-ai-chatbot-bard/docview/2774438186/se-2). [31] M. G. Chevrette, C. S. Thomas, A. Hurley, N. Rosario-Meléndez, K. Sankaran, Y. Tu, A. Hall, S. Magesh, et al. "Microbiome composition modulates secondary metabolism in a multispecies bacterial community". En. In: _Proc. Natl. Acad. Sci. U. S. A._ 119.42 (Oct. 2022), p. e2212930119. --- ### Motivation: The Google Bard Demo .center[ <img src="figures/bard-hallucination.webp" width=830/> See the discussion in [30]. ] --- ### Example: Multiple Testing Linked views can help make sense of a collection of hypothesis tests. Each letter corresponds to an experimental factor (details in [31]). .center[ <iframe src="https://connect.doit.wisc.edu/content/7d109162-8690-4c84-8563-4bdee8f15ca0" width=1400 height=600></iframe> ] --- ### Figure Attribution Simplicity by M. Oki Orlando from <a href="https://thenounproject.com/browse/icons/term/simplicity/" target="_blank" title="Simplicity Icons">Noun Project</a> (CC BY 3.0) Crystal Ball by Kiki Rizky from <a href="https://thenounproject.com/browse/icons/term/crystal-ball/" target="_blank" title="Crystal Ball Icons">Noun Project</a> (CC BY 3.0)