<div id="links">
Banff International Research Station <br/>
Trustworthy Machine Learning<br/>
Slides: <a href="https://go.wisc.edu/5h522y">https://go.wisc.edu/5h522y</a><br/>
Code: <a href="https://go.wisc.edu/2m20f9">https://go.wisc.edu/2m20f9</a>
</div>
<br/>
<br/>
<div id="title">
Interpretable Machine Learning <br/>
<span style="font-size: 30px"> What's Possible? What's Next? </span>
</div>
<br/>
<div id="subtitle">
Kris Sankaran <br/>
<a href="https://go.wisc.edu/pgb8n">https://go.wisc.edu/pgb8n</a> <br/>
12 | February | 2023 <br/>
</div>

---

### The Context

Models are woven into the fabric of modern life,

* **Decisions**: They can automate or assist with judgments that previously would have been made entirely by people.
* **Discovery**: They can orient us within large data catalogs and can guide us towards promising hypotheses.
* **Creativity**: They can make it easier for those without technical training to explore ideas and express themselves.

For these models to be understood by and beneficial to people from many backgrounds, we need interpretability.

---

### What can go wrong?

.center[
<img src="figures/asthma.png" width=1000/>

Example from [1].
]

---

### What can go wrong?

.center[
<img src="figures/adversarial-stopsign.webp" width=600/>

Example from [2].
]

---

### What can go wrong?

.center[
<img src="figures/bard-hallucination.webp" width=900/>
]

---

# What Makes a Model Interpretable?

<br/>

.center[
<img src="figures/computer.png" width=350 style="position: absolute; left: 500px"/>
]

---

# What Makes a Model Interpretable?

<br/>

.center[
<img src="figures/computer.png" width=350 style="position: absolute; left: 500px"/>
]

<p style="font-size: 30px; position: absolute; left: 20px; top: 200px; width: 450px">
This is a difficult question... let's start with an easier one.
</p>

---

# What Makes a Visualization Good?

<br/>

.center[
<img src="figures/visualization.png" width=350 style="position: absolute; left: 450px"/>
]

---

### Key Properties

.pull-left[
A good visualization is:

1. **Legible**: It omits extraneous, distracting elements.
1. **Annotated**: It shows data within the problem context.
1. **Information Dense**: It shows relevant variation efficiently.
]

.pull-right[
<img src="figures/tufte.png" width=330/>
]

---

### Key Properties

A good visualization is:

1. **Legible**: It omits extraneous, distracting elements.
1. **Annotated**: It shows data within the problem context.
1. **Information Dense**: It shows relevant variation efficiently.

.center[
<img src="figures/tufte-2.png"/>
]

---

### Below-the-Surface

More subtly, it should pay attention to:

1. **Data Provenance**: If we don't know the data sources, we should be skeptical of anything that's shown, no matter how compelling.
1. **Audience**: The effectiveness of a visualization depends on the visual vocabulary of its audience.
1. **Prioritization**: Every design emphasizes some comparisons over others. Are the "important" patterns visible?
1. **Interactivity**: Does it engage the reader's problem-solving capacity?

We should think about model interpretability with the same nuance that we think about data visualization.

---

## Methods

---

### Vocabulary

1. **Interpretable Model**: A model that, by virtue of its design, is easy for its stakeholders to accurately describe and alter.
1. **Explainability Technique**: A method that shapes our mental models about black box systems.

.center[
<img src="figures/black_box_flashlight.png" width=720/>
]

---

### Vocabulary

1. **Local Explanation**: An artifact for reasoning about individual predictions.
1. **Global Explanation**: An artifact for reasoning about an entire model.

.center[
<img src="figures/explanation_types.png" width=800/>
]

---

### Running Example

Problem: Imagine sampling longitudinal microbiome profiles from 500 study participants, some of whom eventually developed a disease. Can we discover any microbiome-related risk factors?

This simulation is motivated by microbiome studies of HIV risk [3].

.center[
<img src="figures/simulated-data.svg" width=830/>
]

---

### Data Organization

We can frame this as a regression problem where all 50 timepoints and 144 species are stacked horizontally.

.center[
<img src="figures/taxon_regression.png" width=650/>
]

---

### Sparse Logistic Regression

.pull-left[
1. We can reach ~ 77% accuracy using only 38 of the original 7200 features.
1. Each coefficient has a simple species `\(\times\)` time interpretation.

```
# A tibble: 7,201 x 2
  term      estimate
  <chr>        <dbl>
1 tax13_24     0.593
2 tax114_26    0.555
3 tax66_50     0.457
4 tax105_36    0.289
5 tax46_30     0.261
6 tax46_19     0.232
```
]

.pull-right[
<img src="figures/lasso_estimates.svg" width=320/>
]
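---

### Sparse Logistic Regression

For a rough sketch of how a fit like this could be produced, suppose the stacked features are in a numeric matrix `x` (one column per species `\(\times\)` timepoint, named like `tax13_24`) and `y` holds the 0/1 disease outcomes; both names are hypothetical. The lasso path could then be computed with `glmnet`:

```r
library(glmnet)

# Cross-validated lasso-penalized logistic regression on the stacked features
fit <- cv.glmnet(x, y, family = "binomial", alpha = 1)

# Nonzero coefficients at the selected penalty, sorted by magnitude
beta <- coef(fit, s = "lambda.min")
selected <- data.frame(
  term = rownames(beta)[beta[, 1] != 0],
  estimate = beta[beta[, 1] != 0, 1]
)
selected[order(-abs(selected$estimate)), ]
```

Here the penalty is chosen by cross-validation; the fit reported on the previous slide may have selected it differently.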
---

### Directly Interpretable Models

Sparse logistic regression is one example of a directly interpretable model.

1. Parsimony: Predictions can be traced to a few input features, low-order interactions, or latent factors.
2. Simulatability: Given a new input and a description of the model, a model user can make a prediction with relatively little effort.

.center[
<img src="figures/parsimony_types.png" width=800/>
]

---

### Instability

Interpretability is a function of the problem context, not just the model.

.pull-left[
1. Troublingly, the output is unstable. We should be skeptical of any interpretations, regardless of how "interpretable" the model class is.
2. In the simulation, this is a consequence of correlated features -- adjacent timepoints have similar values.
]

.pull-right[
<img src="figures/lasso_instability.svg"/>
]

---

### Feature Engineering

To address this, we decide to reduce dimensionality by handcrafting some features: overall slope and curvature for each taxon.

.center[
<img src="figures/featurizations.png" width=800/>
]

---

### Feature Engineering

.pull-left[
1. The best performing lasso model achieves a performance of ~ 86% using 55 of the 289 derived features.
1. Lesson: Interpretability and accuracy are not necessarily at odds with one another.
]

.pull-right[
<img src="figures/lasso_derived_estimates.svg" width=390/>
]
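---

### Feature Engineering

As a sketch of this featurization, suppose the simulated profiles sit in a hypothetical long-format table `profiles` with columns `subject`, `taxon`, `time`, and `value`. One slope and one curvature summary per taxon could then be derived with:

```r
library(dplyr)
library(tidyr)

derived <- profiles |>
  group_by(subject, taxon) |>
  summarise(
    # quadratic trend fit to each subject's series for each taxon
    slope = coef(lm(value ~ poly(time, 2, raw = TRUE)))[2],
    curvature = coef(lm(value ~ poly(time, 2, raw = TRUE)))[3],
    .groups = "drop"
  ) |>
  pivot_wider(names_from = taxon, values_from = c(slope, curvature))
```

The lasso from before can then be refit on these derived columns in place of the raw per-timepoint features.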
---

### Transformers

.pull-left[
1. A principle of deep learning is that end-to-end optimization is more general than expert design.
1. We can apply the GPT2 architecture to our problem, viewing a sequence of microbiome profiles like a sequence of words.
]

.pull-right[
<img src="figures/transformers-analogy-2.png"/>
]

---

### Transformers

.pull-left[
1. A principle of deep learning is that end-to-end optimization is more general than expert design.
1. We can apply the GPT2 architecture to our problem, viewing a sequence of microbiome profiles like a sequence of words.
]

.pull-right[
<img src="figures/transformer_analogy.png"/>
]

---

### Transformers

.pull-left[
Applying a transformer model to the raw series, we reach a hold-out performance of ~ 84%, which is nearly as good as the lasso with handcrafted features.
]

.pull-right[
<img src="figures/transformer_probs.png"/>
]

---

### Embeddings

In text data, we can understand context-dependent meaning by looking for clusters in the PCA of embeddings [4]. These represent a type of interaction.

.center[
<img src="figures/bert_context.png" width=670/>
]

---

### Embeddings

We can build the analogous visualization for our microbiome problem. Samples that are nearby in the embedding space are similar w.r.t. predictive features.

.center[
<img src="figures/pca_comparison.svg" width=1400/>
]

---

### Interpolations

Another common technique is to analyze linear interpolations in this space [5]. This figure traces out the microbiome profiles between two samples.

.center[
<img src="figures/species_21_interpolation.svg" width=940/>
]

---

### Perturbation

To explain a generic model's decision on an instance, we can perturb it and see how the prediction changes.

<img src="figures/perturbation_types.png"/>

---

### Integrated Gradients

For example, we can compute the gradient of each class as we perturb a reference towards a sample of interest.

`\begin{align*} \left(x_{i} - x_{i}'\right) \int_{\alpha \in \left[0, 1\right]} \frac{\partial f\left(x' + \alpha\left(x - x'\right)\right)}{\partial x_{i}} d\alpha \end{align*}`

.center[
<img src="figures/integrated_gradients_animation.gif" width=1200/>
]

---

### Integrated Gradients

In our microbiome example, this can highlight the species and timepoints that are most responsible for the disease vs. healthy classification of each example.

.center[
<img src="figures/microbiome_integrated_gradients.svg"/>
]
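---

### Integrated Gradients

A minimal numeric sketch of the integrated gradients formula, assuming only black-box access to a prediction function and using finite differences in place of automatic differentiation; `integrated_gradients`, `x_ref`, `n_steps`, and the toy logistic model standing in for the transformer are all hypothetical:

```r
integrated_gradients <- function(f, x, x_ref, n_steps = 50, eps = 1e-4) {
  alphas <- seq(0, 1, length.out = n_steps)
  # gradient of f at each point on the straight path from x_ref to x,
  # approximated coordinate-by-coordinate with finite differences
  grads <- sapply(alphas, function(a) {
    z <- x_ref + a * (x - x_ref)
    sapply(seq_along(z), function(i) {
      z_step <- z
      z_step[i] <- z_step[i] + eps
      (f(z_step) - f(z)) / eps
    })
  })
  # Riemann approximation of the path integral, scaled by (x - x_ref)
  (x - x_ref) * rowMeans(grads)
}

# toy example: attributions should roughly sum to f(x) - f(x_ref)
beta_toy <- c(0.6, -0.2, 0.1)
f_toy <- function(z) plogis(sum(z * beta_toy))
integrated_gradients(f_toy, x = c(2, 1, 0), x_ref = c(0, 0, 0))
```

For the exact integral, the attributions sum to `\(f(x) - f(x')\)`; the Riemann sum recovers this up to discretization error.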
---

### Sanity Checks

Evaluating local explanations is notoriously subjective. Some researchers have proposed automatic "sanity checks" [6].

.center[
<img src="figures/sanity_checks.png" width=680/>
]

There have also been theoretical results that identify situations where feature attributions are unidentifiable [7].

---

### Concept Bottlenecks

Alternatively, we can explain a decision by reducing the arbitrary feature space to a set of human-interpretable concepts [8]. This is part of a larger body of work that attempts to establish shared language/representations for interacting with models.

.center[
<img src="figures/koh_concept.png" width=750 style="position: absolute; top: 340px; left: 300px"/>
]

---

### Concept Bottlenecks

In the microbiome example, we could define interpretable "concepts" by looking at the taxa trends for commonly co-varying groups of species.

.center[
<img src="figures/concept_1.svg"/>
]

---

### Concept Bottlenecks

We reconfigure our transformer model to first predict the concept label before making a final classification.

.center[
<img src="figures/concept_architecture.png"/>
]

---

### Concept Bottlenecks

.pull-left[
Performance is in fact slightly better than before (85%), and we also obtain concept labels to help us explain each instance's prediction.
]

.pull-right[
<img src="figures/concept_probs.png"/>
]
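---

### Concept Bottlenecks

To make the bottleneck structure concrete without a transformer, here is a simplified two-stage stand-in: predict the concept label from the inputs, then predict the outcome from the predicted concepts alone. The objects `x_derived`, `concept`, and `y` are hypothetical training data, and this sequential scheme only illustrates the architecture, not the model behind the 85% result.

```r
library(nnet)

train <- data.frame(x_derived, concept = concept)

# Stage 1: map the input features to concept probabilities
concept_model <- multinom(concept ~ ., data = train, trace = FALSE, MaxNWts = 5000)
concept_hat <- predict(concept_model, type = "probs")

# Stage 2: the final classifier sees only the predicted concepts
outcome_model <- glm(y ~ ., family = binomial(), data = data.frame(concept_hat, y = y))
```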
---

## Challenges

---

### Gauging Progress

1. Interpretability depends on criteria that are difficult to encode in standard quantitative ML benchmarks.
1. This is one area where statisticians can excel:
    - We critically interrogate the situations where methods can be applied.
    - We study methods within their problem contexts.

.center[
<img src="figures/future_da.png" width=700/>
]

---

### Human Studies

.pull-left[
1. These studies can quantify how explanations influence human judgment.
1. Common tasks include editing inputs to influence predictions and guessing model results from explanations.
1. Good explanations don't necessarily improve human-AI collaboration.
]

.pull-right[
<img src="figures/polyjuice.png"/>

An example task from an interpretability study [9].
]

---

### Human Studies

.pull-left[
1. These studies can quantify how explanations influence human judgment.
1. Common tasks include editing inputs to influence predictions and guessing model results from explanations.
1. Good explanations don't necessarily improve human-AI collaboration.
]

.pull-right[
<img src="figures/complementarity.png"/>

An example of unpredictable effects during deployment [10].
]

---

### Generalist Models

.pull-left[
1. **Modern machine learning models are being designed to solve many problems simultaneously.**
1. Multimodal datasets are becoming the norm, and we need methods to learn from them in an interpretable way.
1. We are also seeing increasingly rich ways to interact with them.
]

.pull-right[
<img src="figures/generalist_models.png"/>
]

---

### Generalist Models

.pull-left[
1. Modern machine learning models are being designed to solve many problems simultaneously.
1. **Multimodal datasets are becoming the norm, and we need methods to learn from them in an interpretable way.**
1. We are also seeing increasingly rich ways to interact with them.
]

.pull-right[
<img src="figures/open_vocabulary.gif"/>
]

---

### Generalist Models

.pull-left[
1. Modern machine learning models are being designed to solve many problems simultaneously.
1. Multimodal datasets are becoming the norm, and we need methods to learn from them in an interpretable way.
1. **We are also seeing increasingly rich ways to interact with them.**
]

.pull-right[
<img src="figures/image_editing.gif"/>
]

---

### Session Preview

Today's sessions will give us more nuanced language for making progress in ML interpretability,

1. [Cynthia Rudin, Hongtu Zhu, Yuan Ji] Making it easier to specify accurate, directly interpretable models in challenging scientific, medical, and social problems.
1. [Hubert Baniecki, Debashis Mondal] Enriching our language for auditing machine learning models and workflows, from input data to downstream decisions.

<img src="figures/logo.png"/>

---

### References

[1] R. Caruana, Y. Lou, J. Gehrke, et al. "Intelligible Models for HealthCare: Predicting Pneumonia Risk and Hospital 30-day Readmission". In: _Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining_ (2015).

[2] T. Gu, B. Dolan-Gavitt, and S. Garg. "BadNets: Identifying Vulnerabilities in the Machine Learning Model Supply Chain". In: _ArXiv_ abs/1708.06733 (2017).

[3] C. Gosmann, M. N. Anahtar, S. A. Handley, et al. "Lactobacillus-Deficient Cervicovaginal Bacterial Communities Are Associated with Increased HIV Acquisition in Young South African Women". In: _Immunity_ 46 (2017), pp. 29-37.

[4] A. Coenen, E. Reif, A. Yuan, et al. "Visualizing and Measuring the Geometry of BERT". In: _ArXiv_ abs/1906.02715 (2019).

---

### References

[5] Y. Liu, E. Jun, Q. Li, et al. "Latent Space Cartography: Visual Analysis of Vector Space Embeddings". In: _Computer Graphics Forum_ 38 (2019).

[6] J. Adebayo, J. Gilmer, M. Muelly, et al. "Sanity Checks for Saliency Maps". In: _Neural Information Processing Systems_. 2018.

[7] B. Bilodeau, N. Jaques, P. W. Koh, et al. "Impossibility Theorems for Feature Attribution". In: _Proceedings of the National Academy of Sciences of the United States of America_ 121.2 (2024), p. e2304406120.

---

### References

[8] P. W. Koh, T. Nguyen, Y. S. Tang, et al. "Concept Bottleneck Models". In: _ArXiv_ abs/2007.04612 (2020).

[9] T. S. Wu, M. T. Ribeiro, J. Heer, et al. "Polyjuice: Generating Counterfactuals for Explaining, Evaluating, and Improving Models". In: _Annual Meeting of the Association for Computational Linguistics_. 2021.

[10] G. Bansal, T. S. Wu, J. Zhou, et al. "Does the Whole Exceed its Parts? The Effect of AI Explanations on Complementary Team Performance". In: _Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems_ (2021).

---

### Attributions

explainable reinforcement learning by iconpro86 from <a href="https://thenounproject.com/browse/icons/term/explainable-reinforcement-learning/" target="_blank" title="explainable reinforcement learning Icons">Noun Project</a> (CC BY 3.0)

data visualization by Iconiqu from <a href="https://thenounproject.com/browse/icons/term/data-visualization/" target="_blank" title="data visualization Icons">Noun Project</a> (CC BY 3.0)

Ruler by Dhipwise Store from <a href="https://thenounproject.com/browse/icons/term/ruler/" target="_blank" title="Ruler Icons">Noun Project</a> (CC BY 3.0)

bacillus by Cécile Lanza Parker from <a href="https://thenounproject.com/browse/icons/term/bacillus/" target="_blank" title="bacillus Icons">Noun Project</a> (CC BY 3.0)

Microbe by Prettycons from <a href="https://thenounproject.com/browse/icons/term/microbe/" target="_blank" title="Microbe Icons">Noun Project</a> (CC BY 3.0)

bacterium by HideMaru from <a href="https://thenounproject.com/browse/icons/term/bacterium/" target="_blank" title="bacterium Icons">Noun Project</a> (CC BY 3.0)

bacterium by Maria Zamchy from <a href="https://thenounproject.com/browse/icons/term/bacterium/" target="_blank" title="bacterium Icons">Noun Project</a> (CC BY 3.0)

---

### Simulation Mechanism

---

### Historical Context

1. **Initial Wave**: Early ML systems required expert-crafted features. Deep learning removed this requirement, creating a new need for post-hoc explanations.
1. **Critical Self-Reflection**: Experiments highlight issues with common assumptions, and commentaries attempt to establish a shared vocabulary [11; 6; 12; 13].
1. **Systematic Evaluation**: Systematic progress depends on shared tasks, objective evaluation, and substantive theory -- these are beginning to emerge.

---

### Roadmap

These techniques are representative of larger classes of methods for model interpretability and explainability.

1. Direct interpretability `\(\to\)` Sparse Regression, Featurization
1. Latent representations `\(\to\)` Visualizing Embeddings
1. Local explainability `\(\to\)` Integrated Gradients
1. Shared representations `\(\to\)` Concept Bottleneck

---

### Visualization Metaphor

.pull-left[
1. People from many backgrounds are comfortable reading and creating data visualizations.
1. Visualization software provides shared representations between computer hardware and human thought.
1. How will interpretable ML appear in future scientific reports, newspaper articles, and undergraduate classrooms?
]

.pull-right[
<img src="figures/nyt-covid.png"/>
]

---