class: title <div id="title"> Interactivity for Interpretable Machine Learning<br/> </div> <div id="under_title"> Computational Genomics Summer Institute 2024<br/> </div> <div id="subtitle"> Kris Sankaran <br/> UW - Madison <br/> 11 | July | 2024 <br/> Lab: <a href="https://go.wisc.edu/pgb8nl">go.wisc.edu/pgb8nl</a> <br/> </div> <div id="subtitle_right"> <a href="https://go.wisc.edu/pl9a65">Slides</a><br/> <!-- <img src="figures/cropped-CGSI_logo_final-2.jpg" width=100/><br/> --> <img src="figures/slides-link.png" width=100/> </div> --- ### Learning Objectives By the end of this tutorial, you will be able to: 1. Compare and contrast outputs from different interpretability techniques. 1. Apply interactive computing ideas to the research code that you develop. --- ### Motivation: Outpatient Care for Pneumonia .center[ <img src="figures/asthma.png" width=1000/> <span style="font-size: 18px;"> Example from [1]. </span> ] <!-- Q&A This is pretty counterintuitive. What do you think is going on? --> --- ### Motivation 1. Computers let us solve problems that would be impossible to manage any other way, but we need some way of **checking our work**, especially when the stakes are high. 1. We can often **improve our models** by looking more closely at what they learn and intervening as necessary. 1. In the long-run, we'll be able to **get more out of our data and models** if we look more critically at them. --- class: middle .center[ ## Interpretability ] --- ### What is Interpretability? Models with these properties tend to be more interpretable [2; 3; 4; 5]: <img src="figures/simplicity.png" width=50/> **Parsimony**: The model has relatively few components. <br/><br/> <img src="figures/crystal-ball.png" width=40/> **Simulatability**: Users can predict model behavior on new samples. <br/><br/> <img src="figures/Lego_Brick_4.svg" width=50/> **Modularity**: The model can be broken into simpler components. <!-- First Q&A: Would you say that a linear regression model is interpretable? --> --- ### Distinctions 1. **Interpretable Model**: A model that, by virtue of its design, is easy to accurately describe and edit. 1. **Explainability Technique**: A method that summarizes some aspect of a black box system. .center[ <img src="figures/black_box_flashlight.png" width=720/> ] --- ### Illustrative Example Imagine sampling microbiome profiles over time and seeing that some of the hosts develop a disease. Can we discover risk factors from these data? .center[ <img src="figures/simulated-data.svg" width=830/> ] <span style="font-size: 18px;"> This simulation is motivated by microbiome studies of HIV risk over time [6]. </span> --- ### Transformers .pull-left[ 1. We apply the GPT2 architecture to our problem, viewing a sequence of microbiome profiles like a sequence of words. 1. The hope is that this model automatically discovers temporal and taxonomic features that distinguish the classes. ] .pull-right[ <img src="figures/transformers-analogy-2.png"/> ] --- ### Transformers .pull-left[ 1. We apply the GPT2 architecture to our problem, viewing a sequence of microbiome profiles like a sequence of words. 1. The hope is that this model automatically discovers temporal and taxonomic features that distinguish the classes. Transformer: 83.2%<br/> Lasso: 78%<br/> Lasso (True Features): 87.8% ] .pull-right[ <img src="figures/transformer_analogy.png"/> ] --- ### Representation Learning How can we try to understand the transformer embeddings `\(\tilde{\mathbf{z}}_{i}\)`? 
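One concrete way to get at `\(\tilde{\mathbf{z}}_{i}\)` is to extract the hidden states directly and project them. Below is a minimal sketch, assuming a Hugging Face `GPT2Model`; the "gpt2" checkpoint and the random `input_ids` are stand-ins, since the model in these slides embeds microbiome profiles rather than word tokens.

```python
import torch
from sklearn.decomposition import PCA
from transformers import GPT2Model

model = GPT2Model.from_pretrained("gpt2", output_hidden_states=True)
model.eval()

input_ids = torch.randint(0, 50257, (8, 16))  # stand-in for 8 tokenized "sentences" of length 16
with torch.no_grad():
    out = model(input_ids=input_ids)

z = out.hidden_states[-1].mean(dim=1)                   # final layer, averaged over the sequence
coords = PCA(n_components=2).fit_transform(z.numpy())   # 2-D view of the embedding space
```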
<br/>
<br/>
.center[
<img src="figures/transformer_layers.png" width=900/>
]

---

### Representation Learning

We can use PCA to explore the embedding space [7; 8].
<br/>
<br/>

.center[
<img src="figures/transformer_layers_pca.png" width=650/>
]

---

### Embeddings

We can interpolate between samples to understand the resulting embedding space.

.center[
<img src="figures/species_21_interpolation.svg" width=940/>
]

---

### Perturbation

To understand predictions at individual samples, we can use *local* explanations. These apply a perturbation and quantify the resulting change in predictions.
<br/>
<br/>

.center[
<img src="figures/perturbation.png" width=500/>
]

Examples: LIME [9], integrated gradients [10], GradCAM [11], local variable importance [12].

<!-- Q&A: How can we operationalize this intuition? -->

---

### Integrated Gradients

Integrated gradients are better than ordinary gradients because they are less sensitive to saturation in the usual logistic loss.

`\begin{align*}
\left(x_{i} - x_{i}'\right) \int_{\alpha \in \left[0, 1\right]} \frac{\partial f\left(x_{i}' + \alpha\left(x_{i} - x_{i}'\right)\right)}{\partial x_{i}} d\alpha
\end{align*}`

.center[
<img src="figures/integrated_gradients_animation.gif" width=750/><br/>
<span style="font-size: 18px;">
Interactive visualization from [13].
</span>
]

---

### Integrated Gradients

In our microbiome example, this highlights the species and timepoints that are most responsible for the disease vs. healthy classification for each sample.

.center[
<img src="figures/microbiome_integrated_gradients.svg"/>
]
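---

### Integrated Gradients in Code

Attribution libraries make this computation routine. Here is a minimal sketch using Captum (see the Software slide later). The stand-in classifier, the all-zeros baseline, and the class index are illustrative assumptions, not the exact settings behind the figure above.

```python
import torch
from captum.attr import IntegratedGradients

# Stand-ins: a toy classifier and a batch of 8 profiles (50 timepoints x 20 species).
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(50 * 20, 2))
x = torch.rand(8, 50, 20)

ig = IntegratedGradients(model)
baseline = torch.zeros_like(x)     # the reference point x' in the integral

attributions, delta = ig.attribute(
    x,
    baselines=baseline,
    target=1,                      # assumed index of the "disease" class
    n_steps=50,                    # resolution of the integral over alpha
    return_convergence_delta=True,
)
# attributions has the same shape as x: one score per species x timepoint
```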
---

### Concept Bottlenecks

Alternatively, we can explain a decision by reducing the arbitrary feature space to a set of human-interpretable concepts [14]. This is part of a larger body of work that attempts to establish shared language/representations for interacting with models [15].

.center[
<span style="font-size: 18px;">
<img src="figures/koh_concept.png" width=750/> <br/>
Figure from [14].
</span>
]

---

### Concept Bottlenecks

Alternatively, we can explain a decision by reducing the arbitrary feature space to a set of human-interpretable concepts [14]. This is part of a larger body of work that attempts to establish shared language/representations for interacting with models [15].

.center[
<span style="font-size: 18px;">
<img src="figures/concept_architecture-2.png" width=600/>
</span>
]

---

### Concept Bottlenecks

In the microbiome example, we could define concepts like blooms or trends. These would have to be manually annotated in the original training data. The test set performance is comparable to before (84%).

.center[
<img src="figures/trends.png" width=1000/>
]

---
class: middle

.center[
## Interactivity
]

---

### What is Interactivity?

Interactivity allows us to specify what computation we want done without having to write code.

.pull-left[
<img src="figures/autocomplete.gif" width=300/><br/>
<span style="font-size: 18px;">
Text entry is a type of interaction.
</span>
]

.pull-right[
<img src="figures/delegate-calc-1.gif"/>
<span style="font-size: 18px;">
An interactive delegate calculator created by the NYT [16].
</span>
]

---

### <span style="color: #D93611;">Focus-plus-Context</span>

We can let readers zoom into patterns of interest without losing relevant context. This has a flavor of "analyzing the residuals."
<br/>
<br/>

.pull-left[
> Overview first, zoom and filter, then details on demand. -- Shneiderman's "Visual Information Seeking Mantra" [17].
]

.pull-right[
.center[<img src="figures/structuration.png" width=250/>]
<span style="font-size: 18px;">
Tukey's iterative structuration, as interpreted by [18].
</span>
]

---

### Example: Tree Navigation

Large trees can be difficult to explore. Focus-plus-context gives a natural way of navigating them, updating the view according to user interactions [19; 20].

.center[
<iframe src="https://krisrs1128.github.io/treelapse/pages/antibiotic.html#htmlwidget-dd8d9e7ec77f2a8cc333" width=900 height=380></iframe>
]

---

### Example: Dimensionality Reduction

We can use focus-plus-context to compare topics across models with different complexity [21; 22]. Small `\(K\)` gives context, large `\(K\)` gives focus.

.center[
<img src="figures/vaginal_microbiome_alto.jpg" width=900/>
]

---

### Linked Views

.pull-left[
1. We can navigate higher dimensions by linking low-dimensional views [23; 24].
1. Notice that we're defining queries graphically, not just through selection menus.

A minimal code sketch of this idea appears just before the closing slide.
]

.pull-right[
<iframe src="https://connect.doit.wisc.edu/content/6df2063a-1c7d-4f01-b98c-3aebed82d190/" allowfullscreen="" data-external="1" height=500 width=600></iframe>
]

<!-- Q&A: What are some interesting questions you'd like to answer on this flight delay data? -->

---

### Example: Model Evaluation

We can use these ideas to evaluate the quality of a single-cell simulator.

<iframe src="http://localhost:5000/" width=950 height=450></iframe>

---

### Software

.pull-left[
**Interpretability**
1. Captum (Python)
1. DALEX (R)
1. imodels (Python)
1. interpretml (Python)

Review Paper<br/>
- [Code Repository](https://go.wisc.edu/3k1ewe)
]

.pull-right[
**Interactivity**
1. Shiny (R/Python)
1. Vega/Vega-lite (JS, Python, R)
1. D3 (JS)
1. p5 (JS)
1. Jupyter Widgets (Python)

Visualization Course<br/>
- Notes [I](https://krisrs1128.github.io/stat679_notes/), [II](https://krisrs1128.github.io/stat436_s24/)<br/>
- Recordings [I](https://www.youtube.com/watch?v=cc__b5R6OzA&list=PLhax_7Mawcfk1GEl_vOg7cE_vtRTsqMWw&pp=gAQBiAQB), [II](https://mediaspace.wisc.edu/channel/STAT+479%3A+Statistical+Data+Visualization/197911113)
]

---
class: middle

.center[
## Augmentation and AI
]

---

### AI and IA - Design Space

Computers are good at accurately reaching well-defined goals, but people are good at criticism and planning. How can we get the best of both worlds?

**Artificial Intelligence (AI)**: Solve problems without human intervention.<br/>
**Intelligence Augmentation (IA)**: Enhance human problem-solving ability.
<br/>
<br/>

.center[
<img src="figures/dm_vs_ai.png" width=800/>
]

---

### Guide-Decide Loop

1. In an interactive setting, AI can make suggestions for the next step in an analysis [25; 26].
1. This depends on there being a shared representation that links the frontend (for human interpretation) with the backend (for model prediction).

.center[
<img src="figures/guide-decide.png" width=700/>
]

---

### Interactive Translation

A well-designed interface helps professional translators achieve better results than simply editing the output of an automated translation system [27; 28; 29].

.center[
<img src="figures/phrasal.png" width=800/>
]

Predictions can be updated in response to interactions. The shared representation here is natural language.

---

### Conclusion

1. Interpretability and interactivity can simplify the process of specifying and validating models.
1. Good research questions arise from carefully looking at your data and models, and these tools can help.
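---

### Linked Views: A Minimal Sketch

To make the linked-views idea concrete, here is a small, hypothetical sketch using Altair 5, the Python interface to Vega-Lite listed on the Software slide. The dataset and encodings are placeholders rather than the flight delay app shown earlier: brushing the scatterplot defines a graphical query that filters the bar chart.

```python
import altair as alt
from vega_datasets import data

cars = data.cars()                # placeholder dataset for illustration
brush = alt.selection_interval()  # drag on the scatterplot to define a query

points = alt.Chart(cars).mark_point().encode(
    x="Horsepower:Q",
    y="Miles_per_Gallon:Q",
    color=alt.condition(brush, "Origin:N", alt.value("lightgray")),
).add_params(brush)

bars = alt.Chart(cars).mark_bar().encode(
    y="Origin:N",
    x="count()",
    color="Origin:N",
).transform_filter(brush)         # the bar chart only counts brushed points

(points & bars).save("linked_views.html")
```

---

### Thank you!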
* Lab Members: Margaret Thairu, Hanying Jiang, Shuchen Yan, Yuliang Peng, Kaiyan Ma, Kai Cui, and Kobe Uko. * Funding: NIGMS R01GM152744. --- class: reference <h3 style="font-size: 22px;">References</h3> [1] R. Caruana, Y. Lou, J. Gehrke, P. Koch, M. Sturm, and N. Elhadad. "Intelligible Models for HealthCare: Predicting Pneumonia Risk and Hospital 30-day Readmission". In: _Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining_ (2015). [2] Z. C. Lipton. "The Mythos of Model Interpretability". En. In: _ACM Queue_ 16.3 (Jun. 2018), pp. 31-57. [3] W. J. Murdoch, C. Singh, K. Kumbier, R. Abbasi-Asl, and B. Yu. "Definitions, methods, and applications in interpretable machine learning". En. In: _Proc. Natl. Acad. Sci. U. S. A._ 116.44 (Oct. 2019), pp. 22071-22080. [4] F. Doshi-Velez and B. Kim. "Towards A rigorous science of interpretable machine learning". In: _arXiv_ (2017). [5] K. Sankaran. "Data Science Principles for Interpretable and Explainable AI". In: _arXiv_ (2024). [6] C. Gosmann, M. N. Anahtar, S. A. Handley, M. Farcasanu, G. Abu-Ali, B. A. Bowman, N. Padavattan, C. Desai, et al. "Lactobacillus-deficient cervicovaginal bacterial communities are associated with increased HIV acquisition in young south African women". En. In: _Immunity_ 46.1 (Jan. 2017), pp. 29-37. [7] A. Coenen, E. Reif, A. Yuan, B. Kim, A. Pearce, F. Viégas, and M. Wattenberg. "Visualizing and measuring the geometry of BERT". In: _Proceedings of the 33rd International Conference on Neural Information Processing Systems_. Red Hook, NY, USA: Curran Associates Inc., 2019. [8] D. Erhan, Y. Bengio, A. Courville, P. Manzagol, P. Vincent, and S. Bengio. "Why Does Unsupervised Pre-training Help Deep Learning?" In: _Journal of Machine Learning Research_ 11.19 (2010), pp. 625-660. URL: [http://jmlr.org/papers/v11/erhan10a.html](http://jmlr.org/papers/v11/erhan10a.html). [9] M. T. Ribeiro, S. Singh, and C. Guestrin. "``Why Should I Trust You?'': Explaining the Predictions of Any Classifier". In: _Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining_. KDD '16 (2016). DOI: [10.1145/2939672.2939778](https://doi.org/10.1145%2F2939672.2939778). URL: [https://doi.org/10.1145/2939672.2939778](https://doi.org/10.1145/2939672.2939778). [10] M. Sundararajan, A. Taly, and Q. Yan. "Axiomatic attribution for deep networks". In: _Proceedings of the 34th International Conference on Machine Learning - Volume 70_. ICML'17. Sydney, NSW, Australia: JMLR.org, 2017, p. 3319–3328. [11] R. R. Selvaraju, M. Cogswell, A. Das, R. Vedantam, D. Parikh, and D. Batra. "Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization". In: _2017 IEEE International Conference on Computer Vision (ICCV)_. 2017, pp. 618-626. DOI: [10.1109/ICCV.2017.74](https://doi.org/10.1109%2FICCV.2017.74). [12] A. Agarwal, A. M. Kenney, Y. S. Tan, T. M. Tang, and B. Yu. "MDI+: A flexible random forest-based feature importance framework". In: _arXiv_ (2023). [13] P. Sturmfels, S. Lundberg, and S. Lee. "Visualizing the Impact of Feature Attribution Baselines". In: _Distill_ 5.1 (Jan. 2020). ISSN: 2476-0757. DOI: [10.23915/distill.00022](https://doi.org/10.23915%2Fdistill.00022). URL: [http://dx.doi.org/10.23915/distill.00022](http://dx.doi.org/10.23915/distill.00022). [14] P. W. Koh, T. Nguyen, Y. S. Tang, S. Mussmann, E. Pierson, B. Kim, and P. Liang. "Concept Bottleneck Models". 
In: _Proceedings of the 33rd International Conference on Neural Information Processing Systems_. Red Hook, NY, USA: Curran Associates Inc., 2020. --- class: reference [15] M. Yuksekgonul, M. Wang, and J. Zou. "Post-hoc Concept Bottleneck Models". In: _The Eleventh International Conference on Learning Representations _. 2023. URL: [https://openreview.net/forum?id=nA5AZ8CEyow](https://openreview.net/forum?id=nA5AZ8CEyow). [16] G. Aisch. _In Defense of Interactive Graphics - vis4.net_. <https://www.vis4.net/blog/in-defense-of-interactive-graphics/>. [Accessed 08-07-2024]. [17] B. Shneiderman. "The eyes have it: a task by data type taxonomy for information visualizations". In: _Proceedings 1996 IEEE Symposium on Visual Languages_. Boulder, CO, USA: IEEE Comput. Soc. Press, 2002. [18] S. Holmes. "Comment on 'A model for studying display methods of statistical graphics'". In: _J. Comput. Graph. Stat._ 2.4 (Dec. 1993), pp. 349-353. [19] J. Heer and S. K. Card. "DOITrees Revisited: Scalable, Space-Constrained Visualization of Hierarchical Data". In: _Advanced Visual Interfaces_ (2004), pp. 421-424. URL: [http://vis.stanford.edu/papers/doitrees-revisited](http://vis.stanford.edu/papers/doitrees-revisited). [20] K. Sankaran and S. Holmes. "Interactive Visualization of Hierarchically Structured Data". En. In: _Journal of Computational and Graphical Statistics_ 27.3 (Jul. 2018), pp. 553-563. ISSN: 1061-8600, 1537-2715. DOI: [10.1080/10618600.2017.1392866](https://doi.org/10.1080%2F10618600.2017.1392866). [21] L. Symul, P. Jeganathan, E. K. Costello, M. France, S. M. Bloom, D. S. Kwon, J. Ravel, D. A. Relman, et al. "Sub-communities of the vaginal microbiota in pregnant and non-pregnant women". En. In: _Proc. Biol. Sci._ 290.2011 (Nov. 2023), p. 20231461. [22] J. Fukuyama, K. Sankaran, and L. Symul. "Multiscale analysis of count data through topic alignment". En. In: _Biostatistics_ 24.4 (Oct. 2023), pp. 1045-1065. [23] R. A. Becker and W. S. Cleveland. "Brushing Scatterplots". In: _Technometrics_ 29.2 (May. 1987), p. 127. [24] A. Buja, D. Cook, and D. F. Swayne. "Interactive high-dimensional data visualization". En. In: _J. Comput. Graph. Stat._ 5.1 (Mar. 1996), pp. 78-99. [25] J. Heer, J. Hellerstein, and S. Kandel. "Predictive Interaction for Data Transformation". In: _Conference on Innovative Data Systems Research (CIDR)_ (2015). URL: [https://idl.uw.edu/papers/predictive-interaction](https://idl.uw.edu/papers/predictive-interaction). [26] J. Heer. "Agency plus automation: Designing artificial intelligence into interactive systems". En. In: _Proc. Natl. Acad. Sci. U. S. A._ 116.6 (Feb. 2019), pp. 1844-1850. [27] S. Green, J. Chuang, J. Heer, and C. D. Manning. "Predictive translation memory: a mixed-initiative system for human language translation". In: _Proceedings of the 27th Annual ACM Symposium on User Interface Software and Technology_. UIST '14. Honolulu, Hawaii, USA: Association for Computing Machinery, 2014, p. 177–187. ISBN: 9781450330695. DOI: [10.1145/2642918.2647408](https://doi.org/10.1145%2F2642918.2647408). [28] S. Green, S. I. Wang, J. Chuang, J. Heer, S. Schuster, and C. D. Manning. "Human Effort and Machine Learnability in Computer Aided Translation". In: _Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)_. Ed. by A. Moschitti, B. Pang and W. Daelemans. Doha, Qatar, Oct. 2014, pp. 1225-1236. DOI: [10.3115/v1/D14-1130](https://doi.org/10.3115%2Fv1%2FD14-1130). [29] S. Green, J. Heer, and C. D. Manning. 
"The efficacy of human post-editing for language translation". In: _Proceedings of the SIGCHI Conference on Human Factors in Computing Systems_. CHI '13. Paris, France: Association for Computing Machinery, 2013, p. 439–448. ISBN: 9781450318990. DOI: [10.1145/2470654.2470718](https://doi.org/10.1145%2F2470654.2470718). URL: [https://doi.org/10.1145/2470654.2470718](https://doi.org/10.1145/2470654.2470718). --- class: reference [30] D. Kundaliya. "Computing - Incisive Media: Google AI chatbot Bard gives wrong answer in its first demo". In: _Computing_ (Sep. 2023). URL: [https://ezproxy.library.wisc.edu/login?url=https://www.proquest.com/trade-journals/computing-incisive-media-google-ai-chatbot-bard/docview/2774438186/se-2](https://ezproxy.library.wisc.edu/login?url=https://www.proquest.com/trade-journals/computing-incisive-media-google-ai-chatbot-bard/docview/2774438186/se-2). [31] M. G. Chevrette, C. S. Thomas, A. Hurley, N. Rosario-Meléndez, K. Sankaran, Y. Tu, A. Hall, S. Magesh, et al. "Microbiome composition modulates secondary metabolism in a multispecies bacterial community". En. In: _Proc. Natl. Acad. Sci. U. S. A._ 119.42 (Oct. 2022), p. e2212930119. --- ### Motivation: The Google Bard Demo .center[ <img src="figures/bard-hallucination.webp" width=830/> See the discussion in [30]. ] --- ### Example: Multiple Testing Linked views can help make sense of a collection of hypothesis tests. Each letter corresponds to an experimental factor (details in [31]). .center[ <iframe src="https://connect.doit.wisc.edu/content/7d109162-8690-4c84-8563-4bdee8f15ca0" width=1400 height=600></iframe> ] --- ### Figure Attribution Simplicity by M. Oki Orlando from <a href="https://thenounproject.com/browse/icons/term/simplicity/" target="_blank" title="Simplicity Icons">Noun Project</a> (CC BY 3.0) Crystal Ball by Kiki Rizky from <a href="https://thenounproject.com/browse/icons/term/crystal-ball/" target="_blank" title="Crystal Ball Icons">Noun Project</a> (CC BY 3.0)