<div id="links">
Slides: https://go.wisc.edu/05h9p7<br>
Lab Site: https://go.wisc.edu/pgb8nl
</div>

### Microbiome Data Science

The goal of my lab is to help microbiome researchers get the most out of their data. The essential statistical questions are:

* **Integration**: How should we analyze data gathered from multiple batches or technologies?
* **Experimental Design**: How should we design microbiome experiments to accelerate engineering or medical applications?
* **Reproducibility**: How can we be sure our conclusions are trustworthy?

We work directly with microbiologists on problems related to HIV, the gut-brain axis, and synthetic communities.

---
exclude: true

### Problem Solving

To solve these problems, we draw on classic ideas from statistics and computing:

* **Simulation**: It's easier to design experiments and benchmark methods when we can quickly generate realistic data.
* **Visualization**: A good interface can shape the way we think for the better, helping us be more critical and creative.

---

### Themes: Visualization

Topic models were independently developed for analyzing genotype and text data [1; 2] and are now widely used in computational genomics [3; 4; 5; 6].

.center[
<img src="figures/text_genotypes.png" width=1200/>
]

---
exclude: true

### Themes: Visualization

1. Topic models are helpful because real biological samples often don't separate cleanly into clusters.
1. Instead of matching a sample to a single cluster centroid, view it as a mixture of multiple representatives.

<span style="font-size: 18px;">
.center[
<img src="figures/clusters_vs_mixtures.png" width=565/><br/>
Figure adapted from [7].
]
</span>

---

### Themes: Visualization

In [8], we studied navigation across ensembles of topic models.

.center[
<img src="figures/alto_sketches_annotated alignment.png" width="850" style="display: block; margin: auto;" />
<span style="font-size: 20px;">
In the Sankey diagram, columns are models and rectangles are topics.
</span>
]

---
exclude: true

### Themes: Visualization

The patterns of flow on this diagram are useful for deciding on the number of topics present in a dataset.

.center[
<img src="figures/lda-combined.png" width="2003" style="display: block; margin: auto;" />
<span style="font-size: 20px;">
The diagnostics in this simulation suggest that the true `\(K\)` is 5.
</span>
]

---

### Themes: Simulation

We have written an online guide [9] for using simulators in microbiome analysis. We used it to teach a short course for computational biologists.

<img src="figures/microbiome_intro_chapter.png"/>

---

### Themes: Simulation

.pull-three-quarters-left[
<img src="figures/multivariate_power_curve.png" width=680/>
]

.pull-three-quarters-right[
<span style="font-size: 18px;">
Here is an example of using a simulator to guide power analysis in a multivariate model. Panels A + B check simulator faithfulness, and C compares models across sample sizes.
</span>
]

---

### Themes: Interface Design

.pull-three-quarters-left[
<img src="figures/multimedia_webpage.png" width=740/>
]

.pull-three-quarters-right[
<span style="font-size: 18px;">
We wrote a package for mediation analysis of microbiome data [10]. See also [the code](https://github.com/krisrs1128/multimedia) and this [blog post](https://krisrs1128.github.io/info-uncertainty//posts/mediation-software).
</span>
]

---

### Themes: Interface Design

The interface makes it easy to share functions (e.g., sensitivity analysis) across a range of model types.

``` r
library(multimedia)

model <- multimedia(
  exper, # experiment data
  lnm_model(), # mediation model
  glmnet_model(lambda = 0.5) # outcome model
) |>
  estimate(exper)
```

---

### Themes: Interface Design

This gives a principled approach to data integration. For example, these data suggest disease `\(\to\)` microbe `\(\to\)` metabolite indirect effects.
.center[<img src="figures/lasso_mediators_plot-1.png" width=740/>]

.center[<span style="font-size: 20px;">
Each panel is a microbe-metabolite pair, and colors separate disease/healthy states.
</span>
]

---

### Reaching Out

* You can learn more at [https://go.wisc.edu/pgb8nl](https://go.wisc.edu/pgb8nl).
  - Simulation: [11; 3; 9]
  - Interfaces: [10; 12]
  - Visualization: [8; 3; 13]
* I enjoy working with students at different educational levels and from different backgrounds.
* Email: [ksankaran@wisc.edu](mailto:ksankaran@wisc.edu)

---
class: reference

### References

[1] J. K. Pritchard, M. Stephens, and P. Donnelly. "Inference of Population Structure Using Multilocus Genotype Data". In: _Genetics_ 155.2 (Jun. 2000), p. 945–959. ISSN: 1943-2631. DOI: 10.1093/genetics/155.2.945. <http://dx.doi.org/10.1093/genetics/155.2.945>.

[2] D. M. Blei, A. Y. Ng, and M. I. Jordan. "Latent Dirichlet Allocation". In: _J. Mach. Learn. Res._ 3 (Mar. 2003), p. 993–1022. ISSN: 1532-4435.

[3] K. Sankaran and S. P. Holmes. "Latent variable modeling for the microbiome". In: _Biostatistics_ 20.4 (Jun. 2018), p. 599–614. ISSN: 1468-4357. DOI: 10.1093/biostatistics/kxy018. <http://dx.doi.org/10.1093/biostatistics/kxy018>.

[4] A. Kim, S. Sevanto, E. R. Moore, and N. Lubbers. "Latent Dirichlet Allocation modeling of environmental microbiomes". In: _PLOS Computational Biology_ 19.6 (Jun. 2023). Ed. by G. Zeller, p. e1011075. ISSN: 1553-7358. DOI: 10.1371/journal.pcbi.1011075. <http://dx.doi.org/10.1371/journal.pcbi.1011075>.

[5] C. Tataru, M. Peras, E. Rutherford, K. Dunlap, X. Yin, B. S. Chrisman, T. Z. DeSantis, D. P. Wall, S. Iwai, and M. M. David. "Topic modeling for multi-omic integration in the human gut microbiome and implications for Autism". In: _Scientific Reports_ 13.1 (Jul. 2023). ISSN: 2045-2322. DOI: 10.1038/s41598-023-38228-0. <http://dx.doi.org/10.1038/s41598-023-38228-0>.

[6] X. Peng, J. Lee, M. Adamow, C. Maher, M. A. Postow, M. K. Callahan, K. S. Panageas, and R. Shen.
"A topic modeling approach reveals the dynamic T cell composition of peripheral blood during cancer immunotherapy". In: _Cell Reports Methods_ 3.8 (Aug. 2023), p. 100546. ISSN: 2667-2375. DOI: 10.1016/j.crmeth.2023.100546. <http://dx.doi.org/10.1016/j.crmeth.2023.100546>. [7] L. Symul, P. Jeganathan, E. K. Costello, M. France, S. M. Bloom, D. S. Kwon, J. Ravel, D. A. Relman, and S. Holmes. "Sub-communities of the vaginal microbiota in pregnant and non-pregnant women". In: _Proceedings of the Royal Society B: Biological Sciences_ 290.2011 (Nov. 2023). ISSN: 1471-2954. DOI: 10.1098/rspb.2023.1461. <http://dx.doi.org/10.1098/rspb.2023.1461>. [8] J. Fukuyama, K. Sankaran, and L. Symul. "Multiscale analysis of count data through topic alignment". In: _Biostatistics_ 24.4 (Jun. 2022), p. 1045–1065. ISSN: 1468-4357. DOI: 10.1093/biostatistics/kxac018. <http://dx.doi.org/10.1093/biostatistics/kxac018>. [9] K. Sankaran, S. Kodikara, J. J. Li, and K. Lê Cao. _Chapter 1 Introduction | Simulation for Microbiome Analysis - krisrs1128.github.io_. <https://krisrs1128.github.io/microbiome-simulation/index.html>. [Accessed 09-09-2024]. [10] H. Jiang, X. Miao, M. W. Thairu, M. Beebe, D. W. Grupe, R. J. Davidson, J. Handelsman, and K. Sankaran. "multimedia: Multimodal Mediation Analysis of Microbiome Data". In: _bioRxiv_ (Mar. 2024). DOI: 10.1101/2024.03.27.587024. <http://dx.doi.org/10.1101/2024.03.27.587024>. [11] K. Sankaran and S. P. Holmes. "Generative Models: An Interdisciplinary Perspective". In: _Annual Review of Statistics and Its Application_ 10.1 (Mar. 2023), p. 325–352. ISSN: 2326-831X. DOI: 10.1146/annurev-statistics-033121-110134. <http://dx.doi.org/10.1146/annurev-statistics-033121-110134>. --- class: reference ### References [12] K. Sankaran. "Data Science Principles for Interpretable and Explainable AI". In: _arXiv_ copyright = Creative Commons Attribution Non Commercial No Derivatives 4.0 International (2024). DOI: 10.48550/ARXIV.2405.10552. 
<https://arxiv.org/abs/2405.10552>.

[13] K. Sankaran and S. P. Holmes. "Multitable Methods for Microbiome Data Integration". In: _Frontiers in Genetics_ 10 (Aug. 2019). ISSN: 1664-8021. DOI: 10.3389/fgene.2019.00627. <http://dx.doi.org/10.3389/fgene.2019.00627>.