<div id="links"> Slides: https://go.wisc.edu/99ndjd<br> Lab Site: https://measurement-and-microbes.org </div>

### Microbiome Data Science

The goal of my lab is to help microbiome researchers get the most out of their data. The essential statistical questions are:

* **Integration**: How should we analyze data gathered from multiple batches or technologies?
* **Experimental Design**: How should we design microbiome experiments to accelerate engineering or medical applications?
* **Reproducibility**: How can we be sure our conclusions are trustworthy?

We work directly with microbiologists on problems related to HIV, the gut-brain axis, and synthetic communities.

---
exclude: true

### Problem Solving

To solve these problems, we draw on classic ideas from statistics and computing:

* **Simulation**: It's easier to design experiments and benchmark methods when we can quickly generate realistic data.
* **Visualization**: A good interface can shape the way we think for the better, helping us be more critical and creative.

---

### Themes: Visualization

In [1], we studied how to navigate ensembles of topic models.

<img src="figures/alto_sketches_annotated_alignment.png" width=850/>

<span style="font-size: 20px;">
In the Sankey diagram, each column is a model and each rectangle is a topic.
</span>

---

### Themes: Visualization

The flow patterns in this diagram help determine how many topics are present in a dataset.

.center[
<img src="figures/lda-combined.png" width=1000/>

<span style="font-size: 20px;">
The diagnostics in this simulation suggest that the true `\(K\)` is 5.
</span>
]

---

### Themes: Visualization

In [9], we used the focus-plus-context principle to interactively explore how dimensionality reduction methods like UMAP distort the intrinsic geometry of the data.
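To make "distortion" concrete, here is a minimal sketch of a much simpler diagnostic than the one used in [9]: for each sample, compare distances to its `k` nearest neighbors before and after embedding. The helper names, the default `k`, and the median summary are illustrative assumptions, and this is not the API of the distortions package.

```python
import numpy as np


def pairwise_distances(X: np.ndarray) -> np.ndarray:
    """Euclidean distance matrix between the rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def local_distortion(X: np.ndarray, Y: np.ndarray, k: int = 10) -> np.ndarray:
    """Hypothetical helper: for each row of X, the median ratio of embedding
    distance (in Y) to original distance (in X) over its k nearest neighbors
    in X. Ratios near 1 mean the neighborhood's scale is preserved; ratios
    above (below) 1 mean the embedding stretches (compresses) it.
    Assumes X has no duplicate rows and k < number of rows."""
    D_x = pairwise_distances(X)
    D_y = pairwise_distances(Y)
    np.fill_diagonal(D_x, np.inf)  # a point is not its own neighbor
    neighbors = np.argsort(D_x, axis=1)[:, :k]
    rows = np.arange(X.shape[0])[:, None]
    ratios = D_y[rows, neighbors] / D_x[rows, neighbors]
    return np.median(ratios, axis=1)


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))

# A rotation is an isometry, so every neighborhood keeps its scale.
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
print(np.allclose(local_distortion(X, X @ Q), 1.0))  # True

# Dropping a coordinate can only shrink distances, so no ratio exceeds 1.
print(local_distortion(X, X[:, :2]).max() <= 1.0)  # True
```

In practice `Y` would come from UMAP or a similar method, and coloring the embedding by these per-sample ratios would show where distortion concentrates. The dense distance matrices cost memory quadratic in the number of samples, so this sketch only suits small datasets.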
.center[
<img src="figures/pbmc_isometry.gif" width=550/>
]

---

### Themes: Visualization

In [9], we used the focus-plus-context principle to interactively explore how dimensionality reduction methods like UMAP distort the intrinsic geometry of the data.

.center[
<img src="figures/pbmc_boxplot.gif" width=550/>
]

---

### Themes: Simulation

We wrote a review paper on designing simulators for microbiome data [2] and are actively developing new methods and software on this theme.

<img src="figures/simulation_summary.png" width=900/>

---

### Themes: Simulation

.pull-three-quarters-left[
<img src="figures/multivariate_power_curve.png" width=680/>
]

.pull-three-quarters-right[
<span style="font-size: 18px;">
Here is an example of using a simulator to guide power analysis in a multivariate model. Panels A and B check simulator faithfulness, and panel C compares models across sample sizes.
</span>
]

---

### Themes: Simulation

Simulators that faithfully emulate real data while remaining controllable can guide power analysis, benchmark competing methods, and support sanity checks.

.center[
<img src="figures/zinb_marginals.png" width=800/>
]

---

### Collaborations with Biologists

* Environmental Viral Dynamics [Anantharaman Lab, UW-Madison]: Viruses are the most abundant biological entities on the planet. We are helping analyze temporal data to characterize environmental phage dynamics.
* HIV Risk Factors [Kwon Lab, Ragon Institute]: The microbiome is closely linked to immune function. We are using multi-omics data to better understand HIV risk.
* Mental Health and the Microbiome [Handelsman Lab, UW-Madison]: Specific microbes have been linked to mood disorders. We are part of a multi-institute collaboration to better understand these gut-brain connections.

---

### Reaching Out

* You can learn more at [https://measurement-and-microbes.org](https://go.wisc.edu/pgb8nl).
    - Simulation: [2; 3; 4; 5]
    - Interfaces: [6; 7]
    - Visualization: [1; 4; 8; 9]
* I enjoy working with students from a wide range of educational levels and backgrounds.
* Email: [ksankaran@wisc.edu](mailto:ksankaran@wisc.edu)

---
class: reference

### References

[1] J. Fukuyama, K. Sankaran, and L. Symul. "Multiscale analysis of count data through topic alignment". In: _Biostatistics_ 24.4 (Jun. 2022), p. 1045–1065. ISSN: 1468-4357. DOI: 10.1093/biostatistics/kxac018. <http://dx.doi.org/10.1093/biostatistics/kxac018>.

[2] K. Sankaran, S. Kodikara, J. J. Li, and K. Lê Cao. "Semisynthetic simulation for microbiome data analysis". In: _Briefings in Bioinformatics_ 26.1 (Nov. 2024). ISSN: 1477-4054. DOI: 10.1093/bib/bbaf051. <http://dx.doi.org/10.1093/bib/bbaf051>.

[3] K. Sankaran and S. P. Holmes. "Generative Models: An Interdisciplinary Perspective". In: _Annual Review of Statistics and Its Application_ 10.1 (Mar. 2023), p. 325–352. ISSN: 2326-831X. DOI: 10.1146/annurev-statistics-033121-110134. <http://dx.doi.org/10.1146/annurev-statistics-033121-110134>.

[4] K. Sankaran and S. P. Holmes. "Latent variable modeling for the microbiome". In: _Biostatistics_ 20.4 (Jun. 2018), p. 599–614. ISSN: 1468-4357. DOI: 10.1093/biostatistics/kxy018. <http://dx.doi.org/10.1093/biostatistics/kxy018>.

[5] K. Sankaran, S. Kodikara, J. J. Li, and K. Lê Cao. _Simulation for Microbiome Analysis_. <https://krisrs1128.github.io/microbiome-simulation/index.html>. [Accessed 09-09-2024].

[6] H. Jiang, X. Miao, M. W. Thairu, M. Beebe, D. W. Grupe, R. J. Davidson, J. Handelsman, and K. Sankaran. "Multimedia: multimodal mediation analysis of microbiome data". In: _Microbiology Spectrum_ 13.2 (Feb. 2025). Ed. by J. Claesen. ISSN: 2165-0497. DOI: 10.1128/spectrum.01131-24. <http://dx.doi.org/10.1128/spectrum.01131-24>.

[7] K. Sankaran. "Data Science Principles for Interpretable and Explainable AI".
In: _arXiv_ (2024). DOI: 10.48550/ARXIV.2405.10552. <https://arxiv.org/abs/2405.10552>.

[8] K. Sankaran and S. P. Holmes. "Multitable Methods for Microbiome Data Integration". In: _Frontiers in Genetics_ 10 (Aug. 2019). ISSN: 1664-8021. DOI: 10.3389/fgene.2019.00627. <http://dx.doi.org/10.3389/fgene.2019.00627>.

[9] K. Sankaran, S. Zhang, Chenab, and M. Meila. "Interactive Visualization of Metric Distortion in Nonlinear Data Embeddings using the distortions Package". In: _bioRxiv_ (Aug. 2025). DOI: 10.1101/2025.08.21.671523. <http://dx.doi.org/10.1101/2025.08.21.671523>.