.center[
# Reasoning about Interventions: <br> A Statistical Tour

<img src="figures/hypothetical_curve.svg" width = 600/>

### Wisconsin Institute for Discovery Seminar

.large[Kris Sankaran | [krisrs1128.github.io/LSLab](krisrs1128.github.io/LSLab) | 28 September 2021]
]

---

### Reflection

1. It is one thing to *understand* robustness, resilience, and adaptability in a complex system.
2. It is another thing to *engineer* these properties into one.
3. In general, both problems are complex, but for different reasons.

This talk will share statistical ideas that can help.

---

### Why do we make mathematical models?

1. **Distillation**: A few parameters in a model can reduce a large dataset to its essence. It is like data compression for humans.
2. **Manipulation**: A good model helps us make predictions about the consequences of specific interventions. It can help us plan for the future.

Both activities are central to the process of discovery.

.center[
<img src="figures/motivation_combined.svg" width=850/>
]

---
class: middle

## Vignette 1: Antibiotics and the Microbiome

---

### Context and Data

1. In [1], we reanalyzed data from [2], which collected longitudinal species counts from 3 subjects (52-56 timepoints)
1. We drew inspiration from our collaborator's ecological metaphors for microbiome perturbations (antibiotics `\(\approx\)` a flash flood or wildfire)

.center[
<img src="figures/adaptive-landscape-scenario-1.svg" width=720/>
]

---

### Context and Data

1. In [1], we reanalyzed data from [2], which collected longitudinal species counts from 3 subjects (52-56 timepoints)
1. This study had been motivated by ecological metaphors for microbiome perturbations (antibiotics `\(\approx\)` a flash flood or wildfire)

.center[
<img src="figures/adaptive-landscape-scenario-2.svg" width=700/>
]

---

### Initial Visualization

It was helpful to explore the raw data, so we built a small visualization package, `treelapse`, to help query which species were and were not strongly influenced by the perturbations.
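
---

### Sketch: Querying Perturbation-Sensitive Taxa

The kind of query described on the previous slide can be illustrated with a minimal sketch. It does not use `treelapse` or reproduce its interface. The function name `perturbation_log_ratio`, the pseudocount, and the toy count matrix are all assumptions made for illustration.

```python
import numpy as np

def perturbation_log_ratio(counts, perturbed, pseudocount=1.0):
    """Per-taxon log2 ratio of mean abundance during vs. outside perturbation.

    counts: (n_timepoints, n_taxa) array of nonnegative counts.
    perturbed: boolean array marking timepoints under perturbation.
    pseudocount: added to both means so that zero counts do not break the log
        (an assumption of this sketch, not a recommendation from the talk).
    """
    counts = np.asarray(counts, dtype=float)
    perturbed = np.asarray(perturbed, dtype=bool)
    during = counts[perturbed].mean(axis=0)
    outside = counts[~perturbed].mean(axis=0)
    return np.log2((during + pseudocount) / (outside + pseudocount))

# Toy data invented for illustration: 6 timepoints (rows) by 3 taxa (columns).
counts = np.array([
    [100, 50, 10],
    [110, 55, 12],
    [2, 52, 40],   # perturbed timepoint
    [1, 48, 45],   # perturbed timepoint
    [90, 51, 11],
    [105, 49, 9],
])
perturbed = np.array([False, False, True, True, False, False])

log_ratio = perturbation_log_ratio(counts, perturbed)
most_depleted = int(np.argmin(log_ratio))  # taxon that drops the most
most_enriched = int(np.argmax(log_ratio))  # taxon that rises the most
print(log_ratio, most_depleted, most_enriched)
```

In this toy example the first taxon has the most negative log-ratio, the second barely changes, and the third increases. A real analysis would also need to account for sequencing depth and temporal structure, which this sketch ignores.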
---

### Topic Models

* We were inspired by an analogy between text and microbiome data [3]
* Samples are like documents, and taxa are like words.
* We took models for summarizing evolving themes across text corpora and applied them to the microbiome

.center[
<img src="figures/neuroscience.png" width=800/>
]

---

### Topic Models

* We were inspired by an analogy between text and microbiome data [3]
* Samples are like documents, and taxa are like words.
* We took models for summarizing evolving themes across text corpora and applied them to the microbiome

.center[
<img src="figures/antibiotics_topics.png" width=650/>
]

---

### Topic Models

This helped reduce the full community data to a few “themes”:

* Vulnerable taxa: Those knocked out for the full duration of the perturbation
* Resilient taxa: Those that recover even while perturbations are ongoing
* Robust taxa: Those that seem to thrive when other community members disappear

.center[
<img src="figures/perturbation_topics.png" width=590 />
]

---

### Aside: Resilience and Diversity

How can we quantify the effect of diversity on resilience?

1. More diverse gut microbiota are less perturbed by interventions
1. Conversely, vaginal microbiota with Lactobacillus dominance are less prone to instability or dysbiosis

.pull-left[
<img src="figures/stable_diverse.svg"/>
]

.pull-right[
<img src="figures/stable_lactobacillus.svg"/>
]

---
class: middle

## Vignette 2: Meditation and the Microbiome

---

### Hypothetical Reasoning

1. Data alone are not enough; it is valuable to reason hypothetically
1. We use experiments to predict the consequences of potential actions
1. For this, we should switch from compression to manipulation

.center[
<img src="figures/hypothetical_curve.svg" width=650/>
]

---

### Context and Data

For concreteness, consider the ongoing BeWell study at the Center for Healthy Minds.

.pull-left[
1. Participants are being randomly assigned to either a mindfulness intervention (the Healthy Minds App) or one of several types of control.
2. Complementary data will be generated pre-intervention and four months later
]

.pull-right[
<img src="figures/design.svg" width=500/>
]

---

### Psychometric & Genomic Integration

1. We are gathering psychometric, behavioral, chemical, and microbiome compositional data
2. Changes in one might be associated with effects across all. How can we reason about the possibilities?

<img src="figures/integration.svg"/>

---

### Graphical Modeling

1. Generative graphical models provide a language for clarifying the dependence relationships across measurements
2. Linked variables remain dependent after controlling for the influence of the others
3. See [4] for examples across a variety of biological contexts

.center[
<img src="figures/dags.png" width = 550/>
]

---

### A Grammar for Modeling

.pull-left[
Different graphical structures can be composed with one another to express complex hypothesized structures.

Heterogeneous Effect Model: Is the treatment effect modulated by microbiome profile?
]

.pull-right[
<img src="figures/precision-dag.svg"/>
]

---

### A Grammar for Modeling

.pull-left[
Different graphical structures can be composed with one another to express complex hypothesized structures.

Mediation Model: Does meditation change diet, which changes the microbiome?
]

.pull-right[
<img src="figures/mediation-dag-revised.svg"/>
]

---

### Interventions and Counterfactuals

1. Graphical models allow us to “intervene” on specific variables and evaluate their downstream effects
1. For example, we can compute counterfactual direct vs. indirect effects

.pull-left[
Direct Effect

<img src="figures/mediation-dag-intervene2.svg"/>
]

.pull-right[
Indirect Effect

<img src="figures/mediation-dag-intervene1.svg"/>
]

---

### Semisynthetic Simulation

1. Can we build realistic simulations for gauging estimation quality?
1. We could avoid strong model assumptions by using a bootstrap, but this is hard to apply in more complex designs or for power analysis
1. An alternative is to simulate from DAG-guided quantile regressions

.center[
<img src="figures/mediation-assumptions2.svg" width=550/>
]

---

### Semisynthetic Simulation

1. Can we build realistic simulations for gauging estimation quality?
1. We could avoid strong model assumptions by using a bootstrap, but this is hard to apply in more complex designs or for power analysis
1. An alternative is to simulate from DAG-guided quantile regressions

.center[
<img src="figures/mediation-assumptions.svg" width=550/>
]

---

### Semisynthetic Power Analysis

By modifying the simulation parameters, we can even carry out a power analysis using these semisynthetic data.

.center[
<img src="figures/mediation-assumptions-power.svg" width=650/>
]

---

### Takeaways

1. The most powerful models are those that promote scientific thinking:
    1. Compression: Remove noise to reveal essential structure.
    1. Manipulation: Allow us to reason flexibly about alternatives.
1. Reasoning about the generative mechanism is key for integrative analysis
1. Graphical models can support modeling and evaluation

---

## References

[1] K. Sankaran and S. P. Holmes. "Latent variable modeling for the microbiome". In: _Biostatistics_ 20.4 (2019), pp. 599-614.

[2] L. Dethlefsen and D. A. Relman. "Incomplete recovery and individualized responses of the human distal gut microbiota to repeated antibiotic perturbation". In: _Proceedings of the National Academy of Sciences_ 108.supplement\_1 (2011), pp. 4554-4561.

[3] P. D. Schloss and J. Handelsman. "The last word: books as a statistical metaphor for microbial communities". In: _Annu. Rev. Microbiol._ 61 (2007), pp. 23-34.

[4] K. Sankaran and S. P. Holmes. "Generative Models: An Interdisciplinary Perspective". In: _arXiv preprint arXiv:2208.06011_ (2022).