class: title # Modular Software for Mediation Analysis of Microbiome Data <style> .slide-background { background: url("figures/cover.png") no-repeat center center; background-size: cover; opacity: 0.5; } </style> <div id="subtitle_left"> Slides: <a href="https://go.wisc.edu/5ms623">go.wisc.edu/5ms623</a><br/> Paper: <a href="https://go.wisc.edu/ebm917">go.wisc.edu/ebm917</a><br/> Lab: <a href="https://measurement-and-microbes.org">measurement-and-microbes.org</a> <br/> </div> <div id="subtitle_right"> Kris Sankaran <br/> <a href="https://www.birs.ca/events/2025/5-day-workshops/25w5324/">Novel Approaches for Multi-Omics</a><br/> 17 | July | 2025 <br/> </div> <!-- 25 minute talk (including Q&A) --> `\(\def\Gsn{\mathcal{N}}\)` `\(\def\Dir{\text{Dir}}\)` `\(\def\Mult{\text{Mult}}\)` `\(\def\diag{\text{diag}}\)` `\(\def\*#1{\mathbf{#1}}\)` `\(\def\Scal{\mathcal{S}}\)` `\(\def\exp#1{\text{exp}\left(#1\right)}\)` `\(\def\logit#1{\text{logit}\left(#1\right)}\)` `\(\def\absarg#1{\left|#1\right|}\)` `\(\def\E{\mathbb{E}} % Expectation symbol\)` `\(\def\Earg#1{\E\left[{#1}\right]}\)` `\(\def\P{\mathbb{P}} % Expectation symbol\)` `\(\def\Parg#1{\P\left[{#1}\right]}\)` `\(\def\m#1{\boldsymbol{#1}}\)` `\(\def\Unif{\text{Unif}}\)` `\(\def\win{\tilde{w}_{\text{in}}}\)` `\(\def\reals{\mathbb{R}}\)` `\(\newcommand{\wout}{\tilde w_{\text{out}}}\)` --- ### Microbiome as Mediator We're just beginning to understand how the microbiome mediates the relationship between environmental exposures and human health. .center[ <div class="caption"> <img src="figures/chemotherapy.png"/> Figure adapted from [1]. </div> ] Chemotherapy-induced microbiome changes can worsen adverse events, like diarrhea and mucositis, limiting treatment dosage and duration [2; 3]. --- ### Alternative Mediators More generally, for microbiome research that seeks to explain mechanisms, it is often necessary to analyze indirect effects. 
.center[ <div class="caption"> <img src="figures/akkermansia_cholesterol.jpg" width=540/><br/> Figure adapted from [4]. </div> ] For example, _Akkermansia muciniphila_ can help regulate cholesterol levels by producing proteins that activate important pathways for cholesterol absorption and metabolism [5]. --- ### Causal Inference Setup Causal mediation analysis is often a good match for carrying out these studies [6; 7]. Traditional methods from social science and epidemiology need to be adapted to reflect properties of microbiome data [8; 9; 10]. .center[ <img src="figures/mediation-dag.png" width=500/> ] --- ### Multiple Mediators In most multi-omics applications, the mediators are high-dimensional. Each might represent a gene, methylation site, or microbial taxon, for example. .center[ <img src="figures/multivariate-mediation-unlabeled.png" width=600/> ] --- ### Two-Step Linear Regression During estimation, we might add a feature selection step/penalty to ensure that most mediators do not play a role for the outcome. `\begin{align*} M&=\alpha T+\zeta_M^{\top} X+E\\ Y&=\eta T+\beta^{\top} M+\zeta_Y^{\top} X+\epsilon \end{align*}` * `\(\eta \in \mathbb{R}\)`: Direct effect. * `\(\alpha^\top \beta\)`: Overall indirect effect. * `\(\alpha_{k}\beta_{k}\)`: The indirect effect through path `\(k\)`. What if we want to go beyond a linear model? --- ### Counterfactual Notation Under the counterfactual causal mediation analysis framework [11; 12], we imagine counterfactuals for both mediators and outcomes, depending on treatments that we may never have seen. _Mediator under treatment `\(t\)`_: `\begin{align*} M\left(t\right) \end{align*}` _Outcome under treatment combination_ `\(\left(t, t'\right)\)`: `\begin{align*} Y\left(t, M\left(t'\right)\right) \end{align*}` Note that this is not observable whenever `\(t \neq t'\)`, but we can reason about it abstractly. --- ### Direct vs. Indirect Effects Suppose that the treatments belong to two groups, `\(T \in \{0, 1\}\)`. 
Then, direct and indirect effects are defined as:

`\begin{align*}
\frac{1}{2} \sum_{t'=0}^{1} \mathbb{E}\left[Y(1, M(t')) - Y(0, M(t'))\right] \text{ (Direct Effect)}\\
\frac{1}{2} \sum_{t=0}^{1} \mathbb{E}\left[Y(t, M(1)) - Y(t, M(0))\right] \text{ (Indirect Effect)}
\end{align*}`

We may also want to know the pathwise indirect effect through mediator `\(k\)`:

`\begin{align*}
\frac{1}{2} \sum_{t'=0}^{1} \mathbb{E}\left[Y(t', M_k(1), M_{-k}(t')) - Y(t', M_k(0), M_{-k}(t'))\right]
\end{align*}`

---

### Geometric Interpretation

<img src="figures/geometric_1.png" width=800/>

---

### Geometric Interpretation

<img src="figures/geometric_2.png" width=800/>

---

### Geometric Interpretation

<img src="figures/geometric_3.png" width=800/>

---

### Identification

We need to generalize the ignorability assumption to identify these three effects. The complete set of required conditions is given in the backup slides.

.center[
<img src="figures/identification.png"/>
]

---

### Integration Challenge

Our interest in this project came from a re-analysis of a gut-brain axis microbiome study. Participants were randomized into either a mindfulness intervention or a control group. In this case, the taxonomic community composition is an outcome, and the survey measurements are mediators.

.center[
<img src="figures/mediation-dag-mindfulness.png" width=550/>
]

---

### Integration Challenge

We can also use mediation analysis to guide microbiome-metabolome integration. This quote comes from a study of inflammatory bowel disease (IBD) [13]:

<div style="font-size: 20px">

> The CD-associated compounds eicosatrienoic (ETA) and docosapentaenoic (DPA) acid were involved in negative associations with control-associated species and positive associations with IBD-associated species. ETA and DPA are polyunsaturated long-chain fatty acids (PUFAs) [which] possess bactericidal activity by virtue of their hydrophobic nature and potential to disrupt bacterial cell membranes.
</div>

As in the previous example, we can imagine a mediation analysis with a high-dimensional microbiome outcome.

---

### Multiple Mediators and Outcomes

These applications require multivariate outcomes. The effect definitions directly generalize to individual outcomes `\(Y_{j}\left(t\right)\)`.

.center[
<img src="figures/multivariate-mediation-outcomes.png" width=600/>
]

---

.center[
## `multimedia` Package Design
]

---

### Univariate Mediation Analysis

In the case of univariate mediators and outcomes, we have simple expressions for the two-step regression:

`\begin{align*}
m_{i} = \alpha t_i + \epsilon_{i}^{m}\\
y_i = \eta t_{i} + \beta m_i + \epsilon_i^y
\end{align*}`

In this case, the direct and indirect effects reduce to:

- Direct effect: `\(\eta\)`
- Indirect effect: `\(\alpha \beta\)`

---

### Code Interface

We could imagine writing a function to implement this analysis.

.large-code[

``` r
model <- mediation(y, m, t)
effects(model)
```
]

But how can we adopt the more general counterfactual perspective, without restricting ourselves to linear models?

---

### Code Interface

The `mediation` package [11; 12] solves this problem by instead creating an interface like:

.large-code[

``` r
f1 <- lm(y ~ t + m)
f2 <- lm(m ~ t)
model <- mediation(f1, f2)
effects(model)
```
]

We can now accommodate any mediation or outcome model, not just linear ones! Indeed, it's easy to use `glm` in the `mediation` package.

---

### Multimedia Interface

`multimedia` generalizes this interface to the multivariate mediators and outcomes typical of microbiome data. For example, for the mindfulness study, we use a logistic-normal multinomial model [14; 15] for the outcomes and `glmnet`-penalized regression [16] for the mediators.

.large-code[

``` r
model <- multimedia(
  exper, # dataset
  lnm_model(), # outcome model
  glmnet_model(lambda = 0.5, alpha = 0) # mediation model (alpha = 0: ridge penalty)
)
```
]

The mediation and outcome models are fit separately, but this still accommodates several commonly used approaches.
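---

### Sketch: Effects from Fitted Models

As a minimal sketch of the idea behind these interfaces, the block below uses only base R on simulated data. It is not the `mediation` or `multimedia` API: the data-generating values and the helper `potential_outcome()` are assumptions made for illustration. It plugs two fitted models into the counterfactual definitions and, in the linear case, should roughly recover `\(\eta\)` and `\(\alpha\beta\)`.

```r
set.seed(2025)

# Simulated data; the true values alpha = 0.8, eta = 0.5, beta = 1.2 are assumed
n <- 2000
t <- rbinom(n, 1, 0.5)
m <- 0.8 * t + rnorm(n)
y <- 0.5 * t + 1.2 * m + rnorm(n)

f_m <- lm(m ~ t)      # mediation model
f_y <- lm(y ~ t + m)  # outcome model

# Product-of-coefficients estimates from the two-step regression
direct_lm <- unname(coef(f_y)["t"])
indirect_lm <- unname(coef(f_m)["t"] * coef(f_y)["m"])

# Hypothetical helper: Monte Carlo estimate of E[Y(t_out, M(t_med))]
potential_outcome <- function(t_out, t_med, n_sim = 1e5) {
  # Draw mediators M(t_med) from the fitted mediation model
  m_sim <- predict(f_m, data.frame(t = rep(t_med, n_sim))) +
    rnorm(n_sim, sd = sigma(f_m))
  # Average the predicted outcome under treatment t_out
  mean(predict(f_y, data.frame(t = t_out, m = m_sim)))
}

# Counterfactual definitions, averaged over the held-fixed treatment
direct_cf <- mean(sapply(0:1, function(s) {
  potential_outcome(1, s) - potential_outcome(0, s)
}))
indirect_cf <- mean(sapply(0:1, function(s) {
  potential_outcome(s, 1) - potential_outcome(s, 0)
}))

round(c(direct_lm = direct_lm, direct_cf = direct_cf,
        indirect_lm = indirect_lm, indirect_cf = indirect_cf), 2)
```

Only the two `predict()` calls depend on the model class, so in this sketch a `glm` outcome model (with `type = "response"`) could be swapped in without touching the effect computation.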
---

### Available Models

The LNM model is designed to have multivariate responses. The remaining options apply the same type of model in parallel across all outcomes.

1. `lnm_model`: A logistic normal multinomial model [17].
1. `glmnet_model`: A call to `glmnet` for lasso or elastic net regression [18].
1. `rf_model`: A call to `ranger` for random forest regression [19].
1. `brms_model`: A call to `brms`, which supports custom likelihoods such as `lognormal()`, `hurdle_negbinomial()`, and `cox()` [20].

---

### Data Formats

Besides modeling, `multimedia` has data structures that make it easier to manipulate real and counterfactual data. For example, we can use `tidy` selection syntax [21] to categorize variables.

``` r
head(ibd_data)
```

```
## # A tibble: 6 × 341
## m0031_phenyllactate m0045_azelate m0171_palmitic_acid m0219_2_hydroxyhexadecanoate m0244_alpha_linolenic_a…¹ m0246_linoleic_acid m0250_oleic_acid m0256_stearic_acid
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 9.44 7.18 8.26 10.0 7.02 8.86 9.28 8.13
## 2 8.92 6.47 8.19 9.57 6.29 10.4 11.1 9.10
## 3 6.36 8.16 7.39 10.4 7.49 8.88 8.65 7.60
## 4 8.24 5.46 9.44 9.83 5.78 8.53 8.68 8.83
## 5 9.73 6.50 7.69 9.03 5.73 10.4 10.8 6.90
## 6 0 4.44 7.47 5.65 5.50 9.03 9.73 7.44
## # ℹ abbreviated name: ¹m0244_alpha_linolenic_acid
## # ℹ 333 more variables: m0354_arachidonic_acid <dbl>, m0393_1213_diHOME <dbl>, m0472_docosahexaenoic_acid <dbl>, m0731_lithocholic_acid <dbl>,
## # m0830_hyodeoxycholateursodeoxycholate <dbl>, m1129_chenodeoxycholate <dbl>, m1360_chenodeoxycholatedeoxycholate <dbl>, m0833_deoxycholic_acid <dbl>,
## # m0900_ketodeoxycholate <dbl>, m0930_cholate <dbl>, m1258_cholate <dbl>, m1303_lithocholate <dbl>, m1403_cholate <dbl>, m0253_sphingosine <dbl>,
## # m0906_threosphingosine <dbl>, m0998_threosphingosine <dbl>, m0927_piperine <dbl>, m0264_linoleoyl_ethanolamide <dbl>, m0339_linoleoyl_ethanolamide <dbl>,
## # m0410_C180e_MAG <dbl>, m0416_cholesterol <dbl>, m0467_C180_MAG <dbl>,
m0478_cholestenone <dbl>, m0554_cholestenone <dbl>, m0878_C160_LPC <dbl>, ## # m0915_C200_LPE <dbl>, m0949_C180_LPC <dbl>, m1271_C342_DAG <dbl>, m1291_C342_DAG <dbl>, m1277_C341_DAG <dbl>, m1351_C364_DAG <dbl>, m1367_C364_DAG <dbl>, … ``` These are the data from the motivating IBD study as a raw `data.frame`. --- ### Data Formats Besides modeling, `multimedia` has data structures that make it easier to manipulate real and counterfactual data. For example, we can use `tidy` selection syntax [21] to categorize variables. ``` r library(multimedia) mediation_data( ibd_data, matches("^m[0-9]{4}"), # outcomes "Study.Group", # treatments starts_with("g") # mediators ) ``` ``` ## [Mediation Data] ## 220 samples with measurements for, ## 1 treatment: Study.Group ## 173 mediators: gSutterella, gMarvinbryantia, ... ## 155 outcomes: m0031_phenyllactate, m0045_azelate, ... ``` --- ### Bootstrap We can gauge uncertainty in the estimated effects by re-estimating both mediator and outcome models on bootstrap resampled versions of the original data. Here are the metabolites with the largest indirect and direct effects in the IBD example. .center[ <img src="figures/lasso_bootstrap_vis-1.svg" width=900/> ] --- ### Model Alteration The package provides syntax for altering models after they've been estimated. We can set specific coefficients to zero or re-estimate with new model specifications. .center[ <img src="figures/visualize_long-1.png"/> ] --- ``` r fit <- multimedia(exper, outcome = rf_model(num.trees = 1e3)) |> estimate(exper) altered_m <- nullify(fit, "T->M") |> estimate(exper) altered_ty <- nullify(fit, "T->Y") |> estimate(exper) ``` .center[ <img src="figures/visualize_long-1.png"/> ] --- ### Synthetic Null Mediators Here is the same principle applied to the mindfulness study. The synthetic null survey responses have been generated from mediation models with `\(\alpha = 0\)`. 
.center[
<img src="figures/mindfulness-altered.png" width=780/><br/>
<span style="font-size: 24px;">
The middle panel comes from a synthetic null: `\(T \nrightarrow M \to Y\)`.
</span>
]

---

.pull-three-quarters-left[
<img src="figures/alteration_plot-1.png" width=720/>
]

.pull-three-quarters-right[
These are analogous comparisons for the simulated microbiomes. Sampling from the fitted mediation models helps with model checking.
]

---

### Sensitivity Analysis

Sensitivity analysis can show which conclusions might change if any of the identification assumptions (listed in the appendix) are violated. For example, once we have fit mediation and outcome models, we can simulate according to:

`\begin{align*}
Y^*(t, m)=\hat{Y}(t, m)+\epsilon^y \\
M^*(t)=\hat{M}(t)+\epsilon^m .
\end{align*}`

A caveat is that this only makes sense for continuous outcomes and mediators.

---
exclude: true

### Sensitivity Analysis

We draw `\(\left(\epsilon^y, \epsilon^m\right)\)` from a Gaussian with mean zero and covariance,

`\begin{align*}
\Sigma(\rho, G):=\left(\begin{array}{cc}
\operatorname{diag}\left(\hat{\sigma}_M^2\right) & \rho \hat{\sigma}_M \hat{\sigma}_Y^{\top} \odot \mathbf{1}_G \\
\rho \hat{\sigma}_Y \hat{\sigma}_M^{\top} \odot \mathbf{1}_G^{\top} & \operatorname{diag}\left(\hat{\sigma}_Y^2\right)
\end{array}\right)
\end{align*}`

where `\(\mathbf{1}_{G} \in \{0, 1\}^{K \times J}\)` is an indicator over mediator-outcome pairs `\(G\)` over which to test sensitivity.

---

.center[
## Example
]

---

### Study Background

1. The study [13] carried out an integrative analysis to discover metabolite and taxonomic markers with diagnostic or therapeutic potential for IBD.
1. They gathered untargeted metabolomics and whole-genome sequencing microbiome data on a cohort of 220 patients with the disease.
1. 
This resulted in 8.8K and 11.7K metabolite and genus features, respectively, all available on the [Curated Metabolome-Microbiome repository](https://github.com/borenstein-lab/microbiome-metabolome-curated-data/tree/main) [22].

---

### Model Setup

1. We applied centered log-ratio and `\(\log\left(1 + x\right)\)` transformations to the microbiome and metabolome data, respectively. Both were then filtered to the 150 to 200 most abundant features.
1. We treat the microbiome as the mediators and the metabolome as the outcome, following the quote above.
1. Taxonomic composition depends only on treatment status. Each metabolite's abundance is treated as a sparse linear function of a few microbes.

.large-code[

``` r
model <- multimedia(exper, glmnet_model(lambda = 0.1))
```
]

---

### Direct Effects

.pull-left[
These are the metabolites with the largest direct effects from our bootstrap analysis. Their variation in abundance can't be attributed to microbiome community changes.
]

.pull-right[
<img src="figures/ibd_direct_effects.svg" width=500/>
]

---

### Indirect Effects

This multidimensional scaling (MDS) is based on only microbiome data. Point size reflects metabolite abundances. Associations between metabolites and the MDS appear only for outcomes with large indirect effects.

.center[
<img src="figures/indirect_effects_metabolites.png" width=820/>
]

---

### Pathwise Indirect Effects

The top pathwise indirect effects are similar to the geometric interpretation we saw in the introduction!

.center[
<img src="figures/indirect_effects_pathwise-metabolites.png"/>
]

---

### Sensitivity Analysis

In this sensitivity analysis, we simulate unmeasured confounding between the abundances of the _Enterocloster_ genus (mediator) and the metabolites (outcomes) hydrocinnamic acid, lithocholate, and arginine.

.center[
<img src="figures/indirect_effects_sensitivity.png"/>
]

---
exclude: true

### Takeaways

1. 
Mediation analysis is a useful framework for relating environment, host, and microbiome features. More generally, counterfactual language can guide multi-omics data integration. 1. Modular software design can ensure that a few core statistical components can be applied to a wide range of study applications. --- ### Thank you! Paper: [go.wisc.edu/ebm917](https://go.wisc.edu/ebm917) Package: [go.wisc.edu/830110](https://go.wisc.edu/830110) (also on CRAN) * Contact: ksankaran@wisc.edu * Lab Members: Margaret Thairu, Shuchen Yan, Yuliang Peng, Langtian Ma, Helena Huang * Funding: NIGMS R01GM152744, NIAID R01AI184095, Gates 072185 --- class: reference ### References [1] R. Francescone et al. "Microbiome, Inflammation, and Cancer". In: _The Cancer Journal_ 20.3 (May. 2014), p. 181–189. ISSN: 1528-9117. DOI: [10.1097/ppo.0000000000000048](https://doi.org/10.1097%2Fppo.0000000000000048). URL: [http://dx.doi.org/10.1097/PPO.0000000000000048](http://dx.doi.org/10.1097/PPO.0000000000000048). [2] J. C. Arthur et al. "Intestinal Inflammation Targets Cancer-Inducing Activity of the Microbiota". In: _Science_ 338.6103 (Oct. 2012), p. 120–123. ISSN: 1095-9203. DOI: [10.1126/science.1224820](https://doi.org/10.1126%2Fscience.1224820). URL: [http://dx.doi.org/10.1126/science.1224820](http://dx.doi.org/10.1126/science.1224820). [3] Y. L. Lightfoot et al. "Tailoring gut immune responses with lipoteichoic acid-deficient Lactobacillus acidophilus". In: _Frontiers in Immunology_ 4 (2013). ISSN: 1664-3224. DOI: [10.3389/fimmu.2013.00025](https://doi.org/10.3389%2Ffimmu.2013.00025). URL: [http://dx.doi.org/10.3389/fimmu.2013.00025](http://dx.doi.org/10.3389/fimmu.2013.00025). [4] B. Jia et al. "Gut microbiome-mediated mechanisms for reducing cholesterol levels: implications for ameliorating cardiovascular disease". En. In: _Trends Microbiol._ 31.1 (Jan. 2023), pp. 76-91. [5] H. Plovier et al. 
"A purified membrane protein from Akkermansia muciniphila or the pasteurized bacterium improves metabolism in obese and diabetic mice". In: _Nature Medicine_ 23.1 (Nov. 2016), p. 107–113. ISSN: 1546-170X. DOI: [10.1038/nm.4236](https://doi.org/10.1038%2Fnm.4236). URL: [http://dx.doi.org/10.1038/nm.4236](http://dx.doi.org/10.1038/nm.4236). [6] V. Celli. "Causal mediation analysis in economics: Objectives, assumptions, models". In: _Journal of Economic Surveys_ 36.1 (Jul. 2021), p. 214–234. ISSN: 1467-6419. DOI: [10.1111/joes.12452](https://doi.org/10.1111%2Fjoes.12452). URL: [http://dx.doi.org/10.1111/joes.12452](http://dx.doi.org/10.1111/joes.12452). [7] L. Richiardi et al. "Mediation analysis in epidemiology: methods, interpretation and bias". In: _International journal of epidemiology_ 42.5 (2013), pp. 1511-1519. [8] M. B. Sohn et al. "Compositional mediation analysis for microbiome studies". In: _The Annals of Applied Statistics_ 13.1 (2019), pp. 661-681. [9] C. Wang et al. "Estimating and testing the microbial causal mediation effect with high-dimensional and compositional microbiome data". In: _Bioinformatics_ 36.2 (Jul. 2019). Ed. by I. Birol, p. 347–355. ISSN: 1367-4811. DOI: [10.1093/bioinformatics/btz565](https://doi.org/10.1093%2Fbioinformatics%2Fbtz565). URL: [http://dx.doi.org/10.1093/bioinformatics/btz565](http://dx.doi.org/10.1093/bioinformatics/btz565). [10] K. M. Carter et al. "An Information-Based Approach for Mediation Analysis on High-Dimensional Metagenomic Data". In: _Frontiers in Genetics_ 11 (Mar. 2020). ISSN: 1664-8021. DOI: [10.3389/fgene.2020.00148](https://doi.org/10.3389%2Ffgene.2020.00148). URL: [http://dx.doi.org/10.3389/fgene.2020.00148](http://dx.doi.org/10.3389/fgene.2020.00148). [11] K. Imai et al. "A general approach to causal mediation analysis." In: _Psychological methods_ 15.4 (2010), p. 309. [12] K. Imai et al. "Causal mediation analysis using R". In: _Advances in social science research using R_. Springer, 2010, pp. 129-154. 
--- class: reference ### References [13] E. A. Franzosa et al. "Gut microbiome structure and metabolic activity in inflammatory bowel disease". In: _Nature Microbiology_ 4.2 (Dec. 2018), p. 293–305. ISSN: 2058-5276. DOI: [10.1038/s41564-018-0306-4](https://doi.org/10.1038%2Fs41564-018-0306-4). URL: [http://dx.doi.org/10.1038/s41564-018-0306-4](http://dx.doi.org/10.1038/s41564-018-0306-4). [14] J. Atchison et al. "Logistic-normal distributions: Some properties and uses". In: _Biometrika_ 67.2 (1980), pp. 261-272. [15] F. Xia et al. "A logistic normal multinomial regression model for microbiome compositional data analysis". In: _Biometrics_ 69.4 (2013), pp. 1053-1063. [16] R. Tibshirani. "Regression shrinkage and selection via the lasso". In: _Journal of the Royal Statistical Society Series B: Statistical Methodology_ 58.1 (1996), pp. 267-288. [17] K. Sankaran. _miniLNM: Miniature Logistic-Normal Multinomial Models_. Sep. 2024. DOI: [10.32614/cran.package.minilnm](https://doi.org/10.32614%2Fcran.package.minilnm). URL: [http://dx.doi.org/10.32614/CRAN.package.miniLNM](http://dx.doi.org/10.32614/CRAN.package.miniLNM). [18] J. Friedman et al. "Regularization Paths for Generalized Linear Models via Coordinate Descent". In: _Journal of Statistical Software_ 33.1 (2010), pp. 1-22. DOI: [10.18637/jss.v033.i01](https://doi.org/10.18637%2Fjss.v033.i01). [19] M. N. Wright et al. "ranger: A Fast Implementation of Random Forests for High Dimensional Data in C++ and R". In: _Journal of Statistical Software_ 77.1 (2017), pp. 1-17. DOI: [10.18637/jss.v077.i01](https://doi.org/10.18637%2Fjss.v077.i01). [20] P. Bürkner. "brms: An R Package for Bayesian Multilevel Models Using Stan". In: _Journal of Statistical Software_ 80.1 (2017), pp. 1-28. DOI: [10.18637/jss.v080.i01](https://doi.org/10.18637%2Fjss.v080.i01). [21] H. Wickham et al. "Welcome to the Tidyverse". In: _Journal of open source software_ 4.43 (2019), p. 1686. [22] E. Muller et al. 
"The gut microbiome-metabolome dataset collection: a curated resource for integrative meta-analysis". In: _npj Biofilms and Microbiomes_ 8.1 (Oct. 2022). ISSN: 2055-5008. DOI: [10.1038/s41522-022-00345-5](https://doi.org/10.1038%2Fs41522-022-00345-5). URL: [http://dx.doi.org/10.1038/s41522-022-00345-5](http://dx.doi.org/10.1038/s41522-022-00345-5). [23] K. Imai et al. "Identification and Sensitivity Analysis for Multiple Causal Mechanisms: Revisiting Evidence from Framing Experiments". In: _Political Analysis_ 21.2 (2013), p. 141–171. ISSN: 1476-4989. DOI: [10.1093/pan/mps040](https://doi.org/10.1093%2Fpan%2Fmps040). URL: [http://dx.doi.org/10.1093/pan/mps040](http://dx.doi.org/10.1093/pan/mps040). [24] A. Jérolon et al. "Causal mediation analysis in presence of multiple mediators uncausally related". In: _The International Journal of Biostatistics_ 17.2 (2021), pp. 191-221. --- ### Example Model: Logistic-Normal Multinomial The logistic-normal multinomial (LNM) model has the form: .pull-left[ `\begin{align*} y_{i} \sim \Mult\left(N_{i}, \varphi^{-1}\left(z_{i}^{T}\beta\right)\right) \\ \beta \sim \Gsn\left(0, \diag\left(\sigma_{k}^{2}\right)\right) \end{align*}` where `\(\varphi^{-1}\left(z\right) \propto\left(\exp{z_{1}}, \dots, \exp{z_{K-1}}, 1\right)\)` ] .pull-right[ <img src="figures/lnm.svg" style="display: block; margin: auto;" /> ] --- ### Identification of Overall Direct/Indirect Effects These sequential ignorability assumptions are sufficient for identification of overall direct and indirect effects [11]. For any `\(t, t', x\)`, we require, 1. Treatment ignorability: `\(\left\{Y\left(t^{\prime}, m\right), M(t)\right\} \perp T \mid X=x\)` 1. Mediator ignorability: `\(Y\left(t^{\prime}, m\right) \perp M(t) \mid T=t, X=x\)` 1. Positivity: `\(\mathbb{P}(T=t \mid X=x)>0\)` 1. 
Positivity for mediator: `\(p_{M(t)}(m \mid T=t, X=x)>0\)`

---

### Identification of Pathwise Indirect Effects

Pathwise indirect effects require a generalization of sequential ignorability assumptions [23; 24]. For any `\(t, t', t'', m, x, w\)` we require,

1. Treatment ignorability: `\(\left\{Y(t, m, w), M_k\left(t^{\prime}\right), M_{-k}\left(t^{\prime \prime}\right)\right\} \perp T \mid X=x\)`
1. Mediator `\(k\)` ignorability: `\(Y\left(t^{\prime}, m, M_{-k}\left(t^{\prime}\right)\right) \perp M_k \mid T=t, X=x\)`
1. Mediator `\(-k\)` ignorability: `\(Y\left(t^{\prime}, M_k\left(t^{\prime}\right), w\right) \perp M_{-k} \mid T=t, X=x\)`
1. Positivity: `\(\mathbb{P}(T=t \mid X=x)>0\)`
1. Mediator positivity: `\(p_{\left(M_k(t), M_{-k}(t)\right)}(m, w \mid T=t, X=x)>0\)`

---

### Alternative: Hurdle Model

The interface makes it easy to swap in new models. For example, here we replaced the lasso regression on log-normalized metabolite abundances with a hurdle model that directly models zero-inflated nonnegative data.

.large-code[

``` r
model <- multimedia(
  exper2, # version without log transformation
  brms_model(family = hurdle_lognormal()) # new outcome model
) |>
  estimate(exper2)
```
]

---

### Alternative: Hurdle Model

.center[
<img src="figures/hurdle_model_indirect.png" width=700/>
]

---

### Identification

<img src="figures/uncausally-correlated.png" width=800/>
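---

### Sketch: Bootstrap for Effect Estimates

The bootstrap from the main slides re-estimates both the mediator and outcome models on resampled data. The block below is a rough base-R sketch of that idea in the univariate linear case. It does not use the `multimedia` bootstrap functions; the simulated data and the helper `estimate_effects()` are assumptions made for illustration.

```r
set.seed(1)

# Simulated data; the true coefficients are assumed for illustration
n <- 500
dat <- data.frame(t = rbinom(n, 1, 0.5))
dat$m <- 0.8 * dat$t + rnorm(n)
dat$y <- 0.5 * dat$t + 1.2 * dat$m + rnorm(n)

# Hypothetical helper: refit both models and return (direct, indirect)
estimate_effects <- function(d) {
  f_m <- lm(m ~ t, data = d)      # mediation model
  f_y <- lm(y ~ t + m, data = d)  # outcome model
  c(
    direct = unname(coef(f_y)["t"]),
    indirect = unname(coef(f_m)["t"] * coef(f_y)["m"])
  )
}

# Resample rows with replacement and re-estimate both models each time
boot <- replicate(500, estimate_effects(dat[sample(n, replace = TRUE), ]))

# Percentile intervals for the direct and indirect effects
ci <- apply(boot, 1, quantile, probs = c(0.025, 0.975))
ci
```

With multivariate mediators and outcomes, the same loop would return one effect per outcome (or per mediator-outcome pair) in each replicate.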