# Enhancing Microbiome Analysis with Semisynthetic Data

<div id="subtitle_left">
Slides: <a href="https://go.wisc.edu/z3tx91">go.wisc.edu/z3tx91</a><br/>
Paper: <a href="https://go.wisc.edu/p12o8w">go.wisc.edu/p12o8w</a><br/>
Lab: <a href="https://measurement-and-microbes.org">measurement-and-microbes.org</a> <br/>
</div>
<div id="subtitle_right">
Kris Sankaran <br/>
<a href="https://sites.google.com/view/wiscmllm/home">Machine Learning Lunch Meetings</a><br/>
15 | April | 2025 <br/>
</div>

---

### Why Simulate?

Simulation offers many opportunities in microbiome analysis [1; 2]. Simulations
can help us to...

<img src="figures/noun-benchmark-7569457.png" width=40/> **Benchmark methods** and identify gaps in the literature.
<br/>

<img src="figures/noun-labs-99456.png" width=40/> **Design experiments** that have high power to detect subtle signals.
<br/>

<img src="figures/noun-checkmark-7518321.png" width=40/> **Check conclusions** that might be sensitive to technical processing steps.

---

### Semisynthetic Data

One major advance has been the design of algorithms that can leverage
public data resources, such as [3; 4; 5; 6].

* **Semisynthetic Data**: The output from a simulator that has been designed to mimic external, template data. 
* **Template Data**: Previously gathered experimental data that can be used to train a simulator.

---

How would you run a power analysis for Sparse Partial Least Squares Discriminant
Analysis (SPLS-DA) [7]?

.pull-left[
SPLS-DA helps with prediction when:

* S: Not all features are predictive
* PLS: Many features are correlated with one another
* DA: The response is one of `$K$` classes

Unfortunately, it doesn't come with any analytical power formulas.
]

---

### Example Output

In this example, we are comparing mice with and without a mouse model of Type I
diabetes (T1D). SPLS-DA helps us find taxa that distinguish healthy and disease
groups.

.pull-left[
<img src="figures/t1d-true-data.png"/>
]
.pull-right[
<img src="figures/t1d-true-data-factors.png"/>
]

---

### Overall Approach

How many samples are necessary before this method can recover the
discriminating factors?

* **Estimate**: Train a simulator on the original data.
* **Alter/Sample**: Define negative control taxa with no association with T1D.
* **Gather/Summarize**: Evaluate SPLS-DA performance on semisynthetic data with
varying sample sizes and fractions of negative control taxa.
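
The three steps above can be sketched in base R. This is a minimal illustration under loud assumptions: the template data are randomly generated rather than the real T1D data, the simulator is a plain multivariate Gaussian, and a ranking by t-statistics stands in for SPLS-DA. The helper names (`estimate`, `add_nulls`, `sample_sim`, `recovery`) are hypothetical and do not come from any package.

``` r
set.seed(1)

# Hypothetical template data: 40 samples x 30 taxa in two groups.
n_template <- 40
p <- 30
group <- rep(c(0, 1), each = n_template / 2)
template <- matrix(rnorm(n_template * p), n_template, p)
template[group == 1, 1:6] <- template[group == 1, 1:6] + 1

# Estimate: group means plus a pooled covariance, shrunk toward its diagonal.
estimate <- function(x, g) {
  mu <- rbind(colMeans(x[g == 0, ]), colMeans(x[g == 1, ]))
  s <- crossprod(x - mu[g + 1, ]) / (nrow(x) - 2)
  list(mu = mu, sigma = 0.8 * s + 0.2 * diag(diag(s)))
}

# Alter: give both groups the same mean for the negative control taxa.
add_nulls <- function(sim, null_taxa) {
  pooled <- colMeans(sim$mu[, null_taxa, drop = FALSE])
  sim$mu[, null_taxa] <- rep(pooled, each = 2)
  sim
}

# Sample: draw n_per_group samples per group from the simulator.
sample_sim <- function(sim, n_per_group) {
  g <- rep(c(0, 1), each = n_per_group)
  z <- matrix(rnorm(length(g) * ncol(sim$mu)), length(g))
  list(x = sim$mu[g + 1, ] + z %*% chol(sim$sigma), group = g)
}

# Summarize: share of truly associated taxa among the top-ranked |t| statistics.
# This ranking is only a stand-in for SPLS-DA feature selection.
recovery <- function(dat, truth) {
  tstat <- apply(dat$x, 2, function(v) {
    t.test(v[dat$group == 0], v[dat$group == 1])$statistic
  })
  top <- order(abs(tstat), decreasing = TRUE)[seq_along(truth)]
  mean(top %in% truth)
}

# Taxa 4, ..., 30 become negative controls, so only taxa 1, 2, 3 stay associated.
sim <- add_nulls(estimate(template, group), null_taxa = 4:p)
truth <- 1:3
sizes <- c(10, 40, 160)
power <- sapply(sizes, function(n) {
  mean(replicate(20, recovery(sample_sim(sim, n), truth)))
})
names(power) <- sizes
print(round(power, 2))
```

Looping the same code over different choices of `null_taxa` would vary the fraction of negative controls as well as the sample size.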

---

### Copula Simulation

Here, we used a multivariate Gaussian model. More generally, we have found copula models useful. In both cases, it helps to apply
a high-dimensional covariance estimator.

---

### Bivariate Relationships

Here are example bivariate relationships learned by the simulator.

---

### Power Curves

These are the results of our simulation experiment across varying sample sizes
and proportions of truly associated taxa. When few taxa are truly predictive,
many more samples are needed.

---

### Motivating Study

Working with microbiologists and psychologists at UW-Madison, we re-analyzed a
dataset about the gut-brain axis.

.pull-left[
1. We re-analyzed a 2021 study [8], which
gathered data from 54 participants assigned to either a mindfulness training
intervention or a waitlist control (n = 27 each).

1. The training lasted 2 months. Data were collected at the start, finish, and
2-month follow-up.
]

---

### Mediation Analysis

1. We wondered whether the mindfulness intervention might affect behavior, which
in turn influences microbiota composition.
2. To explore this, we applied a form of mediation analysis to the 16S
microbiome and survey data [9; 10].

---

### Mediation Analysis

---

### Synthetic Null Data

We can alter the simulator so that some pathways are "turned off." Estimates
derived from these data provide a reference null distribution.

.center[
<img src="figures/mindfulness-altered.png" width=780/><br/>
<span style="font-size: 24px;">
The middle panel comes from a synthetic null: `$T \nrightarrow M \to Y$`.
</span>
]
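
A minimal sketch of "turning off" a pathway, assuming a simple linear mediation model fit with `lm` on hypothetical data. The variable names, effect sizes, and the `simulate_null` helper are illustrative assumptions, not the `multimedia` API or the model used in the study.

``` r
set.seed(3)

# Hypothetical data with a real T -> M -> Y pathway.
n <- 200
treat <- rep(c(0, 1), each = n / 2)                  # T
mediator <- 1.5 * treat + rnorm(n)                   # M
outcome <- 1.0 * mediator + 0.3 * treat + rnorm(n)   # Y

# Indirect effect as the product of the T -> M and M -> Y coefficients.
indirect <- function(tr, m, y) {
  a <- coef(lm(m ~ tr))[["tr"]]
  b <- coef(lm(y ~ m + tr))[["m"]]
  a * b
}

# "Simulator": the two fitted regressions.
fit_m <- lm(mediator ~ treat)
fit_y <- lm(outcome ~ mediator + treat)

# Synthetic null: drop the treatment term when regenerating M (T -/-> M),
# but keep the fitted M -> Y relationship.
simulate_null <- function() {
  m_star <- coef(fit_m)[["(Intercept)"]] + rnorm(n, sd = sigma(fit_m))
  y_star <- drop(cbind(1, m_star, treat) %*% coef(fit_y)) +
    rnorm(n, sd = sigma(fit_y))
  indirect(treat, m_star, y_star)
}

observed <- indirect(treat, mediator, outcome)
null_effects <- replicate(200, simulate_null())
```

Comparing `observed` against the spread of `null_effects` gives the reference null distribution described above.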

---

### Synthetic Null Hypothesis Testing

We rank the effects learned from both the real and synthetic null reference
data. The significance threshold is chosen to control the empirical false
discovery rate estimated from the synthetic data.
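
One way to sketch this ranking in base R, assuming the empirical false discovery rate at each rank is the share of synthetic null effects among the top-ranked pooled effects. The effect estimates below are randomly generated placeholders, and the exact rule used in the paper may differ.

``` r
set.seed(2)

# Hypothetical effect estimates: `real` mixes 180 null effects with 20 signals,
# and `null` stands in for estimates from synthetic null data.
real <- c(rnorm(180), rnorm(20, mean = 4))
null <- rnorm(200)

# Pool and rank all effects by magnitude.
pooled <- data.frame(
  effect = abs(c(real, null)),
  source = rep(c("real", "synthetic"), c(length(real), length(null)))
)
pooled <- pooled[order(pooled$effect, decreasing = TRUE), ]

# Empirical FDR at each rank: fraction of synthetic effects seen so far.
pooled$fdr_hat <- cumsum(pooled$source == "synthetic") / seq_len(nrow(pooled))

# Keep the longest top-ranked list whose estimated FDR is at most q.
q <- 0.1
ok <- which(pooled$fdr_hat <= q)
n_keep <- if (length(ok) > 0) max(ok) else 0
threshold <- if (n_keep > 0) pooled$effect[n_keep] else Inf
selected <- head(pooled, n_keep)
selected <- selected[selected$source == "real", ]
```

Real effects with magnitude at or above `threshold` are the ones reported as discoveries.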

---

### Reliability Checks

1. Beyond power and benchmarking analysis, simulations can clarify how to
interpret a complicated workflow.

1. Following the lead of [11; 12], we have been calling this a *reliability
check*. These checks construct hypothetical scenarios to understand how methods
behave.

<div style="margin-left: 100px;">
<span style="font-family: 'Exo 2'; font-size: 18;">
The analysis should not...<br/>
&nbsp;&nbsp;&nbsp;&nbsp;introduce spurious signals.<br/>
&nbsp;&nbsp;&nbsp;&nbsp;give high confidence results on uncertain data.<br/>
&nbsp;&nbsp;&nbsp;&nbsp;yield very different answers on similar datasets.<br/>
&nbsp;&nbsp;&nbsp;&nbsp;drown out subtle effects.<br/>
&nbsp;&nbsp;&nbsp;&nbsp;etc...
</span>
</div>

---

### Vertical Data Integration

To illustrate, let's consider a vertical data integration question [13]. These
are problems where we get complementary 'omics views of the same samples.

The goal is to prepare a unified analysis which considers relationships across
sources.

---

### ICU Example

.pull-left[
The study [14] used amplicon sequencing data to profile
the bacterial, viral, and fungal composition of gut microbiome samples from
ICU patients at a hospital, including a subset who were experiencing sepsis.
]

---

### Multiblock SPLS-DA Analysis

Multiblock SPLS-DA generalizes SPLS-DA to incorporate measurements across
multiple tables [15]. With 
`$\texttt{sepsis} \times \texttt{antibiotics}$` status as the response
variable, the method outputs the plots below.

---

### Reliability Check

It's not obvious how we should interpret this output. For example, the virus
data must influence the bacteria plot, because the method integrated across
sources, but how strong is the influence?

Some integration methods are more aggressive than others.

---

### Semisynthetic Data

To calibrate our interpretation, we first fit a simulator using all data. We
then deliberately remove all associations between the bacteria community
profiles and sepsis status.

### Simulation Results

Applying Multiblock SPLS-DA to these data suggests that we are in an "aggressive
integration" regime.

.center[<img src="figures/multiblock_calibration.png" width=780/>]

A reliability check like this might have helped the authors of [16]
realize that their normalization procedure introduced spurious associations.

---

### Evaluation Taxonomy

To be useful, simulated data need to be realistic. A few differences to be aware of:

* **Narrow/Broad Measures**: Narrow measures focus on small subsets of taxa, while broad measures evaluate community-level properties.

* **Graphical/Quantitative**: Some checks are visual, while others are easier to quantify.

* **Fit-for-purpose measures**: Evaluation can focus on specific parameter estimates or analysis results.

Different types of realism should have higher priority depending on the
downstream tasks.

---

### Evaluation through Classification

What type of model would you use to simulate data like this?

---

* A natural enough starting point is a Gaussian mixture model with `$K = 4$`.
* We can simulate from the fit, but it seems quite far off.
.pull-left[
_Simulated_
<img src="figures/Gaussian (Shared Covarince).png" width="480"/>
]
.pull-right[
_Truth_
<img src="figures/true_mixture.png" width="480"/>
]

---

We make our assessment quantitative using the discriminator idea of [17].

The prediction probabilities below come from a gradient boosting model (GBM). Its
out-of-sample accuracy is 65.5%.
.pull-left[
_Simulated_
<img src="figures/Gaussian (Shared Covarince)-prob.png" width="480"/>
]
.pull-right[
_Truth_
<img src="figures/true-Gaussian (Shared Covariance)-prob.png" width="480"/>
]

---

As a next step, we increase the number of components to `$K = 5$` and fit different variances per component.

We still over-sample the gap between the two bottom-left clusters, but the GBM
accuracy has dropped to 55.5%.
.pull-left[
_Simulated_<br/>
<img src="figures/Gaussian (Individual Covariance)-prob.png" width="440"/>
]
.pull-right[
_Truth_<br/>
<img src="figures/true-Gaussian (Individual Covariance)-prob.png" width="440"/>
]

---

* We use a mixture of `$t$` distributions next.
* GBM accuracy is now 50.6%.
  - Unsurprisingly, this is the true mechanism that generated the data.

.pull-left[
_Simulated_

<img src="figures/Student's t (Individual Covariance)-prob.png" width="500"/>
]
.pull-right[
_Truth_

<img src="figures/true-Student's t (Individual Covariance)-prob.png" width="500"/>
]

---

The discrimination probabilities become closer to 0.5 the more accurate the simulation becomes.
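
The discriminator idea can be sketched in a few lines of base R. This is an illustration under stated assumptions: the "real" and simulated samples are hypothetical Gaussian draws, and logistic regression (`glm`) stands in for the gradient boosting model, so it can only pick up differences that are visible to a linear boundary. The helper `discriminator_accuracy` is a hypothetical name.

``` r
set.seed(4)

# Hypothetical data: one poor simulator (shifted mean) and one good simulator.
n <- 500
real <- matrix(rnorm(2 * n), n, 2)
sim_bad <- matrix(rnorm(2 * n, mean = 1), n, 2)
sim_good <- matrix(rnorm(2 * n), n, 2)

# Train a classifier to tell real from simulated samples and report its
# accuracy on held-out rows. Accuracy near 0.5 means the two are hard to
# distinguish.
discriminator_accuracy <- function(real, simulated) {
  dat <- data.frame(rbind(real, simulated))
  dat$is_real <- rep(c(1, 0), c(nrow(real), nrow(simulated)))
  test <- sample(nrow(dat), nrow(dat) / 2)
  fit <- glm(is_real ~ ., data = dat[-test, ], family = binomial())
  p_hat <- predict(fit, newdata = dat[test, ], type = "response")
  mean((p_hat > 0.5) == (dat$is_real[test] == 1))
}

acc_bad <- discriminator_accuracy(real, sim_bad)
acc_good <- discriminator_accuracy(real, sim_good)
print(round(c(bad = acc_bad, good = acc_good), 3))
```

A more flexible classifier, like the GBM used in the slides, would also detect differences in shape or tail behavior.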

---

### Ergonomic Simulation Software

We can break the simulation interface into two parts.

1. **Data Structures (Nouns)**: A good representation makes the simulation
components transparent without causing cognitive overload.

1. **Operations (Verbs)**: What do we do with the structure? E.g., "estimate",
"sample", "print", "plot", "add nulls", "increase signal", "join", ...

If the resulting grammar is expressive enough, then researchers will be able to
solve problems we may not have anticipated.

---

### Verbs: <span style="color:#025E73">Mutate</span>

1. `mutate` lets you modify a few elements from a larger simulator.
2. We can use `mutate` to define a synthetic null with no disease effect for a known subset of genes.

.pull-three-quarters-left[
<img src="figures/nulls_unaltered.png"/>
]
.pull-three-quarters-right[
<img src="figures/pairwise_cors.png"/>
]

---

### Verbs: <span style="color:#025E73">Mutate</span>

1. `mutate` lets you modify a few elements from a larger simulator.
2. We can use `mutate` to define a synthetic null with no disease effect for a known subset of genes.

.pull-three-quarters-left[
<img src="figures/altered_ns.png"/>
]
.pull-three-quarters-right[
<img src="figures/pairwise_cors_altered.png"/>
]

---

### Verbs: <span style="color:#025E73">Join</span>

We should make it possible to combine simulators like Lego blocks.

``` r
experiments <- list(methylation = SCGEMMETH_sce, rna = SCGEMRNA_sce)
families <- list(~ BI(), ~ GaussianLSS())
sims <- experiments |>
  map2(families, \(x, y) setup_simulator(x, ~ cell_type, y))
```

---

### Verbs: <span style="color:#025E73">Join</span> (Copula)

One approach is to merge the list of marginal distributions and re-estimate the joint distribution.

``` r
sim_joined <- map(sims, estimate, nu = 0.1) |>
  join_copula(copula_glasso())
```

This assumes that we have samples where all features are measured.

---

### Verbs: <span style="color:#025E73">Join</span> (Conditioning)

Alternatively, we can combine two simulators by conditioning them on shared latent structure.

``` r
sim_joined <- join_pamona(sims)
```

---

### Verbs: <span style="color:#025E73">Join</span> (Conditioning)

This uses partial manifold alignment to
learn shared latent variables across assays and works even in the diagonal
integration setting.

``` r
sim_joined
```

```
## $methylation
## [Marginals]
## Plan:
## # A tibble: 6 × 3
##     feature  family                       link
##   <gene_id> <distn>                     <link>
## 1  AK123759 BI [mu] ~cell_type + UMAP1 + UMAP2
## 2    ADAM33 BI [mu] ~cell_type + UMAP1 + UMAP2
## 3      NFIX BI [mu] ~cell_type + UMAP1 + UMAP2
## 4     FOXD2 BI [mu] ~cell_type + UMAP1 + UMAP2
## 5   HLX.AS1 BI [mu] ~cell_type + UMAP1 + UMAP2
## 6  LY86.AS1 BI [mu] ~cell_type + UMAP1 + UMAP2
## AK123759, ADAM33, NFIX, and 24 other features need fitting.
## Estimates:
## # A tibble: 0 × 0
## 
## [Dependence]
## 0 NULLs with  features
## 
## [Template Data]
## class: SingleCellExperiment 
## dim: 27 142 
## metadata(0):
## assays(1): counts
## rownames(27): AK123759 ADAM33 ... KCNQ2 CDH22
## rowData names(0):
## colnames(142): CellMeth1 CellMeth2 ... CellMeth141 CellMeth142
## colData names(10): X1 X2 ... UMAP1 UMAP2
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
## 
## $rna
## [Marginals]
## Plan:
## # A tibble: 6 × 3
##     feature              family                       link
##   <gene_id>             <distn>                     <link>
## 1      ZIC3 Gaussian [mu,sigma] ~cell_type + UMAP1 + UMAP2
## 2     KCNQ2 Gaussian [mu,sigma] ~cell_type + UMAP1 + UMAP2
## 3     ZFP42 Gaussian [mu,sigma] ~cell_type + UMAP1 + UMAP2
## 4      OTX2 Gaussian [mu,sigma] ~cell_type + UMAP1 + UMAP2
## 5     SALL4 Gaussian [mu,sigma] ~cell_type + UMAP1 + UMAP2
## 6     NANOG Gaussian [mu,sigma] ~cell_type + UMAP1 + UMAP2
## ZIC3, KCNQ2, ZFP42, and 29 other features need fitting.
## Estimates:
## # A tibble: 0 × 0
## 
## [Dependence]
## 0 NULLs with  features
## 
## [Template Data]
## class: SingleCellExperiment 
## dim: 32 177 
## metadata(0):
## assays(1): logcounts
## rownames(32): ZIC3 KCNQ2 ... NESTIN MYC
## rowData names(0):
## colnames(177): CellRNA1 CellRNA2 ... CellRNA176 CellRNA177
## colData names(10): X1 X2 ... UMAP1 UMAP2
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
```

---

### Additional Resources

All the examples I discussed today can be run from online tutorials we've
written to accompany our papers:

* Simulation for Microbiome Analysis ([go.wisc.edu/wnj5p9](https://go.wisc.edu/wnj5p9))
* Generative Models Examples ([go.wisc.edu/ax73qb](https://go.wisc.edu/ax73qb))

The relevant R packages behind these analyses are:

* `multimedia` - Mediation analysis for microbiome data [18].
* `scDesign3` - An existing simulator for single cell data [19; 20; 21].
* `scDesigner` - Under-development version used in the first tutorial.

---

Simulation turns abstract, conceptual questions into simple empirical ones.

---

* Contact: ksankaran@wisc.edu
* Lab Members: Margaret Thairu, Shuchen Yan, Yuliang Peng, Helena Huang
* Funding: NIGMS R01GM152744, NIAID R01AI184095
* Co-authors: Hanying Jiang, Xinran Miao, Mara Beebe, Dan W. Grupe, Richie
Davidson, Jo Handelsman, Saritha Kodikara, Jingyi Jessica Li, Kim-Anh Lê Cao,
Susan Holmes

---

### References

[1] K. Sankaran et al. "Generative Models: An Interdisciplinary Perspective". In:
_Annual Review of Statistics and Its Application_ 10.1 (Mar. 2023), p. 325–352.
ISSN: 2326-831X. DOI:
[10.1146/annurev-statistics-033121-110134](https://doi.org/10.1146%2Fannurev-statistics-033121-110134).
URL:
[http://dx.doi.org/10.1146/annurev-statistics-033121-110134](http://dx.doi.org/10.1146/annurev-statistics-033121-110134).

[2] K. Sankaran et al. "Semisynthetic simulation for microbiome data analysis".
En. In: _Brief. Bioinform._ 26.1 (Nov. 2024).

[3] E. Pasolli et al. "Accessible, curated metagenomic data through
ExperimentHub". In: _Nature Methods_ 14 (2017), pp. 1023-1024. URL:
[https://api.semanticscholar.org/CorpusID:3403081](https://api.semanticscholar.org/CorpusID:3403081).

[4] E. Muller et al. "The gut microbiome-metabolome dataset collection: a curated
resource for integrative meta-analysis". In: _npj Biofilms and Microbiomes_ 8.1
(Oct. 2022). ISSN: 2055-5008. DOI:
[10.1038/s41522-022-00345-5](https://doi.org/10.1038%2Fs41522-022-00345-5). URL:
[http://dx.doi.org/10.1038/s41522-022-00345-5](http://dx.doi.org/10.1038/s41522-022-00345-5).

[5] F. G. M. Ernst, L. Lahti, and S. Shetty. _microbiomeDataSets_. 2021.

[6] _Home - National Microbiome Data Collaborative - microbiomedata.org_.
<https://microbiomedata.org/>. [Accessed 17-02-2025].

[7] F. Rohart et al. "mixOmics: An R package for 'omics feature selection and
multiple data integration". En. In: _PLoS Comput. Biol._ 13.11 (Nov. 2017), p.
e1005752.

[8] D. W. Grupe et al. "The Impact of Mindfulness Training on Police Officer
Stress, Mental Health, and Salivary Cortisol Levels". In: _Frontiers in
Psychology_ 12 (Sep. 2021). ISSN: 1664-1078. DOI:
[10.3389/fpsyg.2021.720753](https://doi.org/10.3389%2Ffpsyg.2021.720753). URL:
[http://dx.doi.org/10.3389/fpsyg.2021.720753](http://dx.doi.org/10.3389/fpsyg.2021.720753).

[9] K. Imai et al. "A general approach to causal mediation analysis." In:
_Psychological methods_ 15.4 (2010), p. 309.

[10] M. B. Sohn et al. "Compositional mediation analysis for microbiome studies".
In: _The Annals of Applied Statistics_ 13.1 (2019), pp. 661-681.

[11] D. Song et al. "PseudotimeDE: inference of differential gene expression along
cell pseudotime with well-calibrated p-values from single-cell RNA sequencing
data". In: _Genome biology_ 22.1 (2021), p. 124.

[12] D. Song. "Improving Statistical Rigor in Single-Cell and Spatial Omics". PhD
thesis. University of California, Los Angeles, 2024.

[13] K. Lê Cao et al. "Community-wide hackathons to identify central themes in
single-cell multi-omics". In: _Genome biology_ 22 (2021), pp. 1-21.

---

### References

[14] B. W. Haak et al. "Integrative transkingdom analysis of the gut microbiome in
antibiotic perturbation and critical illness". En. In: _mSystems_ 6.2 (Mar. 2021).

[15] A. Singh et al. "DIABLO: an integrative approach for identifying key
molecular drivers from multi-omics assays". In: _Bioinformatics_ 35.17 (Jan.
2019). Ed. by I. Birol, p. 3055–3062. ISSN: 1367-4811. DOI:
[10.1093/bioinformatics/bty1054](https://doi.org/10.1093%2Fbioinformatics%2Fbty1054).
URL:
[http://dx.doi.org/10.1093/bioinformatics/bty1054](http://dx.doi.org/10.1093/bioinformatics/bty1054).

[16] G. D. Poore et al. "RETRACTED ARTICLE: Microbiome analyses of blood and
tissues suggest cancer diagnostic approach". In: _Nature_ 579.7800 (Mar. 2020), p.
567–574. ISSN: 1476-4687. DOI:
[10.1038/s41586-020-2095-1](https://doi.org/10.1038%2Fs41586-020-2095-1). URL:
[http://dx.doi.org/10.1038/s41586-020-2095-1](http://dx.doi.org/10.1038/s41586-020-2095-1).

[17] J. Friedman. _On multivariate goodness-of-fit and two-sample testing_. Tech.
rep. Citeseer, 2004.

[18] H. Jiang et al. "Multimedia: multimodal mediation analysis of microbiome
data". In: _Microbiology Spectrum_ 13.2 (Feb. 2025). Ed. by J. Claesen. ISSN:
2165-0497. DOI:
[10.1128/spectrum.01131-24](https://doi.org/10.1128%2Fspectrum.01131-24). URL:
[http://dx.doi.org/10.1128/spectrum.01131-24](http://dx.doi.org/10.1128/spectrum.01131-24).

[19] W. V. Li et al. "A statistical simulator scDesign for rational scRNA-seq
experimental design". In: _Bioinformatics_ 35.14 (Jul. 2019), p. i41–i50. ISSN:
1367-4811. DOI:
[10.1093/bioinformatics/btz321](https://doi.org/10.1093%2Fbioinformatics%2Fbtz321).
URL:
[http://dx.doi.org/10.1093/bioinformatics/btz321](http://dx.doi.org/10.1093/bioinformatics/btz321).

[20] T. Sun et al. "scDesign2: a transparent simulator that generates
high-fidelity single-cell gene expression count data with gene correlations
captured". In: _Genome Biology_ 22.1 (May. 2021). ISSN: 1474-760X. DOI:
[10.1186/s13059-021-02367-2](https://doi.org/10.1186%2Fs13059-021-02367-2). URL:
[http://dx.doi.org/10.1186/s13059-021-02367-2](http://dx.doi.org/10.1186/s13059-021-02367-2).

[21] D. Song et al. "scDesign3 generates realistic in silico data for multimodal
single-cell and spatial omics". In: _Nature Biotechnology_ 42.2 (May. 2023), p.
247–252. ISSN: 1546-1696. DOI:
[10.1038/s41587-023-01772-1](https://doi.org/10.1038%2Fs41587-023-01772-1). URL:
[http://dx.doi.org/10.1038/s41587-023-01772-1](http://dx.doi.org/10.1038/s41587-023-01772-1).

---

### SPLS-DA Intuition

We "blend" columns of `$\mathbf{X}$` and `$\mathbf{Y}$` within tables until the patterns look similar.

Roughly, choose weights `$\mathbf{a}$` and `$\mathbf{b}$` to maximize
`$\text{cor}\left(\mathbf{Xa}, \mathbf{Yb}\right)$`.

---

### SPLS-DA Intuition

Now we can compare samples from the two tables in a single, shared space.

---

### SPLS-DA Intuition

To get more than one dimension, we can repeat this process after removing any
correlation with previously found patterns.
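
A rough sketch of this intuition in base R, on hypothetical data. Note the assumptions: PLS weights maximize the covariance (rather than the correlation) between `$\mathbf{Xa}$` and `$\mathbf{Yb}$`, which the leading singular vectors of `$\mathbf{X}^\top\mathbf{Y}$` achieve; sparsity is imitated with a single soft-thresholding step instead of the iterative algorithm in `mixOmics`; and the helpers `soft`, `pls_component`, and `deflate` are illustrative names. For the discriminant version, `$\mathbf{Y}$` would be a matrix of class indicators.

``` r
set.seed(5)

# Hypothetical centered tables that share one latent pattern.
n <- 100
latent <- rnorm(n)
X <- scale(outer(latent, c(1, 1, 0, 0, 0)) + matrix(rnorm(n * 5), n), scale = FALSE)
Y <- scale(outer(latent, c(1, 0, 1)) + matrix(rnorm(n * 3), n), scale = FALSE)

# Soft-thresholding sets small weights exactly to zero (the "sparse" part).
soft <- function(w, lambda) sign(w) * pmax(abs(w) - lambda, 0)

# One component: weights a and b from the leading singular vectors of X'Y.
pls_component <- function(X, Y, lambda = 0) {
  s <- svd(crossprod(X, Y), nu = 1, nv = 1)
  a <- soft(s$u[, 1], lambda)
  a <- a / sqrt(sum(a^2))
  b <- s$v[, 1]
  list(a = a, b = b, x_score = drop(X %*% a), y_score = drop(Y %*% b))
}

# Deflation: remove from X everything explained by a previous score.
deflate <- function(X, score) {
  X - outer(score, drop(crossprod(score, X))) / sum(score^2)
}

c1 <- pls_component(X, Y, lambda = 0.2)
c2 <- pls_component(deflate(X, c1$x_score), Y, lambda = 0.2)
```

The scores `c1$x_score` and `c2$x_score` are uncorrelated by construction, which is what "removing any correlation with previously found patterns" buys us.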

---

### Copula Models

More formally, let `$F_{1}, \dots, F_{D}$` be the target margins and let `$\Phi$` be
the CDF of the standard Gaussian distribution. Gaussian copula modeling has these steps.

Estimate:

1. Gaussianize the observed `$\mathbf{x}_{i}$` to `$\mathbf{z}_{i} := \left[\Phi^{-1}\left(F_{1}\left(x_{i1}\right)\right), \dots, \Phi^{-1}\left(F_{D}\left(x_{iD}\right)\right)\right]$`.
1. Estimate the covariance `$\hat{\Sigma}$` associated with the `$\mathbf{z}_{i}$`.

Simulate:

1. Draw `$\mathbf{z}^\ast \sim \mathcal{N}\left(0, \hat{\Sigma}\right)$`.
1. Transform back to `$\mathbf{x}^{\ast} := \left[F_{1}^{-1}\left(\Phi\left(z_{1}^\ast\right)\right), \dots, F_{D}^{-1}\left(\Phi\left(z_{D}^\ast\right)\right)\right]$`.
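
The four steps can be sketched in base R as follows. This is a minimal version under stated assumptions: the template data are hypothetical log-normal draws, the margins `$F_{d}$` are replaced by empirical CDFs, and the plain sample correlation is used for `$\hat{\Sigma}$`, which would need to be swapped for a regularized estimator in high dimensions.

``` r
set.seed(6)

# Hypothetical template: 300 samples of 3 correlated, right-skewed features.
n <- 300
rho <- matrix(c(1, 0.6, 0.3, 0.6, 1, 0.5, 0.3, 0.5, 1), 3)
x <- exp(matrix(rnorm(n * 3), n) %*% chol(rho))

# Estimate, step 1: Gaussianize each margin with its empirical CDF.
# Dividing ranks by n + 1 keeps the probabilities strictly inside (0, 1).
gaussianize <- function(x) {
  apply(x, 2, function(v) qnorm(rank(v) / (length(v) + 1)))
}
z <- gaussianize(x)

# Estimate, step 2: covariance of the Gaussianized data.
sigma_hat <- cor(z)

# Simulate, step 1: draw z* ~ N(0, sigma_hat).
n_sim <- 1000
z_star <- matrix(rnorm(n_sim * ncol(x)), n_sim) %*% chol(sigma_hat)

# Simulate, step 2: map back through Phi and the empirical quantile functions.
x_star <- sapply(seq_len(ncol(x)), function(d) {
  quantile(x[, d], probs = pnorm(z_star[, d]), names = FALSE)
})

# Rank correlations of template and simulated data should be similar.
print(round(cor(x, method = "spearman"), 2))
print(round(cor(x_star, method = "spearman"), 2))
```

Because the margins are mapped back through empirical quantiles, the simulated values never leave the range of the template data; parametric margins would relax this.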

---

### Real vs. Simulated Correlation

A detailed explanation is given [here](https://krisrs1128.github.io/microbiome-simulation/multivariate-power-analysis.html#evaluation).

---

### Tuning High-Dimensional Covariance Estimator

A detailed explanation is given [here](https://krisrs1128.github.io/microbiome-simulation/multivariate-power-analysis.html#evaluation).

---

### Intuition

* In the Gaussianized space, it's easy to model correlation.
* The mapping back and forth is possible because we know the margins `$F$`.
  - `$\Phi$` represents the Gaussian CDF applied componentwise
<br/>
<br/>

---

### Pilot Study

.pull-left[
1. We re-analyzed a pilot study from 2021 [8], which
gathered data from 54 participants randomly assigned to either a mindfulness
training intervention or a waitlist control (n = 27 each).

1. The training lasted 2 months. Data were collected at the start, finish, and
2-month follow-up.
]

---

### Estimated Indirect Effects

These figures summarize the paths `$T \to M \to Y$`.<br/>
(i.e., color `$\to$` x-axis `$\to$` y-axis).

---

### Figure Sources

frustration by Rikas Dzihab from <a href="https://thenounproject.com/browse/icons/term/frustration/" target="_blank" title="frustration Icons">Noun Project</a> (CC BY 3.0)

confused by Rikas Dzihab from <a href="https://thenounproject.com/browse/icons/term/confused/" target="_blank" title="confused Icons">Noun Project</a> (CC BY 3.0)

Benchmark by Sofiah from <a href="https://thenounproject.com/browse/icons/term/benchmark/" target="_blank" title="Benchmark Icons">Noun Project</a> (CC BY 3.0)

checkmark by Asa Kharisma Dini from <a href="https://thenounproject.com/browse/icons/term/checkmark/" target="_blank" title="checkmark Icons">Noun Project</a> (CC BY 3.0)

Lab glassware by Vectors Market from <a href="https://thenounproject.com/browse/icons/term/lab-glassware/" target="_blank" title="Lab glassware Icons">Noun Project</a> (CC BY 3.0)