class: title

# Multiscale Topic Modeling with Alto

<div id="subtitle_left">
Slides: <a href="https://go.wisc.edu/54f161">go.wisc.edu/54f161</a><br/>
Paper: <a href="https://go.wisc.edu/tify36">go.wisc.edu/tify36</a><br/>
Lab: <a href="https://measurement-and-microbes.org">measurement-and-microbes.org</a> <br/>
</div>
<div id="subtitle_right">
Kris Sankaran <br/>
CGSI Virtual Class<br/>
18 | April | 2025 <br/>
</div>

---

### Estimating Weights

The weights `\(W\)` can be estimated by solving the optimal transport problem

`\begin{align*}
&\min_{W \in \mathcal{U}\left(p, q\right)} \left<C,W\right>
\end{align*}`

where the feasible set is

<span style="font-size: 20px;">
`\begin{align*}
\mathcal{U}\left(p, q\right) := &\{W\in \mathbb{R}^{\left|V_{p}\right| \times \left|V_{q}\right|}_{+} : W 1_{\left|V_{q}\right|} = p \text{ and } W^{T} 1_{\left|V_{p}\right|} = q\}.
\end{align*}`
</span>

<img src="figures/transport_alignment_conceptual.png" width="420" style="display: block; margin: auto;" />

---

### Indication of Consistency

The diagnostics become more reliable as the sample size increases.

<img src="figures/summary_alto_asymptotic_behavior.png" width="850" style="display: block; margin: auto;" />

---

### Extensions

The `alto` package focuses on topic models, but the approach generalizes to nonnegative matrix factorization (NMF).

* We can fit `\(X \approx W\left(m\right)H\left(m\right)^\top\)` for a sequence of ranks `\(m\)`. The scores `\(w_{k}\left(m\right)\)` are highly correlated across adjacent `\(m\)`.
* To build an alignment diagram, we only need alignment weights whose row and column totals satisfy the given constraints. Iterative proportional fitting can produce such weights.

---

<img src="figures/nmf_result.png" width="850" style="display: block; margin: auto;" />

---

### Code Demo

Follow along: https://go.wisc.edu/d26lb4
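---

### Sketch: Iterative Proportional Fitting

Both the transport weights and the row/column constraints from the Extensions slide can be illustrated with iterative proportional fitting (IPF). This is a minimal NumPy sketch under stated assumptions, not the `alto` implementation. `ipf` alternately rescales the rows and columns of a nonnegative matrix until its totals match `\(p\)` and `\(q\)`. Running it on the kernel `\(\exp(-C/\varepsilon)\)` gives an entropy-regularized (Sinkhorn) approximation to the transport problem, not its exact solution. The function names, `eps`, and the toy cost matrix are hypothetical.

```python
import numpy as np


def ipf(K, p, q, n_iter=1000, tol=1e-10):
    """Rescale a nonnegative matrix K so rows sum to p and columns sum to q.

    Alternating row/column scaling (IPF, a.k.a. Sinkhorn-Knopp).
    Assumes sum(p) == sum(q) and K has no all-zero rows or columns.
    """
    W = np.asarray(K, dtype=float).copy()
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    for _ in range(n_iter):
        # Scale each row so its total equals p_i.
        W *= (p / W.sum(axis=1))[:, None]
        # Scale each column so its total equals q_j.
        W *= (q / W.sum(axis=0))[None, :]
        # Columns now match q; stop once rows also match p.
        if np.abs(W.sum(axis=1) - p).max() < tol:
            break
    return W


def entropic_transport(C, p, q, eps=0.05, n_iter=1000):
    """Entropy-regularized approximation to min <C, W> over U(p, q).

    Hypothetical helper: IPF on the Gibbs kernel exp(-C / eps).
    Smaller eps is closer to exact transport but can underflow.
    """
    K = np.exp(-np.asarray(C, dtype=float) / eps)
    return ipf(K, p, q, n_iter=n_iter)


if __name__ == "__main__":
    # Toy example: 2 topics at one level, 3 at the next (hypothetical values).
    p = np.array([0.5, 0.5])
    q = np.array([0.3, 0.3, 0.4])
    C = np.array([[0.1, 0.9, 0.5],
                  [0.9, 0.1, 0.5]])
    W = entropic_transport(C, p, q)
    print(np.round(W, 3))  # ≈ [[0.3, 0.0, 0.2], [0.0, 0.3, 0.2]]
```

The low-cost cells absorb most of the mass, and the third topic, equally costly from both parents, splits evenly between them. For the exact linear program, a dedicated optimal transport solver would replace the Sinkhorn step.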