Distortions
The distortions
package gives functions to compute and visualize the
distortions that are introduced by nonlinear dimensionality reduction
algorithms. It is designed to wrap arbitrary embedding methods and builds on the
distortion estimation routines from the megaman
package. The resulting visualizations can let
you interactively query properties that are not well preserved by the embedding.
For example, the image below shows us selecting distances that are larger in the
embedding space compared to the original data space. This allows us to describe
the large-scale distortions induced by the embedding. For example, we can that
some of the T cells have many neighbors with monocytes (to run this yourself,
see the PBMC Atlas
article).
Alternatively, we can study how the embedding warps distances more locally. Each ellipse in the figure below represents the way in which distances in the original data manifold are warped. By hovering over different regions of the map, we invert the warping in the region surrounding the mouse. For example, this shows that the T cells near the top and bottom of the T cell cluster are in fact more distant from each othoer than the static embedding would suggest.
Quickstart
You can install the package using:
Here's a small example on a UMAP applied to a simulated AnnData
object. First
we generate some random data and embeddings.
import anndata as ad
import scanpy as sc
import numpy as np
adata = ad.AnnData(np.random.poisson(2, size=(100, 5)))
sc.pp.neighbors(adata, n_neighbors=15)
sc.tl.umap(adata)
Next we estimate the local distortions and bind the relevant ellipse information
to our embeddings. The Geometry
object comes from the megaman
package and
gives ways of representing the intrinsic geometry of a manifold.
from distortions.geometry import Geometry, bind_metric, local_distortions, neighborhoods
geom = Geometry(affinity_kwds={"radius": 2}, adjacency_kwds={"n_neighbors": 15})
_, Hvv, Hs = local_distortions(adata.obsm["X_umap"], adata.X, geom)
embedding = bind_metric(adata.obsm["X_umap"], Hvv, Hs)
Now we can make the visualization.
from distortions.visualization import dplot
N = neighborhoods(adata, 1)
dplot(embedding)\
.mapping(x="embedding_0", y="embedding_1")\
.inter_edge_link(N=N)\
.geom_ellipse()
At a high level, the main functions exported by this package are:
local_distortions
: Estimate the local distortion associated with each sample.neighborhoods
: Identify neighborhoods that have been fragmented by the embedding method. These are sets of points that had been close together in the original space but which are spread far apart in the embedding.dplot
: Initialize a distortion plot object. Different encodings and interactions can be layered on top of this initial call.
Each dplot
object has a few static (geom
) and interactive (inter
) layers
that we can then assemble to create a distortion plot.
geom_ellipse
: Draw an ellipse layer that encodes the local distortion associated with each sample.geom_hair
: Draw an line segment layer that encodes the local distortion associated with each sample. It's visually more compact thangeom_ellipse
, at the cost of only showing the ratio between ellipse axes lengths.inter_isometry
: Interactively isometrize from the region surrounding the mouse. This reduces the distortion around the mouse position, at the potential cost of increasing distortion globally.inter_edge_link
: Highlight distorted neighborhoods. This expects the output ofneighborhoods
as input. Hovering over one distorted neighborhood reveals all the edges that it's made up of.inter_boxplot
: Allow selection of outlying edges which have either much larger or smaller embedding distance relative to their original distance.
The full function reference can be found here. You can find more realistic examples applying the package in the articles listed at the side of this page.
Help
You can reach us by creating an Issue in the package repository or sending an email to ksankaran@wisc.edu. We appreciate your trying out the package and will try our best to reply promptly.