Function Reference
API Reference
distortions.geometry
Geometry
Bases: object
The Geometry class stores the data, distance, affinity and laplacian matrices used by the various embedding methods and is the primary object passed to embedding functions.
The Geometry class contains functions to compute the aforementioned matrices and allows for re-computation whenever necessary.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
adjacency_method
|
string {'auto', 'brute', 'pyflann', 'cyflann'}
|
method for computing pairwise radius neighbors graph. |
'auto'
|
adjacency_kwds
|
dict
|
dictionary containing keyword arguments for the adjacency matrix. See the distance.py documentation for the arguments of each method. If new kwargs are passed to compute_adjacency_matrix, this dictionary will be updated. |
None
|
affinity_method
|
string {'auto', 'gaussian'}
|
method of computing affinity matrix |
'auto'
|
affinity_kwds
|
dict
|
dictionary containing keyword arguments for the affinity matrix. See the affinity.py documentation for the arguments of each method. If new kwargs are passed to compute_affinity_matrix, this dictionary will be updated. |
None
|
laplacian_method
|
string
|
type of Laplacian to compute. Possibilities are {'symmetricnormalized', 'geometric', 'renormalized', 'unnormalized', 'randomwalk'}; see laplacian.py for more information. |
'auto'
|
laplacian_kwds
|
dict
|
dictionary containing keyword arguments for the Laplacian matrix. See the laplacian.py documentation for the arguments of each method. If new kwargs are passed to compute_laplacian_matrix, this dictionary will be updated. |
None
|
**kwargs
|
additional arguments will be parsed and used to override values in
the above dictionaries. For example:
- |
{}
|
Source code in distortions/geometry/geometry.py
compute_adjacency_matrix(copy=False, **kwargs)
This function computes the adjacency matrix. To retrieve the existing adjacency matrix, use self.adjacency_matrix, since compute_adjacency_matrix() will re-compute it.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
copy
|
boolean
|
whether to return a copied version of the adjacency matrix |
False
|
**kwargs
|
see the distance.py documentation for the arguments of each method. |
{}
|
Returns:
Type | Description |
---|---|
self.adjacency_matrix : sparse matrix (N_obs, N_obs)
|
Entries that are not stored explicitly (implicit 0.0 values) should be considered not connected. |
Source code in distortions/geometry/geometry.py
compute_affinity_matrix(copy=False, **kwargs)
This function computes the affinity matrix. To retrieve the existing affinity matrix, use self.affinity_matrix, since compute_affinity_matrix() will re-compute it.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
copy
|
boolean
|
whether to return a copied version of the affinity matrix |
False
|
**kwargs
|
see the affinity.py documentation for the arguments of each method. |
{}
|
Returns:
Type | Description |
---|---|
self.affinity_matrix : sparse matrix (N_obs, N_obs)
|
contains the pairwise affinity values computed with the Gaussian kernel, with bandwidth equal to the affinity_radius |
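As an illustration of what a Gaussian-kernel affinity looks like, the sketch below maps stored neighbor distances to affinities. The scaling exp(-d^2 / radius^2) is one common convention and is an assumption here; this page does not state the exact formula the library uses. The function name `gaussian_affinity` and the dictionary representation of the sparse matrix are hypothetical.

```python
import math


def gaussian_affinity(distances, radius):
    """Map stored distances {(i, j): d} to Gaussian affinities.

    Assumption: affinity = exp(-d^2 / radius^2). The library may scale differently.
    """
    return {pair: math.exp(-(d ** 2) / (radius ** 2)) for pair, d in distances.items()}


# A tiny symmetric neighbor graph stored as explicit entries only.
adjacency = {(0, 1): 0.5, (1, 0): 0.5, (1, 2): 1.0, (2, 1): 1.0}
affinity = gaussian_affinity(adjacency, radius=1.0)
```

Closer pairs receive affinities nearer to 1, and pairs that are not stored in the adjacency graph get no affinity entry at all.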
Source code in distortions/geometry/geometry.py
compute_laplacian_matrix(copy=True, return_lapsym=False, **kwargs)
Note: this function will compute the laplacian matrix. In order to acquire the existing laplacian matrix use self.laplacian_matrix as compute_laplacian_matrix() will re-compute the laplacian matrix.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
copy
|
boolean
|
whether to return a copied version of self.laplacian_matrix |
True
|
return_lapsym
|
boolean
|
if True, additionally returns the symmetrized version of the requested laplacian and the re-normalization weights. |
False
|
**kwargs
|
see the laplacian.py documentation for the arguments of each method. |
{}
|
Returns:
Type | Description |
---|---|
self.laplacian_matrix : sparse matrix (N_obs, N_obs).
|
The requested laplacian. |
self.laplacian_symmetric : sparse matrix (N_obs, N_obs)
|
The symmetric laplacian. |
self.laplacian_weights : ndarray (N_obs,)
|
The renormalization weights used to make laplacian_matrix from laplacian_symmetric |
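To make the relationship between the three return values concrete, here is a textbook-style sketch, not the library's implementation. It assumes the symmetric Laplacian is D - W, that the weights are the node degrees, and that the requested (here random-walk) Laplacian is obtained by dividing each row of the symmetric one by its weight. Sign and normalization conventions in laplacian.py may differ.

```python
def laplacians_sketch(affinity):
    """Return (random-walk Laplacian, symmetric Laplacian, weights).

    affinity: symmetric dense matrix given as a list of lists.
    Assumptions: L_sym = D - W and L_rw = D^-1 L_sym, with D the degree matrix.
    """
    n = len(affinity)
    degrees = [sum(row) for row in affinity]
    lap_sym = [
        [(degrees[i] if i == j else 0.0) - affinity[i][j] for j in range(n)]
        for i in range(n)
    ]
    # Row-wise renormalization by the degree weights.
    lap_rw = [[lap_sym[i][j] / degrees[i] for j in range(n)] for i in range(n)]
    return lap_rw, lap_sym, degrees


# Path graph 0 - 1 - 2 with unit affinities.
W = [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
lap_rw, lap_sym, weights = laplacians_sketch(W)
```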
Source code in distortions/geometry/geometry.py
delete_adjacency_matrix()
delete_affinity_matrix()
delete_data_matrix()
delete_laplacian_matrix()
set_adjacency_matrix(adjacency_mat)
Set the adjacency matrix.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
adjacency_mat
|
sparse matrix, shape (n_samples, n_samples)
|
The adjacency matrix to input. |
required |
Source code in distortions/geometry/geometry.py
set_affinity_matrix(affinity_mat)
Set the affinity matrix.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
affinity_mat
|
sparse matrix (N_obs, N_obs).
|
The affinity matrix to input. |
required |
Source code in distortions/geometry/geometry.py
set_data_matrix(X)
Set the data matrix.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
X
|
array-like, shape (n_samples, n_features)
|
The original data set to input. |
required |
Source code in distortions/geometry/geometry.py
set_laplacian_matrix(laplacian_mat)
Set the Laplacian matrix.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
laplacian_mat
|
sparse matrix (N_obs, N_obs).
|
The Laplacian matrix to input. |
required |
Source code in distortions/geometry/geometry.py
set_matrix(X, input_type)
Set the matrix corresponding to the given input type.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
X
|
array-like
|
Input matrix to set. |
required |
input_type
|
str
|
Type of matrix to set. Options: {'data', 'adjacency', 'affinity'} |
required |
Source code in distortions/geometry/geometry.py
set_radius(radius, override=True, X=None, n_components=2)
Set the radius for the adjacency and affinity computation.
By default, this will override keyword arguments provided on initialization.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
radius
|
float
|
radius to set for adjacency and affinity. |
required |
override
|
bool (default: True)
|
if False, then only set radius if not already defined in
|
True
|
X
|
ndarray or sparse (optional)
|
if provided, estimate a suitable radius from this data. |
None
|
n_components
|
int (default=2)
|
the number of components to use when estimating the radius |
2
|
Source code in distortions/geometry/geometry.py
bind_metric(embedding, Hvv, Hs)
Combine embedding coordinates with local Riemannian metric information.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
embedding
|
ndarray, shape (n_samples, n_embedding_dims)
|
The low-dimensional embedding of the data. This should be the same array
as the |
required |
Hvv
|
ndarray, shape (n_samples, n_embedding_dims, n_embedding_dims)
|
The singular vectors of the dual Riemannian metric tensor for each sample,
as returned by |
required |
Hs
|
ndarray, shape (n_samples, n_embedding_dims)
|
The singular values of the dual Riemannian metric tensor for each sample,
as returned by |
required |
Returns:
Name | Type | Description |
---|---|---|
combined |
DataFrame
|
A DataFrame containing the embedding coordinates, the singular vectors and singular values of the local dual Riemannian metric for each sample, and an additional column "angle" computed from the first two singular vector components. |
Notes
This function is intended to facilitate analysis and visualization by merging the embedding and local metric information into a single tabular structure.
Source code in distortions/geometry/rmetric.py
boxplot_data(x, y, nbin=10, outlier_iqr=3, **kwargs)
Compute boxplot statistics and identify outliers within distance bins.
This function divides the x-values (typically true distances) into bins and computes boxplot statistics for the y-values (typically embedding distances) within each bin. It identifies outliers using the IQR method.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x
|
array-like
|
Input values used for binning (typically true/original distances). |
required |
y
|
array-like
|
Target values for which to compute statistics (typically embedding distances). |
required |
nbin
|
int
|
Number of bins to divide the x-value range into. |
10
|
outlier_iqr
|
float
|
IQR multiplier for outlier detection. Values beyond Q1 - outlier_iqr * IQR or Q3 + outlier_iqr * IQR within each bin are considered outliers. |
3
|
**kwargs
|
keyword arguments
|
Additional keyword arguments (currently unused). |
{}
|
Returns:
Name | Type | Description |
---|---|---|
summaries |
DataFrame
|
DataFrame with boxplot statistics for each bin, containing columns:
- 'bin_id': bin identifier
- 'q1', 'q2', 'q3': quartile values
- 'min', 'max': minimum and maximum values
- 'iqr': interquartile range
- 'lower', 'upper': outlier detection bounds
- 'bin': string representation of bin range |
outliers |
DataFrame
|
DataFrame with outlier information, containing columns:
- 'index': original index of outlier point
- 'bin_id': which bin the outlier belongs to
- 'bin': string representation of bin range
- 'value': the outlier y-value |
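The binning and outlier rule described above can be sketched in plain Python. This is a simplified illustration, not the library's code. It assumes equal-width bins over the x range and linearly interpolated percentiles, and it returns lists of dictionaries rather than DataFrames. The names `percentile` and `boxplot_sketch` are hypothetical.

```python
def percentile(values, q):
    """Linearly interpolated percentile (q in [0, 100]) of a non-empty sequence."""
    data = sorted(values)
    pos = (len(data) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(data) - 1)
    return data[lo] + (data[hi] - data[lo]) * (pos - lo)


def boxplot_sketch(x, y, nbin=10, outlier_iqr=3):
    """Bin x into equal-width bins (an assumption) and flag y outliers per bin."""
    x_min, x_max = min(x), max(x)
    width = (x_max - x_min) / nbin or 1.0  # fall back to 1.0 if all x are equal
    bins = {}
    for i, (xi, yi) in enumerate(zip(x, y)):
        bin_id = min(int((xi - x_min) / width), nbin - 1)  # clamp x_max into last bin
        bins.setdefault(bin_id, []).append((i, yi))

    summaries, outliers = [], []
    for bin_id in sorted(bins):
        ys = [yi for _, yi in bins[bin_id]]
        q1, q2, q3 = (percentile(ys, q) for q in (25, 50, 75))
        iqr = q3 - q1
        lower, upper = q1 - outlier_iqr * iqr, q3 + outlier_iqr * iqr
        summaries.append({"bin_id": bin_id, "q1": q1, "q2": q2, "q3": q3,
                          "iqr": iqr, "lower": lower, "upper": upper})
        outliers += [{"index": i, "bin_id": bin_id, "value": yi}
                     for i, yi in bins[bin_id] if yi < lower or yi > upper]
    return summaries, outliers


true_dist = [0.0, 0.1, 0.2, 0.3, 0.4, 1.0]
embed_dist = [1.0, 1.1, 0.9, 1.0, 9.0, 2.0]
summaries, outliers = boxplot_sketch(true_dist, embed_dist, nbin=2)
```

In this toy input, the fifth link has an embedding distance far above the others in its bin, so it is the only one reported as an outlier.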
Source code in distortions/geometry/neighborhoods.py
local_distortions(embedding, data, geom)
Compute local Riemannian metric distortions for each sample.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
embedding
|
ndarray, shape (n_samples, n_embedding_dims)
|
Low-dimensional embedding of the data. Each row corresponds to a sample, and each column corresponds to an embedding dimension. |
required |
data
|
ndarray, shape (n_samples, n_features)
|
Original high-dimensional data. Each row is a sample, each column a feature. |
required |
geom
|
Geometry
|
An instance of the Geometry class (from geometry.py) that provides methods for setting the data matrix and computing the Laplacian matrix. |
required |
Returns:
Name | Type | Description |
---|---|---|
H |
ndarray
|
Dual Riemannian metric tensor for each sample. |
Hvv |
ndarray
|
Singular vectors of the dual metric tensor for each sample. |
Hs |
ndarray
|
Singular values of the dual metric tensor for each sample. |
Notes
This function sets the data matrix in the provided Geometry object, computes the Laplacian matrix, and then estimates the local Riemannian metric distortions in the embedding space using the original data.
Source code in distortions/geometry/rmetric.py
neighborhood_distances(adata, embed_key='X_umap')
Compute pairwise distances between samples and their neighbors in both original and embedding spaces.
This function calculates pairwise distances between each sample and its neighbors in the original high-dimensional space and compares them with distances in the reduced embedding space. This is useful for analyzing how well the embedding preserves local neighborhood structure.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
adata
|
AnnData
|
Annotated data matrix. Must contain a precomputed embedding (e.g., UMAP or t-SNE) in |
required |
embed_key
|
str
|
Key in |
"X_umap"
|
Returns:
Type | Description |
---|---|
DataFrame
|
DataFrame with columns:
- 'center': index of the sample (cell)
- 'neighbor': index of the neighbor sample
- 'true': distance in the original space (from |
Notes
The number of neighbors is determined by the structure of the neighbor graph in adata.obsp["distances"].
The function assumes that the embedding and neighbor graph have already been computed.
Source code in distortions/geometry/neighborhoods.py
neighborhoods
boxplot_data(x, y, nbin=10, outlier_iqr=3, **kwargs)
Compute boxplot statistics and identify outliers within distance bins.
This function divides the x-values (typically true distances) into bins and computes boxplot statistics for the y-values (typically embedding distances) within each bin. It identifies outliers using the IQR method.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x
|
array-like
|
Input values used for binning (typically true/original distances). |
required |
y
|
array-like
|
Target values for which to compute statistics (typically embedding distances). |
required |
nbin
|
int
|
Number of bins to divide the x-value range into. |
10
|
outlier_iqr
|
float
|
IQR multiplier for outlier detection. Values beyond Q1 - outlier_iqr * IQR or Q3 + outlier_iqr * IQR within each bin are considered outliers. |
3
|
**kwargs
|
keyword arguments
|
Additional keyword arguments (currently unused). |
{}
|
Returns:
Name | Type | Description |
---|---|---|
summaries |
DataFrame
|
DataFrame with boxplot statistics for each bin, containing columns:
- 'bin_id': bin identifier
- 'q1', 'q2', 'q3': quartile values
- 'min', 'max': minimum and maximum values
- 'iqr': interquartile range
- 'lower', 'upper': outlier detection bounds
- 'bin': string representation of bin range |
outliers |
DataFrame
|
DataFrame with outlier information, containing columns:
- 'index': original index of outlier point
- 'bin_id': which bin the outlier belongs to
- 'bin': string representation of bin range
- 'value': the outlier y-value |
Source code in distortions/geometry/neighborhoods.py
broken_knn(embedding, k=2, z_thresh=1.0)
Determine broken points in embedding space using k-NN distances and Z-score thresholding.
This function identifies potentially problematic points in an embedding by computing their average k-nearest neighbor distances, calculating Z-scores, and flagging points that exceed the threshold as broken or isolated.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
embedding
|
array-like, shape (n_samples, n_features)
|
The embedding coordinates for all samples. |
required |
k
|
int
|
Number of nearest neighbors to consider for distance calculation. |
2
|
z_thresh
|
float
|
Z-score threshold for identifying broken points. Points with Z-scores greater than or equal to this value are considered broken. |
1.0
|
Returns:
Type | Description |
---|---|
list of int
|
List of indices of broken points, sorted by descending Z-score. If no points exceed the threshold, returns the single point with the highest Z-score. |
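The procedure described above (mean k-NN distance, Z-score, threshold, fallback to the single highest score) can be sketched with the standard library alone. This is an illustration under assumptions: Euclidean distances, a brute-force neighbor search, and the population standard deviation. The library's actual choices are not stated on this page, and the name `broken_knn_sketch` is hypothetical.

```python
import math
import statistics


def broken_knn_sketch(embedding, k=2, z_thresh=1.0):
    """Return indices whose mean k-NN distance has a Z-score >= z_thresh."""
    mean_knn = []
    for i, p in enumerate(embedding):
        # Brute-force distances from point i to every other point.
        dists = sorted(math.dist(p, q) for j, q in enumerate(embedding) if j != i)
        mean_knn.append(sum(dists[:k]) / k)

    mu = statistics.fmean(mean_knn)
    sigma = statistics.pstdev(mean_knn)  # assumption: population standard deviation
    z = [(d - mu) / sigma if sigma > 0 else 0.0 for d in mean_knn]

    # Sort by descending Z-score, keep those at or above the threshold.
    order = sorted(range(len(z)), key=lambda i: z[i], reverse=True)
    flagged = [i for i in order if z[i] >= z_thresh]
    return flagged or order[:1]  # fallback: the single highest Z-score


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (10.0, 10.0)]
broken = broken_knn_sketch(points, k=2, z_thresh=1.0)
```

The isolated point at (10, 10) is the only one whose neighbors are far away, so it is the only index flagged.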
Source code in distortions/geometry/neighborhoods.py
identify_broken_box(dists, outlier_factor=3, nbin=10)
Identify broken links using boxplot-based outlier detection within distance bins.
This helper function bins the true distances and identifies outliers in the embedding distances within each bin using boxplot criteria.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
dists
|
DataFrame
|
DataFrame with 'true' and 'embedding' distance columns. |
required |
outlier_factor
|
float
|
IQR multiplier for outlier detection threshold. |
3
|
nbin
|
int
|
Number of bins to divide the true distance range into. |
10
|
Returns:
Type | Description |
---|---|
DataFrame
|
Copy of input distances DataFrame with additional 'brokenness' boolean column indicating which links are identified as broken outliers. |
Source code in distortions/geometry/neighborhoods.py
identify_broken_window(dists, outlier_factor=3, percentiles=[75, 25], frame=[50, 50])
Identify broken links using sliding window smoothing and residual analysis.
This helper function applies a sliding window median filter to the distance relationship and identifies links where the embedding distance significantly exceeds the smoothed expectation.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
dists
|
DataFrame
|
DataFrame with 'true' and 'embedding' distance columns. |
required |
outlier_factor
|
float
|
Multiplier for IQR-based outlier threshold in residual analysis. |
3
|
percentiles
|
list of float
|
Percentiles used for IQR calculation. |
[75, 25]
|
frame
|
list of int
|
Window frame size [before, after] for sliding median calculation. |
[50, 50]
|
Returns:
Type | Description |
---|---|
DataFrame
|
DataFrame with original columns plus:
- 'embedding_smooth': smoothed embedding distances
- 'residual': difference between actual and smoothed embedding distances |
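A rough sketch of the sliding-window idea follows. It assumes links are ordered by true distance, the window is truncated at both ends of the sequence, and a link counts as broken when its residual exceeds median + outlier_factor * IQR of the residuals (the rule quoted for neighborhoods_window). The 'brokenness' field, the list-of-dicts output and the names `percentile` and `window_sketch` are assumptions made for illustration.

```python
import statistics


def percentile(values, q):
    """Linearly interpolated percentile (q in [0, 100]) of a non-empty sequence."""
    data = sorted(values)
    pos = (len(data) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(data) - 1)
    return data[lo] + (data[hi] - data[lo]) * (pos - lo)


def window_sketch(true, embedding, outlier_factor=3, percentiles=(75, 25), frame=(50, 50)):
    """Smooth embedding distances with a sliding median and flag large residuals."""
    order = sorted(range(len(true)), key=lambda i: true[i])  # sort links by true distance
    emb_sorted = [embedding[i] for i in order]
    before, after = frame
    smooth = [
        statistics.median(emb_sorted[max(0, r - before): r + after + 1])
        for r in range(len(emb_sorted))
    ]
    residual = [e - s for e, s in zip(emb_sorted, smooth)]
    spread = abs(percentile(residual, percentiles[0]) - percentile(residual, percentiles[1]))
    cutoff = statistics.median(residual) + outlier_factor * spread
    return [
        {"index": i, "embedding_smooth": s, "residual": r, "brokenness": r > cutoff}
        for i, s, r in zip(order, smooth, residual)
    ]


true_d = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
embed_d = [0.2, 0.4, 0.6, 0.8, 5.0, 1.2, 1.4, 1.6, 1.8, 2.0]
rows = window_sketch(true_d, embed_d, frame=(2, 2))
```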
Source code in distortions/geometry/neighborhoods.py
iqr(x, percentiles)
Calculate the interquartile range between given percentiles.
This function computes the difference between two percentiles of the input array, typically used to measure the spread of data.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x
|
array-like
|
Input array for which to calculate the interquartile range. |
required |
percentiles
|
array-like of length 2
|
Two percentile values (e.g., [25, 75] for standard IQR). The function returns the difference between the higher and lower percentiles. |
required |
Returns:
Type | Description |
---|---|
float
|
The interquartile range (difference between the specified percentiles). |
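A minimal standard-library sketch of this computation is shown below. It assumes linearly interpolated percentiles, which may differ from the interpolation the library uses, and the names `percentile` and `iqr_sketch` are hypothetical.

```python
def percentile(values, q):
    """Linearly interpolated percentile (q in [0, 100]) of a non-empty sequence."""
    data = sorted(values)
    if not data:
        raise ValueError("values must be non-empty")
    pos = (len(data) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(data) - 1)
    return data[lo] + (data[hi] - data[lo]) * (pos - lo)


def iqr_sketch(x, percentiles):
    """Difference between the higher and the lower of two percentiles of x."""
    a, b = (percentile(x, q) for q in percentiles)
    return abs(a - b)  # order of the two percentiles does not matter


spread = iqr_sketch([1, 2, 3, 4, 5, 6, 7, 8, 9], [75, 25])
```

Because the result is the absolute difference, passing [75, 25] or [25, 75] gives the same spread.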
Source code in distortions/geometry/neighborhoods.py
neighbor_generator(embedding, broken_locations=[], number_neighbor=10)
Generate neighbor lists for broken points in the embedding space.
This function finds nearest neighbors for specified broken points (or automatically detected ones) in the embedding space. It's useful for understanding the local neighborhood structure around problematic points.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
embedding
|
array-like, shape (n_samples, n_features)
|
The embedding coordinates for all samples. |
required |
broken_locations
|
list of int
|
Indices of broken points for which to generate neighbors. If empty, automatically detects broken points using broken_knn(). |
[]
|
number_neighbor
|
int
|
Number of nearest neighbors to find for each broken point. |
10
|
Returns:
Type | Description |
---|---|
dict
|
Dictionary mapping broken point indices (int) to lists of their nearest neighbor indices, excluding the point itself. |
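The shape of the returned dictionary can be illustrated with a brute-force sketch. It assumes Euclidean distance and requires the broken indices to be given explicitly; the automatic detection through broken_knn() is left out. The name `neighbor_sketch` is hypothetical.

```python
import math


def neighbor_sketch(embedding, broken_locations, number_neighbor=10):
    """Map each broken index to its nearest neighbor indices, excluding itself."""
    neighbors = {}
    for i in broken_locations:
        others = [j for j in range(len(embedding)) if j != i]
        # Order the remaining points by Euclidean distance to point i.
        others.sort(key=lambda j: math.dist(embedding[i], embedding[j]))
        neighbors[i] = others[:number_neighbor]
    return neighbors


points = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0), (7.0, 0.0)]
result = neighbor_sketch(points, broken_locations=[3], number_neighbor=2)
```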
Source code in distortions/geometry/neighborhoods.py
neighborhood_distances(adata, embed_key='X_umap')
Compute pairwise distances between samples and their neighbors in both original and embedding spaces.
This function calculates pairwise distances between each sample and its neighbors in the original high-dimensional space and compares them with distances in the reduced embedding space. This is useful for analyzing how well the embedding preserves local neighborhood structure.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
adata
|
AnnData
|
Annotated data matrix. Must contain a precomputed embedding (e.g., UMAP or t-SNE) in |
required |
embed_key
|
str
|
Key in |
"X_umap"
|
Returns:
Type | Description |
---|---|
DataFrame
|
DataFrame with columns:
- 'center': index of the sample (cell)
- 'neighbor': index of the neighbor sample
- 'true': distance in the original space (from |
Notes
The number of neighbors is determined by the structure of the neighbor graph in adata.obsp["distances"].
The function assumes that the embedding and neighbor graph have already been computed.
Source code in distortions/geometry/neighborhoods.py
neighborhoods(adata, outlier_factor=3, threshold=0.2, method='box', percentiles=[75, 25], frame=[50, 50], nbin=10, **kwargs)
Identify broken neighborhoods in embeddings using different methods.
This function serves as the main interface for detecting broken neighborhoods in dimensionality reduction embeddings. It supports multiple methods for identifying outliers and broken links between original and embedding spaces.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
adata
|
AnnData
|
Annotated data matrix with precomputed embedding and neighbor graph. |
required |
outlier_factor
|
float
|
Factor used to determine outlier threshold. Higher values are more permissive (fewer outliers detected). |
3
|
threshold
|
float
|
Proportion threshold for flagging samples as having broken neighborhoods. Centers with more than this proportion of broken neighbors are flagged. |
0.2
|
method
|
str
|
Method for identifying broken neighborhoods. Options:
- "box": Uses boxplot-based outlier detection
- "window": Uses sliding window smoothing with residual analysis |
"box"
|
percentiles
|
list of float
|
Percentiles used for IQR calculation in windowing method. |
[75, 25]
|
frame
|
list of int
|
Window frame size [before, after] for sliding window smoothing. |
[50, 50]
|
nbin
|
int
|
Number of bins for boxplot method. |
10
|
**kwargs
|
keyword arguments
|
Additional arguments passed to neighborhood_distances(). |
{}
|
Returns:
Type | Description |
---|---|
dict
|
Dictionary mapping center indices to lists of their neighbor indices for samples with broken neighborhoods. |
Raises:
Type | Description |
---|---|
NotImplementedError
|
If an unsupported method is specified. |
Source code in distortions/geometry/neighborhoods.py
neighborhoods_box(adata, outlier_factor=3, threshold=0.2, nbin=10, **kwargs)
Identify broken neighborhoods using boxplot-based outlier detection.
This method bins the true distances and computes boxplot statistics within each bin. Links are considered broken if their embedding distance is an outlier relative to other links with similar true distances.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
adata
|
AnnData
|
Annotated data matrix with precomputed embedding and neighbor graph. |
required |
outlier_factor
|
float
|
IQR multiplier for boxplot outlier detection. Values beyond Q1 - outlier_factor * IQR or Q3 + outlier_factor * IQR are outliers. |
3
|
threshold
|
float
|
Proportion threshold for flagging samples as having broken neighborhoods. |
0.2
|
nbin
|
int
|
Number of bins to divide the true distance range into. |
10
|
**kwargs
|
keyword arguments
|
Additional arguments passed to neighborhood_distances(). |
{}
|
Returns:
Type | Description |
---|---|
dict
|
Dictionary mapping center indices to lists of their neighbor indices for samples with broken neighborhoods. |
Source code in distortions/geometry/neighborhoods.py
neighborhoods_window(adata, outlier_factor=3, threshold=0.2, percentiles=[75, 25], frame=[50, 50], **kwargs)
Identify broken neighborhoods using window-based smoothing and residual analysis.
This method applies a sliding window median filter to the distance relationships and identifies outliers based on residuals from the smoothed curve. Points with large positive residuals indicate broken neighborhoods where embedding distances are much larger than expected.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
adata
|
AnnData
|
Annotated data matrix with precomputed embedding and neighbor graph. |
required |
outlier_factor
|
float
|
Multiplier for IQR-based outlier threshold. Residuals greater than median + outlier_factor * IQR are considered broken. |
3
|
threshold
|
float
|
Proportion threshold for flagging samples as having broken neighborhoods. |
0.2
|
percentiles
|
list of float
|
Percentiles used for IQR calculation in residual analysis. |
[75, 25]
|
frame
|
list of int
|
Window frame size [before, after] for sliding median calculation. |
[50, 50]
|
**kwargs
|
keyword arguments
|
Additional arguments passed to neighborhood_distances(). |
{}
|
Returns:
Type | Description |
---|---|
dict
|
Dictionary mapping center indices to lists of their neighbor indices for samples with broken neighborhoods. |
Source code in distortions/geometry/neighborhoods.py
threshold_links(dists, brokenness, threshold=0.2)
Flag samples with high proportions of broken neighborhood links.
This function identifies samples where the proportion of broken neighborhood links exceeds a specified threshold, indicating problematic embedding regions.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
dists
|
DataFrame
|
DataFrame containing distance information with 'center' and 'neighbor' columns. |
required |
brokenness
|
DataFrame
|
DataFrame with 'center' and 'brokenness' columns indicating broken links. |
required |
threshold
|
float
|
Proportion threshold for flagging samples. Centers with more than this proportion of broken neighbors are included in the output. |
0.2
|
Returns:
Type | Description |
---|---|
dict
|
Dictionary mapping center indices (int) to lists of their neighbor indices for samples exceeding the brokenness threshold. |
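The thresholding step can be sketched as follows. For brevity the two DataFrames are replaced by a single list of (center, neighbor, is_broken) tuples, which is an assumption made for illustration. The name `threshold_links_sketch` is hypothetical.

```python
from collections import defaultdict


def threshold_links_sketch(links, threshold=0.2):
    """Keep centers whose proportion of broken links exceeds the threshold.

    links: iterable of (center, neighbor, is_broken) tuples.
    """
    neighbors = defaultdict(list)
    broken = defaultdict(int)
    for center, neighbor, is_broken in links:
        neighbors[center].append(neighbor)
        broken[center] += bool(is_broken)
    return {c: ns for c, ns in neighbors.items() if broken[c] / len(ns) > threshold}


links = [
    (0, 1, False), (0, 2, False), (0, 3, False),
    (1, 0, True), (1, 2, False), (1, 3, False),
]
flagged = threshold_links_sketch(links, threshold=0.2)
```

Center 1 has one broken link out of three, which is above the 0.2 threshold, so it is returned together with all of its neighbors.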
Source code in distortions/geometry/neighborhoods.py
distortions.visualization
dplot
Bases: AnyWidget
Interactive Distortion Plot Widget
This class provides an interactive widget for visualizing distortion metrics computed on datasets, with a ggplot2-like syntax for adding graphical marks and overlaying distortion criteria. It is designed for use in Jupyter environments and leverages the anywidget and traitlets libraries for interactivity. You can pause mouseover interactivity by holding down the control key.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df
|
DataFrame
|
The input dataset to visualize. Must be convertible to a list of records. |
required |
*args
|
tuple
|
Additional positional arguments passed to the parent AnyWidget. |
()
|
**kwargs
|
dict
|
Additional keyword arguments passed to the parent AnyWidget and used as visualization options. |
{}
|
Methods:
Name | Description |
---|---|
mapping |
Specify the mapping from data columns to visual properties. |
geom_ellipse |
Add an ellipse layer to the plot. |
geom_hair |
Add a hair (small oriented lines) layer to the plot. |
labs |
Add labels to the plot. |
geom_edge_link |
Add edge link geometry to the plot. |
inter_edge_link |
Add interactive edge link geometry to the plot. |
inter_isometry |
Add interactive isometry overlays to the plot. |
scale_color |
Add a color scale to the plot. |
scale_size |
Add a size scale to the plot. |
inter_boxplot |
Add an interactive boxplot layer for distortion metrics, using provided distance summaries and outlier information. |
save |
Save the current view to SVG. |
Examples:
>>> import pandas as pd
>>> df = pd.DataFrame({...})
>>> dplot(df).mapping(x='embedding_1', y='embedding_2').geom_ellipse()
Source code in distortions/visualization/interactive.py
scanpy_umap(adata, max_cells=200, n_neighbors=10, n_pcs=40)
Runs UMAP visualization on an AnnData object with basic preprocessing.
This wrapper function filters genes by minimum count, applies log transformation, selects highly variable genes, computes neighbors in PCA space, and runs UMAP.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
adata
|
AnnData
|
AnnData experiment object containing the data to filter, transform, and apply UMAP to. |
required |
max_cells
|
int, optional (default: 200)
|
Maximum number of cells to use for visualization. |
200
|
n_neighbors
|
int, optional (default: 10)
|
Number of neighbors to use for constructing the neighborhood graph. |
10
|
n_pcs
|
int, optional (default: 40)
|
Number of principal components to use for neighborhood graph construction. |
40
|
Returns:
Name | Type | Description |
---|---|---|
adata |
AnnData
|
The AnnData object after preprocessing and UMAP computation. |
Notes
The function modifies the input AnnData object in place.
Examples:
>>> import scanpy as sc
>>> from distortions.visualization import scanpy_umap
>>> adata = sc.datasets.pbmc3k()
>>> adata_umap = scanpy_umap(adata, max_cells=100, n_neighbors=15, n_pcs=30)
>>> sc.pl.umap(adata_umap)