This vignette highlights an approach to integrative analysis that leverages interactive visualization. Sometimes, the relationships between data types can be difficult to determine algorithmically, but may become clear upon direct manipulation.

The high-level idea is that traditional, single-table analysis output something like a map – a visual device that let’s us make comparisons. The appeal of many dimensionality reduction or trajectory reconstruction techniques is that we can survey the landscape of cells at a glance. The algorithms give reductions of the raw data that are easy to navigate: isolated islands or gradually evolving rivers of cell types are easier to recognize than if we had to inspect the raw input counts.

Continuing this analogy, this suggests that a goal of integrative analysis is to be able to relate pairs of maps. In the visualizations below, we use the linking principle (Buja et al. 1991) to relate the abstract U-Map constructed from protein expression levels with the physical spatial map produced by MIBI-TOF. Linking let’s us estimate the probability that a cell lies in a particular location in one map, conditional on its location in the other.

Since we want to be able to generate these interactive views on demand, we wrap the core javascript visualization code in an htmlwidgets package.

Setup

First, we’ll load packages needed to prepare the data and run the analysis. Note that we use spatial data analysis packages to prepare the geojson which ends up as input to the visualization

knitr::opts_chunk$set(message = FALSE, warning = FALSE)
library("dplyr")
library("geojsonio")
library("raster")
library("readr")
library("rmapshaper")
library("umap")
library("BIRSBIO2020.scProteomics.embeddings")

Next, we load (or download) the MIBI-TOF data.

data_dir <- file.path(Sys.getenv("HOME"), "Data")
download_data(data_dir)
## [[1]]
## [1] "/github/home/Data/mibiSCE.rda"
## 
## [[2]]
## [1] "/github/home/Data/masstagSCE.rda"
## 
## [[3]]
## [1] "/github/home/Data/TNBC_shareCellData"
loaded_ <- load_mibi(data_dir)
cell_full <- read_csv(file.path(data_dir,  "TNBC_shareCellData", "cellData.csv"))

Our visualizations will be made on a patient-by-patient basis. While in principle it would be possible to construct a visualization across all patients simultaneously, laying out the spatial maps in an intelligent way introduces complexity. For example, it would be more informative if the cell layouts of similar patients were placed closer to one another. For this reason, we postpone the problem of studying patients together, and focus on the usability of linking when working with one patient at a time.

Data preparation

The function below encapsulates all preprocessing that has to be performed for a single patient. The input are the full protein expression data, paths to all spatial Tiffs, and the ID of the patient to display. The output are a data frame giving the U-Map coordinates of all cells, along with a geojson giving the spatial layout of those cells. We simplify the geojson, to avoid creating heavy computation during interaction. We also crop the rasters, so that preprocessing (and the compilation of this vignette) is faster.

input_data <- function(cell_full, tiff_paths, sample_id, crop_size=1000, simplify=0.4) {
  # read the associated raster
  im <- raster(tiff_paths[grepl(paste0("p", sample_id, "_"), tiff_paths)])
  im <- crop(im, extent(im, 0, crop_size, 0, crop_size)) # comment out to work on full data
  polys <- polygonize(im) %>%
    filter(!(cellLabelInImage %in% c(0, 1)))

  # filter down to relevant data
  cell_data <- cell_full %>%
    filter(
      SampleID == sample_id,
      cellLabelInImage %in% polys$cellLabelInImage
    )

  polys <- polys %>%
    left_join(cell_data)

  # run U-Map on channel data
  channels <- setdiff(colnames(cell_data), c("cellLabelInImage", "SampleID", "cellSize", "Background", "tumorYN", "cell_cluster", "tumorCluster", "Group", "immuneCluster", "immuneGroup"))
  dimred <- umap(cell_data[, channels]) %>%
    .[["layout"]] %>%
    data.frame() %>%
    rename(V1 = X1, V2 = X2)
  dimred <- cbind(cell_data, dimred)

  # convert to geojson
  geo_polys <- geojson_json(polys) %>%
    geojson_sp() %>%
    ms_simplify(keep = simplify) %>%
    geojson_json()

  list("dimred" = dimred, "polys" = geo_polys)
}

Visualization

Below is the visualization applied to the output of this function. Each pair of panels corresponds to the subset of cells from a region of tissue within a single patient. On the left hand side are the physical locations of each cells, shaded in according to cell type. On the right hand side are the abstract coordinates described by U-Map. For completeness, the cell types are shaded according to

  • Tumor clusters: 4, 7, 10, 17.
  • Immune clusters: 1, 2, 3, 4, 8, 10, 11, 12.

Click and drag over a set of cells (in either panel) to select them. The corresponding locations of those cells in the other panel are highlighted when you do so. If a brush is added to each panel, then all cells in the union of the two selections are highlighted. See the example here for typical use.

data_ <- list()
for (i in seq_len(6)) {
  data_[[i]] <- input_data(cell_full, loaded_$tiffs, i)
}

linked_views(data_[[1]]$polys, data_[[1]]$dimred)
linked_views(data_[[2]]$polys, data_[[2]]$dimred)
linked_views(data_[[3]]$polys, data_[[3]]$dimred)
linked_views(data_[[4]]$polys, data_[[4]]$dimred)
linked_views(data_[[5]]$polys, data_[[5]]$dimred)
linked_views(data_[[6]]$polys, data_[[6]]$dimred)

Buja, Andreas, John Alan McDonald, John Michalak, and Werner Stuetzle. 1991. “Interactive Data Visualization Using Focusing and Linking.” In IEEE Visualization, 91:156–63.