Analyzing feature activations across datasets
Reading (1, 2), Recording, Rmarkdown
The previous notes gave us a look into the features learned by a deep learning model. However, we could only look at one feature within one layer at a time. We also only studied an individual image. If we want to better understand the representations learned by a network, we will need ways of analyzing collections of features taken from throughout the network, across entire datasets.
This seems like an impossible task, but it turns out that, in real-world models, the learned features tend to be highly correlated. Certain patterns of activation tend to recur across similar images. This kind of structure makes it possible to use clustering and dimensionality reduction to begin making sense of the representations learned by individual networks.
To illustrate this idea, we will download the same model from before along with a larger subsample of images used in training.
library(broom)
library(keras)
library(magrittr)
library(pdist)
library(tidyverse)

# Download a sample of the training images and the model from the previous notes.
f <- tempfile()
download.file("https://uwmadison.box.com/shared/static/dxibamcr0bcmnj7xazqxnod8wtew70m2.rda", f)
images <- get(load(f))
f <- tempfile()
download.file("https://uwmadison.box.com/shared/static/9wu6amgizhgnnefwrnyqzkf8glb6ktny.h5", f)
model <- load_model_hdf5(f)

# A model whose outputs are the activations of layers 6 and 8.
l <- c(model$layers[[6]]$output, model$layers[[8]]$output)
activation_model <- keras_model(inputs = model$input, outputs = l)
features <- predict(activation_model, images)
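Since we requested two layers' outputs, features is a list with one element per layer. Each element is a four-dimensional array, indexed by image, height, width, and feature; the exact sizes depend on the downloaded sample. A quick check:
# Shape of each layer's activations: image x height x width x feature.
map(features, dim)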
Ideally, we could work with a matrix of samples by features, whose \(ij^{th}\) element gives the activation of feature \(j\) on observation \(i\).
Unfortunately, this matrix is not immediately available. As we saw before, each feature map is a small array of activations across spatial contexts, not a single number. There is no canonical way to aggregate across a feature map, and it is common to see the maximum, average, norm, or variance of the map used as a summary of how strongly that feature activates on a given image. We will take the mean of the activations.
# Average each feature map over its spatial dimensions (2 and 3), leaving one
# summary per image (dimension 1) and feature (dimension 4).
feature_means <- function(h) {
  apply(h, c(1, 4), mean) %>%
    as_tibble()
}
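If we preferred one of the other summaries, only the aggregation function would change. For example, a hypothetical feature_maxes (not used in the rest of these notes) would record the maximum activation instead:
# Alternative summary (hypothetical): maximum activation per feature map.
feature_maxes <- function(h) {
  apply(h, c(1, 4), max) %>%
    as_tibble()
}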
Applying the mean summary to each layer and binding the columns gives the samples-by-features matrix we wanted, along with an id column tracking each image.
# One row per image, one column per feature, across both layers.
h <- map_dfc(features, feature_means) %>%
  set_colnames(str_c("feature_", 1:ncol(.))) %>%
  mutate(id = row_number())
As a first look, we can retrieve the images on which a single feature activates most strongly. The code below pulls out the 20 images with the highest mean activation of feature 3.
# IDs of the 20 images with the largest values of feature_3.
top_ims <- h %>%
  slice_max(feature_3, n = 20) %>%
  pull(id)
# Display the top images in a 5 x 4 grid, with no margins between panels.
par(mfrow = c(5, 4), mai = rep(0, 4))
out <- images[top_ims,,,] %>%
  array_tree(1) %>%
  map(~ plot(as.raster(., max = 255)))
This particular example should serve as a kind of warning. While it’s easy to imbue models with human-like characteristics, they often arrive at their answers in unexpected ways. We asked the model to distinguish between cats and dogs, but it is using the presence of grass in the image as a predictor. While that may be accurate for this dataset, I would expect this model to fail on an image of a cat in a grassy field.
Instead of investigating only one neuron, we can consider all the images and neurons simultaneously. One way to do this is a heatmap of the average feature activations from before, where each row is an image and each column is a feature from layer 6 or layer 8. A similar example is given in the reading, where coordinated views reveal that certain patterns of neuron activation encode lifts of the pen or specific curve shapes in a handwriting generation network.
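A minimal sketch of such a heatmap, using base R’s heatmap function (a dedicated heatmap package would allow more customization):
# Heatmap of mean activations: rows are images, columns are features.
h_mat <- as.matrix(select(h, starts_with("feature")))
heatmap(h_mat, scale = "column")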
To organize this collection more formally, we can cluster the images according to their activation profiles and then examine the images closest to each cluster centroid.
# Helper to keep only the feature columns, dropping id.
sub <- function(x) {
  select(x, starts_with("feature"))
}

# Group the images into 25 clusters based on their mean activations.
cluster_result <- kmeans(sub(h), centers = 25, nstart = 20)
centroids <- tidy(cluster_result)

# Distances from each centroid (rows) to each image (columns).
D <- pdist(sub(centroids), sub(h)) %>%
  as.matrix()
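Before plotting any one cluster, it can help to check how evenly the images are spread across clusters; the kmeans result stores this directly.
# Number of images assigned to each of the 25 clusters.
cluster_result$size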
For example, the code below displays the 20 images closest to the third centroid.
# Show the 20 images nearest centroid 3 in a 5 x 4 grid.
par(mfrow = c(5, 4), mai = rep(0, 4))
near_centroid <- order(D[3, ])[1:20]
out <- images[near_centroid,,, ] %>%
  array_tree(1) %>%
  map(~ plot(as.raster(., max = 255)))
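Clustering is one way to exploit the correlated structure noted at the start; dimensionality reduction is another. A minimal sketch, reusing the h and sub defined above, projects the images onto their first two principal components:
# Project each image's activation profile onto the first two PCs.
par(mfrow = c(1, 1))  # reset the plotting grid from above
pca <- prcomp(sub(h))
plot(pca$x[, 1], pca$x[, 2], xlab = "PC1", ylab = "PC2")
Images that land near one another in this projection have similar activation profiles, so the same nearest-image trick used for centroids could help interpret directions in this plot.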