A first look at activations in a deep learning model.
In the last lecture, we discussed the conceptual foundations of feature learning. In this lecture, we’ll see how to extract and visualize features learned by a computer vision model.
We will inspect a model that was trained to distinguish between photos of cats and dogs. We’ve included a subsample of the training dataset below – the full dataset can be downloaded here. From the printout, you can see that we have saved 20 images, each of size \(150 \times 150\) pixels, and with three color channels (red, green, and blue).
# packages used throughout (assumed loaded for all code below)
library(keras)
library(purrr)
library(RColorBrewer)

# download a subsample of the training images
f <- tempfile()
download.file("https://uwmadison.box.com/shared/static/o7t3nt77iv3twizyv7yuwqnca16f9nwi.rda", f)
images <- get(load(f))
dim(images) # 20 sample images
[1] 20 150 150 3
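As a quick check on this structure, indexing along the first dimension extracts a single image together with its color channels.

dim(images[1,,,]) # 150 150 3: the first image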
The par function allows us to plot many images side by side (in this case, in a \(4 \times 5\) grid).

par(mfrow = c(4, 5), mai = rep(0.00, 4))
out <- images %>%
  array_tree(1) %>%
  map(~ plot(as.raster(., max = 255)))
The array_tree function above splits the 4D array into a collection of 3D slices. Each of these 3D slices corresponds to one image; the three channels correspond to red, green, and blue colors, respectively. The next map line plots each of the resulting 3D arrays.
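We can also perform this split ourselves and inspect one slice directly; a minimal sketch, assuming purrr is loaded as above:

slices <- array_tree(images, 1)
length(slices)   # 20, one element per image
dim(slices[[1]]) # 150 150 3, one image with its color channels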
Next, let’s consider what types of features the model has learned in order to distinguish between cats and dogs. Our approach will be to compute activations on a few images and visualize them as 2D feature maps. These visualizations will help us see whether there are systematic patterns in what leads to an activation for a particular neuron.
To accomplish this, we will create an R object to retrieve all the intermediate feature activations associated with an input image. Every time we call this object on a new image, it will return the activations for features at all layers.
# download the trained model
f <- tempfile()
download.file("https://uwmadison.box.com/shared/static/9wu6amgizhgnnefwrnyqzkf8glb6ktny.h5", f)
model <- load_model_hdf5(f)

# define a model that returns the activations at every layer
layer_outputs <- map(model$layers, ~ .$output)
activation_model <- keras_model(inputs = model$input, outputs = layer_outputs)

# compute activations for all 20 sample images
features <- predict(activation_model, images)
Each element of features corresponds to a different layer. Within a single layer, the 3D array for each image provides the activations of each feature across different spatial windows. For example, for the first layer, there are 32 features with activations spread across a \(148 \times 148\) grid, each grid element with its own spatial context.

dim(features[[1]])
[1] 20 148 148 32
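The same inspection works for every layer. A minimal sketch, again relying on purrr, prints the activation dimensions across the whole network:

length(features)   # one element per layer
map(features, dim) # activation dimensions at each layer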
# display one feature map, rotating the matrix so it appears upright
plot_feature <- function(feature) {
  rotate <- function(x) t(apply(x, 2, rev))
  image(rotate(feature), axes = FALSE, asp = 1, col = brewer.pal(4, "Blues"))
}
# compare one image with its first layer-1 feature map
ix <- 3
par(mfrow = c(1, 2), mai = rep(0.00, 4))
plot(as.raster(images[ix,,,], max = 255))
plot_feature(features[[1]][ix,,, 1])
# every feature map in the second layer, for the same image
par(mfrow = c(6, 7), mai = rep(0.00, 4))
out <- features[[2]][ix,,,] %>%
  array_branch(margin = 3) %>%
  map(~ plot_feature(.))
# the first 40 feature maps from a deeper layer
par(mfrow = c(6, 7), mai = rep(0.00, 4))
out <- features[[6]][ix,,,1:40] %>%
  array_branch(margin = 3) %>%
  map(~ plot_feature(.))
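With dozens of panels per layer, it can help to decide which feature maps to examine first. One option is to rank features by their average activation on the current image; the sketch below assumes that a larger mean activation flags a feature that responds strongly to this image.

# rank layer-6 features by mean activation on image ix
mean_activation <- apply(features[[6]][ix,,,], 3, mean)
head(order(mean_activation, decreasing = TRUE)) # indices of the most active features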