Interpreting neurons by finding optimal inputs
So far, we’ve visualized neural networks by analyzing the activations of learned features across observed samples. A complementary approach is to ask instead: is there a hypothetical image that would maximize the activation of a particular neuron? If we can construct such an image, we get a better sense of the image concepts to which that neuron is most sensitive.
We will illustrate these ideas on a network that has been trained on ImageNet. This is a large image dataset with many (thousands of) class labels, and it is often used to evaluate image classification algorithms. The network is loaded below.
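library(keras)
library(tensorflow)
library(purrr)  # for map(), used in random_image()

# Convolutional base of VGG16 with ImageNet weights (no classification head).
model <- application_vgg16(weights = "imagenet", include_top = FALSE)

# Objective: the mean activation of feature ix in the output of `layer`.
mean_activation <- function(image, layer, ix=1) {
  h <- layer(image)
  k_mean(h[,,, ix])
}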
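To make the objective concrete, here is a minimal base R sketch of the same averaging on a plain array. The array `h` and its dimensions are made up for illustration, and `mean_activation_array` is a hypothetical helper. In the network code, `h` is the tensor returned by the layer, which we assume is laid out as (batch, height, width, feature).

```r
# Hypothetical activations: 1 image, 4 x 4 spatial grid, 3 features.
set.seed(1)
h <- array(runif(1 * 4 * 4 * 3), dim = c(1, 4, 4, 3))

# Mean activation of feature ix: average over batch and spatial positions.
mean_activation_array <- function(h, ix = 1) {
  mean(h[, , , ix])
}

# One scalar per feature; each would be the objective for that feature.
feature_means <- sapply(1:3, function(ix) mean_activation_array(h, ix))
```

Reducing the whole feature map to one scalar is what lets us differentiate the objective with respect to the input image.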
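Each gradient step updates the image using the gradient of the mean_activation function above.

gradient_step <- function(image, layer, ix=1, lr=1e-3) {
  # Record operations on the image so the objective can be differentiated.
  with(tf$GradientTape() %as% tape, {
    tape$watch(image)
    objective <- mean_activation(image, layer, ix)
  })
  grad <- tape$gradient(objective, image)
  image + lr * grad  # ascent step: move the image uphill
}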
Figure 1: Starting from a random image, we can take a gradient step in the image space to increase a given neuron’s mean activation.
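The update rule can be checked on a toy problem where the gradient is known in closed form. The sketch below is not the network code. It assumes a hypothetical linear "neuron" whose activation is `sum(w * x)`, so the gradient with respect to the input is simply `w`. The names `w`, `toy_activation`, and `toy_gradient_step` are illustrative only.

```r
# Hypothetical linear "neuron": activation is a weighted sum of the input.
w <- c(0.5, -1, 2)
toy_activation <- function(x) sum(w * x)

# For this objective the gradient with respect to x is w itself.
toy_gradient_step <- function(x, lr = 1e-3) {
  grad <- w
  x + lr * grad  # same update as image + lr * grad
}

x0 <- c(0.1, 0.2, 0.3)
x1 <- toy_gradient_step(x0, lr = 0.1)

# The activation increases by lr * sum(w^2) after one step.
gain <- toy_activation(x1) - toy_activation(x0)
```

For a real network the gradient is not constant, which is why the tape has to recompute it at every step.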
Starting from a random image, the function below takes n_iter gradient steps in the direction that maximizes the activation of feature ix.

random_image <- function() {
  tf$random$uniform(map(c(1, 150, 150, 3), as.integer))
}

gradient_ascent <- function(layer, ix = 1, n_iter = 100, lr = 10) {
  # Store the image after every step: n_iter x height x width x channels.
  im_seq <- array(0, dim = c(n_iter, 150, 150, 3))
  image <- random_image()
  for (i in seq_len(n_iter)) {
    image <- gradient_step(image, layer, ix, lr)
    im_seq[i,,,] <- as.array(image[1,,,])
  }
  im_seq
}
Figure 2: Taking many gradient steps leads us towards an image that optimizes a neuron’s activation.
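One way to sanity check a loop like this is to run it on a toy objective whose maximizer is known and confirm that the objective never decreases. The sketch below assumes a hypothetical concave objective with maximizer `target`. The names `toy_objective`, `toy_gradient`, and `toy_ascent` are illustrative, and `x_seq` plays the role of `im_seq`.

```r
# Hypothetical concave objective with a known maximizer at `target`.
target <- c(1, -2, 0.5)
toy_objective <- function(x) -sum((x - target)^2)
toy_gradient <- function(x) -2 * (x - target)

toy_ascent <- function(x, n_iter = 100, lr = 0.1) {
  # Store the iterate after every step, one row per iteration.
  x_seq <- matrix(0, nrow = n_iter, ncol = length(x))
  for (i in seq_len(n_iter)) {
    x <- x + lr * toy_gradient(x)
    x_seq[i, ] <- x
  }
  x_seq
}

x_seq <- toy_ascent(c(0, 0, 0))

# Objective value after each step; it should rise towards 0.
objective_path <- apply(x_seq, 1, toy_objective)
```

Keeping the whole sequence, rather than only the final iterate, makes it easy to plot how the input evolves over the iterations.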
# Rescale values to [0, 1] so the array can be drawn with as.raster().
squash <- function(x) {
  (x - min(x)) / (max(x) - min(x))
}

par(mfrow = c(5, 8), mai = rep(0.00, 4))
activation_model <- keras_model(inputs = model$input, outputs = model$layers[[3]]$output)
for (i in seq_len(40)) {
  im_seq <- gradient_ascent(activation_model, ix = i)
  plot(as.raster(squash(im_seq[100,,,])))  # final image after n_iter = 100 steps
}
Figure 3: The hypothetical images that maximize the activations for 40 different neurons. These neurons seem to pull out features related to color and edge orientations.
We can think of these features as analogous to a collection of basis functions. At this early layer, the network represents each image as a combination of basis images, each related to a particular color or edge pattern.
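A practical detail in the plotting code is the squash helper. Gradient ascent does not keep pixel values inside [0, 1], while as.raster() with its default settings expects values in that range, so we rescale before plotting. The sketch below repeats the squash definition and applies it to a made-up array. The arrays `im` and `flat` are hypothetical stand-ins for an optimized image.

```r
squash <- function(x) {
  (x - min(x)) / (max(x) - min(x))
}

# Hypothetical optimized image with values far outside [0, 1].
set.seed(1)
im <- array(rnorm(4 * 4 * 3, mean = 5, sd = 20), dim = c(4, 4, 3))
im_scaled <- squash(im)
range(im_scaled)  # 0 and 1 by construction

# The rescaled height x width x 3 array is accepted as an RGB image.
r <- as.raster(im_scaled)

# Caveat: a constant image makes the denominator zero and yields NaN.
flat <- squash(array(0.5, dim = c(2, 2, 3)))
```

Because the rescaling is done separately for each image, colors are only comparable within a panel, not across panels.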
We can compare these activation-maximizing inputs with those associated with later layers. The basis images at this level appear more intricate, reflecting textures and common objects across the dataset. For example, the neuron whose optimal input is the polka dot pattern may be strongly activated by cat eyes.
par(mfrow = c(5, 8), mai = rep(0.00, 4))
activation_model <- keras_model(inputs = model$input, outputs = model$layers[[8]]$output)
for (i in seq_len(40)) {
  im_seq <- gradient_ascent(activation_model, ix = i)
  plot(as.raster(squash(im_seq[100,,,])))
}
Figure 4: The results of the corresponding optimization for 40 neurons in layer 8.