Interpreting neurons by finding optimal inputs
So far, we’ve visualized neural networks by analyzing the activations of learned features across observed samples. A complementary approach is to ask instead: is there a hypothetical image that would maximize the activation of a particular neuron? If we can construct such an image, then we might have a better sense of the types of image concepts to which that neuron is highly sensitive.
We will illustrate these ideas on a network that has been trained on ImageNet, a large image dataset with thousands of class labels that is often used to evaluate image classification algorithms. The network is loaded below.
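library(keras)       # application_vgg16(), keras_model(), k_mean()
library(tensorflow)  # tf, %as%
library(purrr)       # map()
# VGG16 with ImageNet weights; include_top = FALSE drops the dense classification layers
model <- application_vgg16(weights = "imagenet", include_top = FALSE)
# Mean activation of feature ix in the output of layer, evaluated on image
mean_activation <- function(image, layer, ix = 1) {
  h <- layer(image)
  k_mean(h[,,, ix])
}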
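To make the indexing in mean_activation concrete, here is a small sketch in base R that does not need Keras. The array h is a made-up stand-in for a layer's output with dimensions (batch, height, width, feature). Its values and the helper name toy_mean_activation are assumptions for illustration only, not part of the original code.

```r
# Hypothetical stand-in for a layer output: 1 image, 2 x 2 spatial grid, 3 features
h <- array(0, dim = c(1, 2, 2, 3))
h[1, , , 1] <- matrix(c(1, 2, 3, 4), nrow = 2)  # feature 1 responds everywhere
h[1, , , 2] <- matrix(c(0, 0, 0, 8), nrow = 2)  # feature 2 responds at one location

# Average the activations of feature ix over the batch and spatial dimensions
toy_mean_activation <- function(h, ix = 1) {
  mean(h[, , , ix])
}

# One summary number per feature; feature 3 was never set, so it stays at zero
scores <- sapply(1:3, function(ix) toy_mean_activation(h, ix))
print(scores)
```

Each feature is reduced to a single number, which is what makes it usable as an objective to maximize over the input image.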
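To search for such an image, we take gradient ascent steps on the objective computed by the mean_activation function above.
gradient_step <- function(image, layer, ix = 1, lr = 1e-3) {
  # Record operations on the input image so the objective can be differentiated
  with(tf$GradientTape() %as% tape, {
    tape$watch(image)
    objective <- mean_activation(image, layer, ix)
  })
  # Gradient of the mean activation with respect to the image pixels
  grad <- tape$gradient(objective, image)
  # Move the image a small step uphill and return it
  image + lr * grad
}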
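The update rule in gradient_step is easier to check on a problem whose answer is known. The sketch below is a base R toy, not the network code: the quadratic objective, its hand-written gradient, and the names toy_gradient_step and objective_grad are all assumptions chosen for illustration. In the Keras version, the gradient comes from GradientTape instead of being written by hand.

```r
# Toy objective with a known maximizer at (1, -2); stands in for mean_activation
target <- c(1, -2)
objective <- function(x) -sum((x - target)^2)

# Gradient of the toy objective, derived by hand for this example
objective_grad <- function(x) -2 * (x - target)

# One ascent step: move the input a small distance along the gradient
toy_gradient_step <- function(x, lr = 0.1) {
  x + lr * objective_grad(x)
}

# Repeat the step, starting from the origin
x <- c(0, 0)
for (i in seq_len(100)) {
  x <- toy_gradient_step(x)
}
print(round(x, 4))
```

With this learning rate, each step shrinks the distance to the maximizer by a constant factor, so the iterate settles near (1, -2). The image objective is not this well behaved, which is one reason the learning rate and number of iterations are left as arguments.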
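The gradient_ascent function below starts from a random image and takes n_iter gradient steps in the direction that maximizes the activation of feature ix.
# Random starting point: a batch of one 150 x 150 RGB image with values in [0, 1)
random_image <- function() {
  tf$random$uniform(map(c(1, 150, 150, 3), as.integer))
}
# Run n_iter ascent steps and keep every intermediate image
gradient_ascent <- function(layer, ix = 1, n_iter = 100, lr = 10) {
  im_seq <- array(0, dim = c(n_iter, 150, 150, 3))
  image <- random_image()
  for (i in seq_len(n_iter)) {
    image <- gradient_step(image, layer, ix, lr)
    im_seq[i,,,] <- as.array(image[1,,,])
  }
  im_seq
}
# Rescale values to [0, 1] so that the image can be plotted as a raster
squash <- function(x) {
  (x - min(x)) / (max(x) - min(x))
}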
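The squash helper above matters because gradient ascent does not keep pixel values in a plottable range. The following base R sketch repeats the same rescaling on a made-up array so that it runs on its own. The array im and its values are assumptions for illustration, not output from the network.

```r
# Same min-max rescaling as squash() in the text, repeated so this runs on its own
squash <- function(x) {
  (x - min(x)) / (max(x) - min(x))
}

# Made-up 2 x 2 RGB "image" whose values fall well outside [0, 1]
set.seed(1)
im <- array(rnorm(2 * 2 * 3, mean = 0, sd = 5), dim = c(2, 2, 3))

# After rescaling, the smallest value is 0 and the largest is 1
rescaled <- squash(im)
print(range(rescaled))

# as.raster() expects intensities in [0, 1] by default, so the rescaled array is accepted
r <- as.raster(rescaled)
print(dim(r))
```

The rescaling is relative to each image, so colors are not comparable across panels, and a constant image would divide by zero. Neither is a problem for the qualitative comparison made here.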
# 5 x 8 grid of panels with no margins, one panel per feature
par(mfrow = c(5, 8), mai = rep(0.00, 4))
activation_model <- keras_model(inputs = model$input, outputs = model$layers[[3]]$output)
for (i in seq_len(40)) {
  im_seq <- gradient_ascent(activation_model, ix = i)
  # Plot the final iterate (n_iter is 100 by default)
  plot(as.raster(squash(im_seq[100,,,])))
}
We can think of these features as analogous to a collection of basis functions. At this early layer, the network represents each image as a combination of basis images related to particular color or edge patterns.
We can compare these activation-maximizing inputs with those associated with later layers. The basis images at the deeper level appear more intricate, reflecting textures and common objects across this dataset. For example, a feature whose optimal input is a polka dot pattern may be strongly activated by cat eyes.
# Same grid as before, now for a deeper layer of the network
par(mfrow = c(5, 8), mai = rep(0.00, 4))
activation_model <- keras_model(inputs = model$input, outputs = model$layers[[8]]$output)
for (i in seq_len(40)) {
  im_seq <- gradient_ascent(activation_model, ix = i)
  # Plot the final iterate (n_iter is 100 by default)
  plot(as.raster(squash(im_seq[100,,,])))
}