Training a ConvNet can feel like training a black box: you start the training process, get a model that performs (or doesn't), and that's it. It's then up to you to find out what might be wrong, and whether it can be improved any further. This is difficult, since you cannot look inside the black box.
Or can you? In the past few years, many techniques have emerged that allow you to take a look inside that black box!
In this blog post, we'll cover Activation Maximization. It can be used to generate a 'perfect representation' for some aspect of your model - in this case, its convolutional filters. We provide an example implementation with keras-vis for visualizing your Keras CNNs, and show our results based on the VGG16 model.
All right - let's go! 😎
I find them interesting, these convolutional neural networks - you feed them image-like data, they start learning, and you may end up with a model that can correctly identify objects within real images, or classify the real images as a whole.
However, if we wish to understand how we can visualize ConvNet filters with Activation Maximization, it's important to first understand how convolutional neural networks work - which is what we'll cover next.
If you wish to understand convolutional neural networks in more detail, I would like to recommend you read these two blogs:
However, if you already have some understanding of them, or only need to refresh your existing knowledge, hang on tight - we'll give you a crash course on ConvNets here.
Recall that this is the generic structure of a ConvNet:
The input might be a W x H RGB image, meaning that the input to the ConvNet is three-dimensional: the width, the height and the red, blue and green channels.
Once the data is input, it passes through N kernels (where N is an integer, such as 3), also called filters, which all have the same dimensions. These kernels slide over the input data, performing element-wise multiplications and generating N feature maps of width Wfm and height Hfm, which depend on the size of the kernel.
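To make this concrete, here is a minimal NumPy sketch - not part of the tutorial code, with hypothetical values - that slides a single 3x3 kernel over a 32x32 input with 'valid' padding, producing a 30x30 feature map:

import numpy as np

# A single-channel 32x32 input and one 3x3 kernel (random, hypothetical values)
input_data = np.random.rand(32, 32)
kernel = np.random.rand(3, 3)

# 'Valid' convolution: the kernel only visits positions where it fully fits,
# so the feature map shrinks to (32 - 3 + 1) x (32 - 3 + 1) = 30 x 30
h_fm = input_data.shape[0] - kernel.shape[0] + 1
w_fm = input_data.shape[1] - kernel.shape[1] + 1
feature_map = np.zeros((h_fm, w_fm))

for i in range(h_fm):
    for j in range(w_fm):
        # Element-wise multiplication of the input patch and the kernel, then a sum
        feature_map[i, j] = np.sum(input_data[i:i+3, j:j+3] * kernel)

print(feature_map.shape)  # (30, 30)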
This convolutional operation is often followed by pooling operations, possibly by other convolutional operations and, finally, usually by densely-connected layers, to generate a prediction. It is hence part of the high-level supervised learning process.
This also sheds light on how ConvNets actually learn. We saw that for any input, the kernels help determine the feature maps. The kernels thus contain the patterns that the model has learnt. Since the kernels are kept constant after training, they drive the predictions for all inputs once a model is put into production (together with the weights of the densely-connected layers - it's important to know that convolutional and Dense layers are often combined in ConvNets).
But how does it learn? And what does it learn? Even though it might sound difficult, it's actually pretty simple. We know that the kernels contain the learnt information. They thus need to be adapted when learning takes place. From the high-level supervised learning process, the concept of a loss function, and the concept of an optimizer, we know how this happens: the model first makes predictions for the training data, a loss function quantifies how far off those predictions are, and an optimizer then adapts the weights - including the kernels - to reduce that loss.
Kernels are nothing but neurons structured differently. Hence, learning can take place by shifting neuron weights, which means that the high-level supervised learning process is responsible for changing the neurons. Recall that kernels are also called filters every now and then. With that in mind, let's now take a look at the concept of Activation Maximization - which we can use to visualize these filters.
In a different blog post, we used Activation Maximization to visualize the perfect input to produce some class prediction. This is a really powerful idea: we can derive whether the model has learnt correctly by generating an input that maximizes the activations needed to produce some output that we set in advance - the class that we wish to check. Really nice results!
But how does Activation Maximization work? The principle is simple: we keep the trained weights fixed, pick the output we're interested in (here, a convolutional filter), and iteratively adjust the input image itself - by gradient ascent - so that the activation of that output is maximized. The input we end up with is then the 'perfect representation' of what that filter responds to.
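As a rough sketch of this idea - independent of keras-vis, written against TensorFlow 2 purely for intuition, with the helper name maximize_filter and all parameter values being hypothetical - gradient ascent on the input could look like this:

import tensorflow as tf

# Hypothetical helper: maximize the mean activation of one filter in a given
# convolutional layer by performing gradient ascent on the input image itself.
def maximize_filter(model, layer_name, filter_index, steps=100, step_size=1.0):
    layer = model.get_layer(layer_name)
    feature_extractor = tf.keras.Model(inputs=model.inputs, outputs=layer.output)

    # Start from a random image and iteratively adjust it
    image = tf.Variable(tf.random.uniform((1, 224, 224, 3)))
    for _ in range(steps):
        with tf.GradientTape() as tape:
            activation = feature_extractor(image)
            # The mean activation of the chosen filter is what we maximize
            loss = tf.reduce_mean(activation[..., filter_index])
        grads = tape.gradient(loss, image)
        grads = tf.math.l2_normalize(grads)
        image.assign_add(step_size * grads)  # gradient *ascent*, not descent
    return image.numpy()[0]

keras-vis wraps this optimization (plus regularizers and input modifiers such as Jitter) behind a single call, which is what we'll use below.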
keras-vis
Today, we'll be creating ConvNet filter visualizations with Keras, the deep learning framework that is deeply integrated with TensorFlow and originally created by François Chollet. We're going to use keras-vis for this purpose, which is a third-party toolkit for visualizing Keras models, supporting Activation Maximization, Saliency Maps and Grad-CAM class activation maps.
Or, in their words:
keras-vis is a high-level toolkit for visualizing and debugging your trained keras neural net models.
We will use it to visualize what a Keras-based ConvNet sees through (some of) its filters, by means of Activation Maximization.
Installing keras-vis

The first step is installing keras-vis. Unfortunately, it is a little less straightforward than running pip install keras-vis. That is due to the status of the pip package: it's not up to date, and hence doesn't run with newer Keras versions.
Fortunately, there is an escape.
It's actually rather simple, too: first, open up a terminal, preferably the terminal where you have access to all the other dependencies (Python, Keras, and so on). Second, run this command:
pip install https://github.com/raghakot/keras-vis/archive/master.zip
It still uses pip to install keras-vis, but simply installs the most recent version from the GitHub repository.
When you see this (or anything more recent than 0.5.0), you've successfully installed keras-vis:
>pip install https://github.com/raghakot/keras-vis/archive/master.zip
Collecting https://github.com/raghakot/keras-vis/archive/master.zip
Downloading https://github.com/raghakot/keras-vis/archive/master.zip
\ 58.1MB 819kB/s
Building wheels for collected packages: keras-vis
Building wheel for keras-vis (setup.py) ... done
Successfully built keras-vis
Installing collected packages: keras-vis
Successfully installed keras-vis-0.5.0
Now, let's take a look at today's model. Contrary to other posts, where we used a simple Convolutional Neural Network for visualization purposes (e.g. in our other Activation Maximization post), we don't use a simple one today. This is due to the nature of this post: we're interested in filter visualizations whose level of abstraction clearly differs from layer to layer, yet which resemble the underlying task closely enough to be worth showing here.
Fortunately, the Keras framework comes to the rescue; more specifically, the keras.applications module (GitHub here). It is delivered with various model architectures included. That's perfect for our task today! 🎉
We're using the VGG16 model today. This model, which was created by scientists at the Visual Geometry Group (hence VGG) at the University of Oxford, participated in the ImageNet Large Scale Visual Recognition Challenge of 2014. It uses sixteen weight layers - thirteen convolutional ones and three densely-connected ones, hence VGG16 - and achieved very good results in the 2014 competition. If you wish to read more about VGG16, click here for an excellent resource.
ConvNets can be trained on any dataset. However, what often happens is that large-scale datasets are used for pretraining, after which the model is only slightly altered by subsequent training, possibly for another purpose - a process called Transfer Learning. Pretrained weights for such large-scale datasets are therefore often shipped with these models, and that's the case here as well: the Keras VGG16 model can, if you wish, be initialized directly with weights trained on the ImageNet dataset. Today, we'll do precisely that: visualizing filters of the VGG16 model when it's initialized with ImageNet weights.
Let's go! 😀
As usual, you'll need to install a set of dependencies if you wish to run the model and the visualization code:
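Based on the imports used below, this likely boils down to Python, Keras (with a TensorFlow backend), keras-vis, Matplotlib and NumPy - for example (with keras-vis installed from GitHub, as described above):

pip install keras tensorflow matplotlib numpy
pip install https://github.com/raghakot/keras-vis/archive/master.zip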
Now, let's write some code! 😎
To start, create a file in some directory, e.g. activation_maximization_filters.py. Open this file in the code editor of your choice, and write with me:
'''
ConvNet filter visualization with Activation Maximization on exemplary VGG16 Keras model
'''
from keras.applications import VGG16
from vis.utils import utils
from vis.visualization import visualize_activation, get_num_filters
from vis.input_modifiers import Jitter
import matplotlib.pyplot as plt
import numpy as np
import random
import os.path
These are the imports that you'll need for today's tutorial:
- VGG16 from keras.applications, which is the model that we're using today.
- From keras-vis, you'll import utils (for finding the layer index of the to-be-visualized layer later), visualize_activation and get_num_filters (for the visualization part) and Jitter (to boost image quality).
- Matplotlib, imported as plt, for showing and saving the visualizations.
- random is used for drawing a random sample of filters, and os.path for selecting the path to write the images to.

Now, let's define the name of the folder which we'll be writing into:
# Define the folder name to save into
folder_name = 'filter_visualizations'
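Note that Matplotlib will not create this folder for you when saving the images later on. A small, optional addition - not in the original code - creates it if it doesn't exist yet:

import os

# Create the output folder if it doesn't exist yet
os.makedirs(folder_name, exist_ok=True)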
Then define the model:
# Define the model
model = VGG16(weights='imagenet', include_top=True)
This initializes the VGG16 model into the model variable with weights trained on the ImageNet dataset. With include_top=True, the densely-connected layers that generate the prediction are included; if set to False, you'll only get the convolutional layers. The latter is especially useful when you wish to use pretrained Keras models as a convolutional base, on top of which you train additional layers.
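As a quick, hypothetical illustration of that latter use case - not needed for this tutorial, and with the class count and layer sizes chosen arbitrarily - you could load only the convolutional base and stack your own classifier on top:

from keras.applications import VGG16
from keras.models import Sequential
from keras.layers import Flatten, Dense

# Convolutional base only, initialized with ImageNet weights
conv_base = VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
conv_base.trainable = False  # keep the pretrained filters fixed

# Hypothetical classifier for a 10-class problem on top of the base
model = Sequential([
    conv_base,
    Flatten(),
    Dense(256, activation='relu'),
    Dense(10, activation='softmax')
])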
Next, we can generate some visualizations!
# Iterate over multiple layers
for layer_nm in ['block1_conv1', 'block2_conv1', 'block3_conv2', 'block4_conv1', 'block5_conv2']:
This part means that your code will iterate over a list that contains the names of various layers. That is, it will generate visualizations for (a random selection of) the filters that are part of these blocks. You may choose any blocks you like. Finding the layer names, however, comes in two flavors. When using Keras pretrained models, you can look up the layer names in the code available for these models - such as for VGG16 on Keras' GitHub (search for 'block1_conv1' on that page, to give you an example). When you visualize the Conv filters of your own models, however, you'll have to name the layers yourself when you stack the architecture:
model.add(Dense(no_classes, activation='softmax', name='dense_layer'))
When adding these names to the array above, you'll ensure that you're visualizing the correct layers.
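Since we're visualizing convolutional filters here, a hypothetical example of naming a Conv2D layer in your own model - my_conv_layer is just an illustrative name - would look like this:

from keras.layers import Conv2D

# Hypothetical convolutional layer with an explicit name, so it can be
# referenced later in the visualization loop
model.add(Conv2D(32, kernel_size=(3, 3), activation='relu', name='my_conv_layer'))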
The following code is part of the iteration, which means that it runs every time the loop is activated (in our case, five times, for five layers):
# Find the particular layer
layer_idx = utils.find_layer_idx(model, layer_nm)
...this keras-vis util finds the correct layer index for the name that we specify. layer_nm, in this case, is one of the layer names in the array, e.g. block1_conv1.
# Get the number of filters in this layer
num_filters = get_num_filters(model.layers[layer_idx])
We then retrieve the number of filters in this layer. This is also done by applying a nice keras-vis util.
Then, we select six filters randomly (with replacement, so there's a small chance that you visualize one or two filters twice - but I've found that this doesn't really happen given the large number of filters present in VGG16. For your own model, this may be different):
# Draw 6 filters randomly
drawn_filters = random.choices(np.arange(num_filters), k=6)
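If you'd rather avoid drawing the same filter twice, you could sample without replacement instead - for example:

# Draw 6 distinct filters (without replacement)
drawn_filters = random.sample(range(num_filters), k=6)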
Finally, we visualize each filter drawn:
# Visualize each filter
for filter_id in drawn_filters:
img = visualize_activation(model, layer_idx, filter_indices=filter_id, input_modifiers=[Jitter(16)])
plt.imshow(img)
img_path = os.path.join('.', folder_name, layer_nm + '_' + str(filter_id) + '.jpg')
plt.imsave(img_path, img)
print(f'Saved layer {layer_nm}/{filter_id} to file!')
In this loop, the following happens:

- The visualization is generated with visualize_activation from keras-vis, for the particular filter_id, modifying the input with Jitter to make the images more clear.
- The generated image is displayed with Matplotlib.
- The image is saved to the img_path (based on the folder_name, layer_nm and filter_id properties) with Matplotlib.

In my case, these were the results:
For the first Conv layer in the first Conv block, the results are not very detailed. However, the filters clearly differ from each other, as can be seen from the results:
In the second block, a little bit more detail becomes visible. Certain stretched patterns seem to be learnt by the filters.
This gets even clearer in the third block. The stretches are now combined with clear patterns, and even blocky representations, like in the center-bottom visualization.
Details become visible in the fourth convolutional block. It's still difficult to identify real objects in these visualizations, though.
That does become possible in the visualizations generated from the fifth block. We see eyes and other shapes, which clearly resemble the objects that this model was trained to identify.
This clearly illustrates that the model learns very detailed patterns near the output, i.e. in the final layers of the model, whereas more global and abstract ones are learnt in the early layers. It now makes perfect sense why the first two or perhaps three layers of ImageNet-trained models are often reused in practical settings to boost training accuracy: the patterns that are learnt there are so general that they do not necessarily represent the object in question, but rather its shape. While the sun, a football and a volleyball are all round, we don't know in the first few layers whether an input is any of those. We do know, however, that it's round.
In this blog post, we've seen how we can use Activation Maximization to generate visualizations for filters in our CNNs, i.e. convolutional neural networks. We provided an example that demonstrates this by means of the keras-vis toolkit, which can be used to visualize Keras models.
I hope you've learnt something today! 😀 If you did, or if you have any questions or remarks, please feel free to leave a comment in the comments box below 👇 Thank you for reading MachineCurve today and happy engineering! 😎
Kotikalapudi, Raghavendra and contributors. (2017). Github / keras-vis. Retrieved from https://github.com/raghakot/keras-vis
Simonyan, K., & Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556.
VGG16 - Convolutional Network for Classification and Detection. (2018, November 21). Retrieved from https://neurohive.io/en/popular-networks/vgg16/