When building a neural network with the Keras framework for deep learning, I often want to have a quick and dirty way of checking whether everything is all right. That is, whether my layers output data correctly, whether my parameters are in check, and whether I have a good feeling about the model as a whole.
Keras model summaries help me do this. They provide a text-based overview of what I've built, which is especially useful when a model must be symmetric, as with autoencoders. But how to create these summaries? And why are they so useful? We'll discover this in today's blog post.
Firstly, we'll look at some high-level building blocks which I usually come across when I build neural networks. Then, we continue by looking at how Keras model summaries help me during neural network development. Subsequently, we generate one ourselves, by adding it to an example Keras ConvNet. This way, you'll be able to generate model summaries too in your Keras models.
Are you ready? Let's go! 😊
I've created quite a few neural networks over the past few years. While everyone has their own style in creating them, I always see a few high-level building blocks return in my code. Let me share them with you, as this will help you understand the model with which we'll be working today.
First of all, you'll always state the imports of your model. For example, you import Keras - today often as tensorflow.keras.something - but you'll likely import NumPy, Matplotlib and other libraries as well.
Next, and this is entirely personal, you'll find the model configuration. The model compilation and model training stages - which we'll cover soon - require configuration. This configuration is then spread across a number of lines of code, which I find messy. That's why I always specify a few Python variables storing the model configuration, so that I can refer to those when I actually configure the model.
Example variables are the batch size, the size of your input data, your loss function, the optimizer that you will use, and so on.
Once the model configuration has been specified, you'll often load and preprocess your dataset. Loading the dataset can be done in a multitude of ways - you can load data from file, you can use the Keras datasets, it doesn't really matter. Below, we'll use the latter scenario. Preprocessing is done in a minimal way - in line with the common assumption within the field of deep learning that models will take care of feature extraction themselves as much as possible - and often directly benefits the training process.
Once data is ready, you next specify the architecture of your neural network. With Keras, you'll often use the Sequential API, because it's easy. It allows you to stack individual layers on top of each other simply by calling model.add.
Specifying the architecture actually means creating the skeleton of your neural network. It's a design, rather than an actual model. To make an actual model, we move to the model compilation step - using model.compile. Here, we actually instantiate the model with all the settings that we configured before. Once compiled, we're ready to start training.
Then, we actually start the training process. By using model.fit, we fit the dataset that we're training with to the model. The training process should now begin as you configured it.
Finally, once training has finished, you wish to evaluate the model against data that it hasn't yet seen - to find out whether it really performs and did not simply overfit to your training set. We use model.evaluate for this purpose.
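In schematic form, these building blocks look like this. Note that this is a minimal sketch only - the input shape, layer sizes and dataset loading are placeholders - and the full, runnable ConvNet example follows later in this post:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# 1. Model configuration, stored in Python variables
no_epochs = 25

# 2. Load and preprocess your dataset (dataset-specific; omitted here)
# X_train, y_train, X_test, y_test = ...

# 3. Specify the architecture: the skeleton of your network
model = Sequential()
model.add(Dense(128, activation='relu', input_shape=(32,)))
model.add(Dense(10, activation='softmax'))

# 4. Compile: instantiate the skeleton into an actual model
model.compile(loss='sparse_categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

# 5. Fit the training data to the model
# model.fit(X_train, y_train, epochs=no_epochs, validation_split=0.2)

# 6. Evaluate against data the model hasn't seen yet
# model.evaluate(X_test, y_test, verbose=0)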
...what is lacking, though, is some quick and dirty information about your model. Can't we generate some kind of summary?
Unsurprisingly, we can! 😀 It would look like this:
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
conv2d (Conv2D)              (None, 30, 30, 32)        896
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 28, 28, 64)        18496
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 26, 26, 128)       73856
_________________________________________________________________
flatten (Flatten)            (None, 86528)              0
_________________________________________________________________
dense (Dense)                (None, 128)               11075712
_________________________________________________________________
dense_1 (Dense)              (None, 10)                1290
=================================================================
Total params: 11,170,250
Trainable params: 11,170,250
Non-trainable params: 0
_________________________________________________________________
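As an aside, the summary also makes it easy to verify the numbers yourself. None is the batch dimension, and each 3×3 convolution without padding shrinks the spatial dimensions by two pixels (32 → 30 → 28 → 26), which explains the output shapes. The Param # column follows from each layer's configuration: a Conv2D layer has (kernel height × kernel width × input channels + 1) × filters parameters, and a Dense layer has (inputs + 1) × outputs, where the +1 represents the bias. A quick sketch that reproduces the numbers above:

# Parameter counts for the summary above, computed by hand.
# Conv2D: (kernel_h * kernel_w * in_channels + 1) * filters  (+1 = bias)
# Dense:  (inputs + 1) * outputs

conv2d   = (3 * 3 * 3  + 1) * 32    # 896
conv2d_1 = (3 * 3 * 32 + 1) * 64    # 18496
conv2d_2 = (3 * 3 * 64 + 1) * 128   # 73856
flatten  = 26 * 26 * 128            # 86528 outputs, 0 parameters
dense    = (86528 + 1) * 128        # 11075712
dense_1  = (128 + 1) * 10           # 1290

print(conv2d + conv2d_1 + conv2d_2 + dense + dense_1)  # 11170250 total params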
There are multiple benefits that can be achieved from generating a model summary:

- It shows the output shape of every layer, so you can quickly check whether your layers process data correctly.
- It shows the number of (trainable) parameters per layer and in total, so you can verify that your parameter counts are in check.
- It allows you to immediately share the structure of your model with others, without having to send all of your code.
Convinced? Great 😊
Now that we know some of the high-level building blocks of a Keras model, and know how summaries can be beneficial to understand your model, let's see if we can actually generate a summary!
To this end, we'll give you an example Convolutional Neural Network for two-dimensional inputs. Here it is:
from tensorflow.keras.datasets import cifar10
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten, Conv2D
from tensorflow.keras.losses import sparse_categorical_crossentropy
from tensorflow.keras.optimizers import Adam
# Model configuration
batch_size = 50
img_width, img_height, img_num_channels = 32, 32, 3
loss_function = sparse_categorical_crossentropy
no_classes = 10
no_epochs = 25
optimizer = Adam()
validation_split = 0.2
verbosity = 1
# Load CIFAR-10 data
(input_train, target_train), (input_test, target_test) = cifar10.load_data()
# Determine shape of the data
input_shape = (img_width, img_height, img_num_channels)
# Parse numbers as floats
input_train = input_train.astype('float32')
input_test = input_test.astype('float32')
# Scale data
input_train = input_train / 255
input_test = input_test / 255
# Create the model
model = Sequential()
model.add(Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=input_shape))
model.add(Conv2D(64, kernel_size=(3, 3), activation='relu'))
model.add(Conv2D(128, kernel_size=(3, 3), activation='relu'))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dense(no_classes, activation='softmax'))
# Compile the model
model.compile(loss=loss_function,
              optimizer=optimizer,
              metrics=['accuracy'])
# Fit data to model
history = model.fit(input_train, target_train,
                    batch_size=batch_size,
                    epochs=no_epochs,
                    verbose=verbosity,
                    validation_split=validation_split)
# Generate generalization metrics
score = model.evaluate(input_test, target_test, verbose=0)
print(f'Test loss: {score[0]} / Test accuracy: {score[1]}')
Clearly, all the high-level building blocks are visible:

- The imports, at the top.
- The model configuration, specified as a set of Python variables.
- Loading the CIFAR-10 dataset, followed by minimal preprocessing. Firstly, we determine the shape of the input data. Secondly, we parse the numbers as floats, into float32 format, which might speed up the training process when you are using a GPU powered version of Keras. Thirdly, and finally, we scale the data, to ensure that we don't face massive weight swings during the optimization step after each iteration. As you can see, we don't really do feature engineering in terms of the features themselves, but rather, we do some things to benefit the training process.
- The model architecture, created with the Sequential API and model.add.
- Model compilation with model.compile, fitting the data with model.fit, and evaluation with model.evaluate.

Now, how to add that summary?
Very simple. Add model.summary() to your code, perhaps with a nice remark, like # Display a model summary. Like this:
# Create the model
model = Sequential()
model.add(Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=input_shape))
model.add(Conv2D(64, kernel_size=(3, 3), activation='relu'))
model.add(Conv2D(128, kernel_size=(3, 3), activation='relu'))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dense(no_classes, activation='softmax'))
# Display a model summary
model.summary()
# Compile the model
model.compile(loss=loss_function,
              optimizer=optimizer,
              metrics=['accuracy'])
Running the model again then neatly presents you with the model summary:
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
conv2d (Conv2D)              (None, 30, 30, 32)        896
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 28, 28, 64)        18496
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 26, 26, 128)       73856
_________________________________________________________________
flatten (Flatten)            (None, 86528)              0
_________________________________________________________________
dense (Dense)                (None, 128)               11075712
_________________________________________________________________
dense_1 (Dense)              (None, 10)                1290
=================================================================
Total params: 11,170,250
Trainable params: 11,170,250
Non-trainable params: 0
_________________________________________________________________
Nice! 🎆
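One extra tip: by default, model.summary() prints to standard output, but it also accepts a print_fn argument that is called for every line of the summary. You can use this to capture the summary as a string - for example, to write it to a log file. A minimal sketch:

# Capture the summary as a string instead of printing it directly.
# model.summary() calls print_fn once for every line of the summary.
summary_lines = []
model.summary(print_fn=summary_lines.append)
summary_text = '\n'.join(summary_lines)

# Now you can log it, write it to a file, and so on:
print(summary_text)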
In this blog post, we looked at generating a model summary for your Keras model. This summary, a quick and dirty overview of the layers of your model, displays their output shapes and numbers of trainable parameters. Summaries help you debug your model and allow you to immediately share the structure of your model, without having to send all of your code.
To support this, we also looked at some high-level components of a Keras-based neural network that I often come across when building models. Additionally, we provided an example ConvNet to which we added a model summary.
Although it's been a relatively short blog post, I hope that you've learnt something today! If you did, or didn't, or if you have questions or remarks, please leave a comment in the comments section below. I'll happily answer your comment and improve my blog post where necessary.
Thank you for reading MachineCurve today and happy engineering! 😎