Machine learning has been around for many decades now. Starting with the Rosenblatt Perceptron in the 1950s, followed by Multilayer Perceptrons and a variety of other machine learning techniques like Support Vector Machines, we have now arrived in the age of deep neural networks, which began around 2012.
In the last few years, we have seen an explosion of machine learning research: a wide variety of neural network architectures has been invented and published, and the same goes for knowledge about tuning neural networks - i.e., about which set of hyperparameters works best for a given problem scenario. That's why training a neural network is often considered to be more of an art than a science: intuition built through experience often guides the deep learning engineer towards the right configuration for their model.
However, I do believe that this is going to end. Not deep learning itself, but the amount of knowledge required for successfully training a deep neural network. In fact, training ML models is being commoditized... and in today's blog, we'll cover one of the ways in which this is currently happening, namely, with the Keras Tuner. Keras Tuner is a library that allows deep learning engineers to define neural networks with the Keras framework, to define a search space for both model parameters (i.e. architecture) and model hyperparameters (i.e. configuration options), and to search for the best architecture and configuration before training the final model.
We'll first cover the supervised machine learning process and illustrate hyperparameter tuning and its difficulties in more detail. Subsequently, we'll provide some arguments as to why automating hyperparameter tuning can lead to better end results in possibly less time. Then, we introduce the Keras Tuner, and close off with a basic example so that you can gain some hands-on experience. In another blog post, we'll cover the Keras Tuner building blocks in depth, which will help you gain a deeper understanding of automated hyperparameter tuning.
Update 08/Dec/2020: added references to PCA article.
Let's take a step back. Before we can understand automated parameter and hyperparameter tuning, we must first take a look at what it is in the first place.
That's why we'll take a look at the high-level supervised machine learning process that we're using to explain how training a neural network works throughout this website.
Here it is:
In your machine learning workflow, you have selected or extracted features and targets for your model based on a priori analysis of your dataset - perhaps using dimensionality reduction techniques like PCA. Using those features, you will be able to train your machine learning model - visible in green. You do so iteratively:
If you look at how we build models, you'll generally see that doing so consists of three individual steps:

1. Creating the model skeleton and stacking the layers, i.e. defining the architecture.
2. Configuring, or compiling, the model by setting a range of configuration options.
3. Fitting the data to the model, i.e. starting the training process.
In step (1), you add various layers of your neural network to the skeleton, such as the Convolutional Neural Network created here with Keras:
# Create the model
model = Sequential()
model.add(Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=input_shape))
model.add(Conv2D(64, kernel_size=(3, 3), activation='relu'))
model.add(Conv2D(128, kernel_size=(3, 3), activation='relu'))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dense(no_classes, activation='softmax'))
Here, the architectural choices you make (such as the number of filters for a Conv2D layer, the kernel size, or the number of output nodes for your Dense layer) determine what are known as the parameters of your neural network - the weights (and, by consequence, the biases) of your network:
The parameters of a neural network are typically the weights of the connections. In this case, these parameters are learned during the training stage. So, the algorithm itself (and the input data) tunes these parameters.
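To make this more tangible, here's a minimal sketch - not part of the original example, purely an illustration - showing that a layer's weights are indeed what changes during training:

import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# A tiny model with randomly initialized parameters (weights and biases)
model = Sequential([
    Dense(4, activation='relu', input_shape=(8,)),
    Dense(1, activation='sigmoid')
])
model.compile(loss='binary_crossentropy', optimizer='adam')

# Weights of the first Dense layer before training
weights_before = model.layers[0].get_weights()[0].copy()

# Train briefly on some random dummy data
inputs = np.random.rand(32, 8)
targets = np.random.randint(0, 2, size=(32,))
model.fit(inputs, targets, epochs=1, verbose=0)

# The parameters - the weights - have been adjusted by the training algorithm
weights_after = model.layers[0].get_weights()[0]
print('Parameters changed during training:', not np.allclose(weights_before, weights_after))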
However, things don't end there. Rather, in step (2), you'll configure the model during instantiation by setting a wide range of configuration options. Those options include, but are not limited to, the learning rate, the batch size, the number of epochs, the optimizer, and the loss function.
Here's why they are called _hyper_parameters:
The hyper parameters are typically the learning rate, the batch size or the number of epochs. They are so called "hyper" because they influence how your parameters will be learned. You optimize these hyper parameters as you want (depends on your possibilities): grid search, random search, by hand, using visualisations… The validation stage helps you to both know if your parameters have been learned enough and know if your hyper parameters are good.
As Robin suggests, hyperparameters can be selected (and optimized) in multiple ways. The easiest way of doing so is by hand: you, as a deep learning engineer, select a set of hyperparameters that you will subsequently alter in an attempt to make the model better.
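For example, tuning the learning rate by hand could look something like the sketch below. It's a minimal illustration only, and create_model, input_train and target_train are hypothetical placeholders for your own model-building helper and dataset:

from tensorflow.keras.optimizers import Adam

# Hand-rolled search over the learning rate (illustrative sketch only;
# create_model, input_train and target_train are hypothetical placeholders)
best_val_acc, best_lr = 0.0, None
for lr in [1e-2, 1e-3, 1e-4]:
    model = create_model()  # builds a fresh, uncompiled Keras model
    model.compile(loss='sparse_categorical_crossentropy',
                  optimizer=Adam(learning_rate=lr),
                  metrics=['accuracy'])
    history = model.fit(input_train, target_train,
                        epochs=5, validation_split=0.2, verbose=0)
    val_acc = max(history.history['val_accuracy'])
    if val_acc > best_val_acc:
        best_val_acc, best_lr = val_acc, lr

print(f'Best learning rate found by hand: {best_lr} (validation accuracy: {best_val_acc})')

As you can imagine, this quickly becomes tedious - and computationally wasteful - once more hyperparameters are involved.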
However, can't we do this in a better way when training a Keras model?
As you would have expected: yes, we can! :) Let's introduce Keras Tuner to the scene. As you would expect from engineers, the description as to what it does is really short but provides all the details:
A hyperparameter tuner for Keras, specifically for tf.keras with TensorFlow 2.0.
If you already want to look around, you could visit their website, and if not, let's take a look at what it does.
Keras Tuner can be used for automatically tuning the parameters and hyperparameters of your Keras model. It does so by means of a search space. If you are used to a bit of mathematics, you will be well aware of what a space represents. If not, don't worry: you can likely imagine what we mean when we talk about a two-dimensional or a three-dimensional space.
Indeed, in the case of a 2D space - where the axes represent e.g. the hyperparameter learning rate and the parameter (or, more strictly, a contributing factor to the number of parameters) number of layers - you can visualize the space as follows:
Here, all the intersections between the two axes (dimensions) are possible combinations of hyperparameters that can be selected for the model. For example, learning rate \(LR\) and number of layers \(N\) can be \((LR = 10^{-3}, N = 4)\), but also \((LR = 10^{-2}, N = 2)\) is possible, and so on. Here, we have two dimensions (which benefits visualization), but the more tunable options you add to your model, the more dimensions will be added to your search space.
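To make this concrete, here's a tiny sketch (not from the original post) that simply enumerates such a two-dimensional space - every printed combination is one intersection, i.e. one candidate configuration:

from itertools import product

# Two dimensions: learning rate (hyperparameter) and number of layers
learning_rates = [1e-2, 1e-3, 1e-4]
numbers_of_layers = [2, 3, 4]

# Every intersection of the two axes is one candidate configuration
for lr, n in product(learning_rates, numbers_of_layers):
    print(f'Candidate: learning rate = {lr}, number of layers = {n}')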
Hopefully, you now understand how you construct a search space yourself when you want Keras Tuner to look for the most optimal set of hyperparameters and parameters for your neural network.
You can use a wide range of HyperParameters building blocks for creating your search space:

- Boolean values: true or false.
- Integer and Float values: numeric values sampled from a range that you specify.
- Choice values: an array of choices from which one value is chosen for a hyperparameter.

Although the Choice values and Float/Integer values look a lot like each other, they are different - in the sense that you can specify a range in the latter. However, that's too much detail for now - we will cover all the tunable HyperParameters in the other blog post mentioned before. At this point, it's important that you understand that Keras Tuner allows you to construct a search space by means of the building blocks mentioned above.
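To give you a feel for what these building blocks look like in code, here's a small sketch of how they might appear inside a model-building function (a concept we'll get to shortly). The hyperparameter names and the tiny architecture are made up purely for illustration:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.optimizers import Adam

def build_model_example(hp):
  # Integer values sampled from a range you specify
  units = hp.Int('units', min_value=32, max_value=256, step=32)
  # Choice values: one value picked from an array of options
  activation = hp.Choice('activation', values=['relu', 'tanh'])
  # Boolean values: true or false
  use_dropout = hp.Boolean('use_dropout', default=False)
  # Float values sampled from a range (here: logarithmically)
  learning_rate = hp.Float('learning_rate', min_value=1e-4, max_value=1e-2, sampling='log')

  # Build and compile a small model using the sampled values
  model = Sequential()
  model.add(Dense(units, activation=activation, input_shape=(784,)))
  if use_dropout:
    model.add(Dropout(0.25))
  model.add(Dense(10, activation='softmax'))
  model.compile(loss='sparse_categorical_crossentropy',
                optimizer=Adam(learning_rate=learning_rate),
                metrics=['accuracy'])
  return model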
And it's also important that you understand that it does so within constraints set by the user. That is, searching the hyperparameter space cannot go on indefinitely. Keras Tuner allows you to constrain searching: by setting a maximum number of trials, you can tell the tuner to cut off tuning after some time.
There's one thing missing, still. It's nice that we have a search space, but how exactly does Keras Tuner perform the search operation?
By means of a search strategy!
It's as if you've lost something: there are multiple strategies you can follow to find it back. And as with anything, there are many ways in which you can do a particular thing... the same is true for searching through your hyperparameter space :)
We'll cover the various search strategies in more detail in that other blog post that we've mentioned. Here's a brief overview of the search strategies that are supported by Keras Tuner:

- RandomSearch: randomly samples combinations of hyperparameters from the search space.
- Hyperband: quickly discards unpromising combinations and spends more resources on the promising ones.
- BayesianOptimization: uses the results of earlier trials to decide which combination to try next.
- Sklearn: a tuner for hyperparameter tuning of Scikit-learn models.

Switching between them is mostly a matter of instantiating a different tuner class, as the sketch below illustrates.
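As a hedged sketch - reusing the same build_model function that we'll write later in this post - selecting another strategy could look like this. Hyperband and BayesianOptimization are both shipped with Keras Tuner, while the argument values are just examples:

from kerastuner.tuners import Hyperband, BayesianOptimization

# Hyperband: trains many configurations for a few epochs and keeps the best ones
tuner = Hyperband(
    build_model,
    objective='val_accuracy',
    max_epochs=10,
    directory='tuning_dir',
    project_name='hyperband_example')

# ...or Bayesian optimization: picks the next candidate based on earlier trials
tuner = BayesianOptimization(
    build_model,
    objective='val_accuracy',
    max_trials=5,
    directory='tuning_dir',
    project_name='bayesian_example')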
Now let's take a look at using Keras Tuner for optimizing your Keras model. We will be building a simple ConvNet, as we have seen in the Conv2D tutorial. We'll subsequently tune its hyperparameters with Keras Tuner for a limited number of epochs, and finally train the best model fully. We'll keep it simple: we're only going to construct a one-dimensional search space based on the learning rate for the Adam optimizer.
Make sure that Keras Tuner is installed by executing pip install -U keras-tuner
first in your machine learning environment :)
Open up your IDE and create a file e.g. called tuning.py
. Here, you're going to write down your code. We'll start with imports (such as tensorflow.keras
and kerastuner
), defining the model configuration options and loading the data. If you have no experience in doing so, I recommend that you first read the Conv2D post as I explain these things there in more detail. Here's the code that you'll add first:
from tensorflow.keras.datasets import mnist
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten, Conv2D
from tensorflow.keras.losses import sparse_categorical_crossentropy
from tensorflow.keras.optimizers import Adam
from kerastuner.tuners import RandomSearch
# Model configuration
batch_size = 50
img_width, img_height, img_num_channels = 28, 28, 1
loss_function = sparse_categorical_crossentropy
no_classes = 10
no_epochs = 25
validation_split = 0.2
verbosity = 1
# Load MNIST data
(input_train, target_train), (input_test, target_test) = mnist.load_data()
# Reshape data
input_train = input_train.reshape(input_train.shape[0], img_width, img_height, 1)
input_test = input_test.reshape(input_test.shape[0], img_width, img_height, 1)
# Determine shape of the data
input_shape = (img_width, img_height, img_num_channels)
# Parse numbers as floats
input_train = input_train.astype('float32')
input_test = input_test.astype('float32')
# Scale data
input_train = input_train / 255
input_test = input_test / 255
In brief, what it does:

- It imports the Keras functionality that we need, as well as the RandomSearch tuner from Keras Tuner.
- It specifies the model configuration options, such as the batch size, the number of epochs and the loss function.
- It loads the MNIST data and reshapes it so that it carries the channels dimension our ConvNet expects.
- It parses the numbers into float32 format, which allows GPU owners to train their models faster.
- It scales the data into the [0, 1] range.

Keras Tuner allows you to perform your experiments in two ways. The first, and more scalable, approach uses a HyperModel class, but we don't use it today - as Keras Tuner itself introduces people to automated hyperparameter tuning via model-building functions.
Those functions are nothing more than a Python def where you create the model skeleton and compile it, as you usually would. However, here, you also construct your search space - the space we explained above. For example, I make the learning rate hyperparameter tunable by specifying it as follows: hp.Choice('learning_rate', values=[1e-2, 1e-3, 1e-4]).
Here's the code for the model-building function. If you've used Keras before, you instantly recognize what it does!
# MODEL BUILDING FUNCTION
def build_model(hp):
  # Create the model
  model = Sequential()
  model.add(Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=input_shape))
  model.add(Conv2D(64, kernel_size=(3, 3), activation='relu'))
  model.add(Conv2D(128, kernel_size=(3, 3), activation='relu'))
  model.add(Flatten())
  model.add(Dense(128, activation='relu'))
  model.add(Dense(no_classes, activation='softmax'))

  # Display a model summary
  model.summary()

  # Compile the model
  model.compile(loss=loss_function,
                optimizer=Adam(
                  hp.Choice('learning_rate',
                            values=[1e-2, 1e-3, 1e-4])),
                metrics=['accuracy'])

  # Return the model
  return model
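As an aside, the HyperModel based approach that we mentioned before works in a very similar way - you move the same logic into a class. Here's a minimal sketch of what that could look like; the class name and its constructor argument are hypothetical, but HyperModel and its build method come from Keras Tuner, and the sketch reuses the imports and configuration variables defined earlier:

from kerastuner import HyperModel

class MyConvNetHyperModel(HyperModel):
  def __init__(self, num_classes):
    super().__init__()
    self.num_classes = num_classes

  def build(self, hp):
    # Same skeleton and search space as build_model, wrapped in a class
    model = Sequential()
    model.add(Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=input_shape))
    model.add(Flatten())
    model.add(Dense(128, activation='relu'))
    model.add(Dense(self.num_classes, activation='softmax'))
    model.compile(loss=loss_function,
                  optimizer=Adam(hp.Choice('learning_rate', values=[1e-2, 1e-3, 1e-4])),
                  metrics=['accuracy'])
    return model

# An instance could then be passed to a tuner instead of build_model:
# tuner = RandomSearch(MyConvNetHyperModel(num_classes=no_classes), ...)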
Now, it's time to perform tuning. As we've constructed our search space, we must first define our search strategy - and it will be RandomSearch
today:
# Perform tuning
tuner = RandomSearch(
build_model,
objective='val_accuracy',
max_trials=5,
executions_per_trial=3,
directory='tuning_dir',
project_name='machinecurve_example')
We'll add the model-building function as the function that contains our model and our search space. Our goal is to maximize validation accuracy (Keras Tuner automatically infers whether the objective should be maximized or minimized), we tell it that it should perform 5 trials, and that it should perform 3 executions per trial. The latter ensures that it's not simply variance that causes a set of hyperparameters to be 'best', as repeated instances of better performance suggest that the performance is actually better. The directory and project_name attributes are set so that checkpoints of the tuning operation are saved.
Now that we have configured our search strategy, it's time to print a summary of it and actually perform the search operation:
# Display search space summary
tuner.search_space_summary()
# Perform random search
tuner.search(input_train, target_train,
epochs=5,
validation_split=validation_split)
Here, we instruct Keras Tuner to perform hyperparameter tuning with our training set, for 5 epochs per trial, and to set aside a validation split (of 20%, in our case, given the validation_split that we configured earlier).
Once the search is complete, you can get the best model, and train it fully as per your configuration:
# Get best model
models = tuner.get_best_models(num_models=1)
best_model = models[0]
# Fit data to model
history = best_model.fit(input_train, target_train,
batch_size=batch_size,
epochs=no_epochs,
verbose=verbosity,
validation_split=validation_split)
# Generate generalization metrics
score = best_model.evaluate(input_test, target_test, verbose=0)
print(f'Test loss: {score[0]} / Test accuracy: {score[1]}')
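If you also want to inspect what the tuner found, rather than only grabbing the best model, you can use something like the sketch below; results_summary and get_best_hyperparameters are provided by Keras Tuner:

# Print an overview of the trials and their scores
tuner.results_summary()

# Retrieve the best set of hyperparameters and inspect or reuse it
best_hps = tuner.get_best_hyperparameters(num_trials=1)[0]
print('Best learning rate:', best_hps.get('learning_rate'))

# Optionally, build a fresh (untrained) model from these hyperparameters
# instead of reusing the already-trained best model
fresh_model = tuner.hypermodel.build(best_hps)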
That's it! :) You should now have a fully working Keras Tuner based hyperparameter tuner. If you run python tuning.py
, of course while having all the dependencies installed onto your system, the tuning and eventually the training process should begin.
If you wish to obtain the full model code, that's of course also possible. Here you go:
from tensorflow.keras.datasets import mnist
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten, Conv2D
from tensorflow.keras.losses import sparse_categorical_crossentropy
from tensorflow.keras.optimizers import Adam
from kerastuner.tuners import RandomSearch
# Model configuration
batch_size = 50
img_width, img_height, img_num_channels = 28, 28, 1
loss_function = sparse_categorical_crossentropy
no_classes = 10
no_epochs = 25
validation_split = 0.2
verbosity = 1
# Load MNIST data
(input_train, target_train), (input_test, target_test) = mnist.load_data()
# Reshape data
input_train = input_train.reshape(input_train.shape[0], img_width, img_height, 1)
input_test = input_test.reshape(input_test.shape[0], img_width, img_height, 1)
# Determine shape of the data
input_shape = (img_width, img_height, img_num_channels)
# Parse numbers as floats
input_train = input_train.astype('float32')
input_test = input_test.astype('float32')
# Scale data
input_train = input_train / 255
input_test = input_test / 255
# MODEL BUILDING FUNCTION
def build_model(hp):
  # Create the model
  model = Sequential()
  model.add(Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=input_shape))
  model.add(Conv2D(64, kernel_size=(3, 3), activation='relu'))
  model.add(Conv2D(128, kernel_size=(3, 3), activation='relu'))
  model.add(Flatten())
  model.add(Dense(128, activation='relu'))
  model.add(Dense(no_classes, activation='softmax'))

  # Display a model summary
  model.summary()

  # Compile the model
  model.compile(loss=loss_function,
                optimizer=Adam(
                  hp.Choice('learning_rate',
                            values=[1e-2, 1e-3, 1e-4])),
                metrics=['accuracy'])

  # Return the model
  return model
# Perform tuning
tuner = RandomSearch(
build_model,
objective='val_accuracy',
max_trials=5,
executions_per_trial=3,
directory='tuning_dir',
project_name='machinecurve_example')
# Display search space summary
tuner.search_space_summary()
# Perform random search
tuner.search(input_train, target_train,
epochs=5,
validation_split=validation_split)
# Get best model
models = tuner.get_best_models(num_models=1)
best_model = models[0]
# Fit data to model
history = best_model.fit(input_train, target_train,
batch_size=batch_size,
epochs=no_epochs,
verbose=verbosity,
validation_split=validation_split)
# Generate generalization metrics
score = best_model.evaluate(input_test, target_test, verbose=0)
print(f'Test loss: {score[0]} / Test accuracy: {score[1]}')
In this blog post, you've been introduced to automated tuning of your neural network parameters and hyperparameters. Over the next years, this will become an increasingly important aspect of machine learning, in my opinion - because why leave to humans what computers can do better? Maybe, machine learning configuration will even become commoditized because of such progress! The benefit for you is that you've read this post (and can likely deepen your understanding further with some additional searches). You're now aware of this trend, and can steer your learning towards staying on top of the machine learning wave :)
What's more, you've also been able to get some practical experience with a code example using Keras Tuner. I hope you've learnt something today, and that it will help your machine learning endeavors :) If you have any questions, remarks, or other comments, please feel free to leave a comment in the comments section below. Thank you for reading MachineCurve today and happy engineering!
Keras Tuner. (n.d.). https://keras-team.github.io/keras-tuner/

Data Science Stack Exchange. (n.d.). Model parameters & hyper parameters of neural network & their tuning in training & validation stage. https://datascience.stackexchange.com/questions/17635/model-parameters-hyper-parameters-of-neural-network-their-tuning-in-training