← Back to homepage

Using EarlyStopping and ModelCheckpoint with TensorFlow 2 and Keras

May 30, 2019 by Chris

Training a neural network can take a lot of time. In some cases, especially with very deep architectures trained on very large data sets, it can take weeks before one's model is finally trained.

In Keras, when you train a neural network such as a classifier or a regression model, you'll usually set the number of epochs when you call model.fit:

fit(x=None, y=None, batch_size=None, epochs=1, verbose=1, callbacks=None, validation_split=0.0, validation_data=None, shuffle=True, class_weight=None, sample_weight=None, initial_epoch=0, steps_per_epoch=None, validation_steps=None, validation_freq=1)

Unfortunately, setting a fixed number of epochs is often a bad idea. Here's why:

This is quite a dilemma, isn't it? How do we choose what number of epochs to use?

You cannot simply enter a random value due to the reasons above.

Neither can you test without wasting more resources. What's more, if you think to avert the dilemma by finding out with a very small subset of your data, then I've got some other news - you just statistically altered your sample by drawing a subset from the original sample. You may now find that by using the original data set for training, it is still not optimal.

What to do? :( In this tutorial, we'll check out one way of getting beyond this problem: using a combination of Early Stopping and model checkpointing. Let's see what it is composed of.

In other words, this tutorial will teach you...

Let's take a look 🚀

Update 13/Jan/2021: Added code example to the top of the article, so that people can get started immediately. Also ensured that the article is still up-to-date, and added a few links to other articles.

Update 02/Nov/2020: Made model code compatible with TensorFlow 2.x.

Update 01/Feb/2020: Added links to other MachineCurve blog posts and processed textual corrections.

Code example: how to use EarlyStopping and ModelCheckpoint with TensorFlow?

This code example immediately teaches you how EarlyStopping and ModelCheckpointing can be used with TensorFlow. It allows you to get started straight away. If you want to understand both callbacks in more detail, however, then make sure to continue reading the rest of this tutorial.

from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint

keras_callbacks   = [
      EarlyStopping(monitor='val_loss', patience=30, mode='min', min_delta=0.0001),
      ModelCheckpoint(checkpoint_path, monitor='val_loss', save_best_only=True, mode='min')

model.fit(x_train, y_train,

EarlyStopping and ModelCheckpoint in Keras

Fortunately, if you use Keras for creating your deep neural networks, it comes to the rescue.

It has two so-called callbacks which can really help in settling this issue, avoiding wasting computational resources a priori and a posteriori. They are named EarlyStopping and ModelCheckpoint. This is what they do:

Together, EarlyStopping and ModelCheckpoint allow you to stop early, saving computational resources, while maintaining the best performing instance of your model automatically. That's precisely what you want.

Example implementation

Let's build one of the Keras examples step by step. It uses one-dimensional convolutional layers for classifying IMDB reviews and, according to its metadata, achieves about 90% test accuracy after just two training epochs.

We will slightly alter it in order to (1) include the callbacks and (2) keep it running until it no longer improves.

Let's first load the Keras imports. Note that we also include numpy, which is not done in the Keras example. We include it because we'll need to fix the random number generator, but we'll come to that shortly.

from tensorflow.keras.preprocessing import sequence
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Activation
from tensorflow.keras.layers import Embedding
from tensorflow.keras.layers import Conv1D, GlobalMaxPooling1D
from tensorflow.keras.datasets import imdb
import numpy as np

We will then set the parameters. Note that instead of 2 epochs in the example, we'll use 200.000 epochs here.

# set parameters:
max_features = 5000
maxlen = 400
batch_size = 32
embedding_dims = 50
filters = 250
kernel_size = 3
hidden_dims = 250
epochs = 200000

We'll fix the random seed in Numpy. This allows us to use the same pseudo random number generator every time. This removes the probability that variation in the data is caused by the pseudo-randomness between multiple instances of a 'random' number generator - rather, the pseudo-randomness is equal all the time.


We then load the data. We make a load_data call to the IMDB data set, which is provided in Keras by default. We load a maximum of 5.000 words according to our configuration file. The load_data definition provided by Keras automatically splits the data in training and testing data (with inputs x and targets y). In order to create feature vectors that have the same shape, the sequences are padded. That is, 0.0 is added towards the end. Neural networks tend not to be influenced by those numbers.

(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=max_features)
x_train = sequence.pad_sequences(x_train, maxlen=maxlen)
x_test = sequence.pad_sequences(x_test, maxlen=maxlen)

Next up is the model itself. It is proposed by Google. Given the goal of this blog post, there's not much need for explaining whether the architecture is good (which is the case, though):

model = Sequential()

Next, we compile the model. Binary crossentropy is used since we have two target classes (positive and negative) and our task is a classification task (for which crossentropy is a good way of computing loss). The optimizer is Adam, which is a state-of-the-art optimizer combining various improvements to original stochastic gradient descent. As an additional metric which is more intuitive to human beings, accuracy is included as well.


We'll next make slight changes to the example. Google utilizes the test data for validation; we don't do that. Rather, we'll create a separate validation split from the training data. We thus end up with three distinct data sets: a training set, which is used to train the model; a validation set, which is used to study its predictive power after every epoch, and a testing set, which shows its generalization power since it contains data the model has never seen. We generate the validation data by splitting the training data in actual training data and validation date. We use a 80/20 split for this; thus, 20% of the original training data will become validation data. All right, let's fit the training data and start the training process.

model.fit(x_train, y_train,

Later, we'll evaluate the model with the test data.

Adding the callbacks

We must however first add the callbacks to the imports at the top of our code:

from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint

We can then include them into our code. Just before model.fit, add this Python variable:

keras_callbacks   = [
      EarlyStopping(monitor='val_loss', patience=30, mode='min', min_delta=0.0001),
      ModelCheckpoint(checkpoint_path, monitor='val_loss', save_best_only=True, mode='min')

As you can see, the callbacks have various configuration options:

Those are not the only parameters. There's many more for both ModelCheckpoint and EarlyStopping, but they're used less commonly. Do however check them out!

All right, if we would now add the callback variable to the model.fit call, we'd have a model that stops when it no longer improves and saves the best model. Replace your current code with this:

model.fit(x_train, y_train,

Okay, let's run it and see what happens :)

All right, let's give it a go!

Numpy allow_pickle error

It may be that you'll run into issues with Numpy when you load the data into a Numpy array. Specifically, the error looks as follows:

ValueError: Object arrays cannot be loaded when allow_pickle=False

It occurs because Numpy has recently inverted the default value for allow_pickle and Keras has not updated yet. Altering imdb.py in keras/datasets folder will resolve this issue. Let's hope the pull request that has been issued for this problem will be accepted soon. Change line 59 into:

with np.load(path, allow_pickle=True) as f:

Update February 2020: this problem should be fixed in any recent Keras version! 🎉

Keras results

You'll relatively quickly see the results:

Epoch 1/200000
20000/20000 [==============================] - 10s 507us/step - loss: 0.4380 - acc: 0.7744 - val_loss: 0.3145 - val_acc: 0.8706
Epoch 00001: val_loss improved from inf to 0.31446, saving model to C:\Users\chris\DevFiles\Deep Learning/testmodel.h5

Epoch 2/200000
20000/20000 [==============================] - 7s 347us/step - loss: 0.2411 - acc: 0.9021 - val_loss: 0.2719 - val_acc: 0.8890
Epoch 00002: val_loss improved from 0.31446 to 0.27188, saving model to C:\Users\chris\DevFiles\Deep Learning/testmodel.h5

Epoch 3/200000
20000/20000 [==============================] - 7s 344us/step - loss: 0.1685 - acc: 0.9355 - val_loss: 0.2733 - val_acc: 0.8924
Epoch 00003: val_loss did not improve from 0.27188

Apparently, the training process achieves optimal validation loss after just two epochs (which was also indicated by the Google engineers who created the model code we are thankful for using and which we adapted), because after epoch 32 it shows:

Epoch 32/200000
20000/20000 [==============================] - 7s 366us/step - loss: 0.0105 - acc: 0.9960 - val_loss: 0.7375 - val_acc: 0.8780
Epoch 00032: val_loss did not improve from 0.27188
Epoch 00032: early stopping

...and the training process comes to a halt, as we intended :) Most likely, the model can still be improved - e.g. by introducing learning rate decay and finding the best learning rate prior to the training process - but hey, that wasn't the goal of this exercise.

I've also got my HDF5 file:

Let's evaluate the model

We can next comment out everything from model = Sequential() up to and including model.fit. Let's add some evaluation functionality.

We should load the model, so we should add its feature to the imports:

from tensorflow.keras.models import load_model

And subsequently add evaluation code just after the code that was commented out:

model = load_model(checkpoint_path)
scores = model.evaluate(x_test, y_test, verbose=1)
print(f'Score: {model.metrics_names[0]} of {scores[0]}; {model.metrics_names[1]} of {scores[1]*100}%')

Next, run it again. Instead of training the model again (you commented out the code specifying the model and the training process), it will now load the model you saved during training and evaluate it. You will most likely see a test accuracy of ≈ 88%.

25000/25000 [==============================] - 3s 127us/step
Score: loss of 0.27852724124908446; acc of 88.232%

All right! Now you know how you can use the EarlyStopping and ModelCallback checkpoints in Keras, allowing you to save precious resources when a model no longer improves. Let me wish you all the best with your machine learning adventures and please, feel free to comment if you have questions or comments. I'll be happy to respond and to improve my work if you feel I've made a mistake. Thanks!


Thanks a lot to the authors of those works!

Hi, I'm Chris!

I know a thing or two about AI and machine learning. Welcome to MachineCurve.com, where machine learning is explained in gentle terms.