Training a deep learning model is a cyclical process. First, you feed forward data, generating predictions for each sample. Then, the predictions are compared with the true targets, and the differences are aggregated into a loss value. Finally, using this loss value, error gradients are computed backwards with backpropagation and the model is optimized with gradient descent or an adaptive optimizer.
This way, you can train a model that really performs well - one that can be used in practice.
In this tutorial, we will take a close look at using Binary Crossentropy Loss with PyTorch. This loss, which is also called BCE loss, is the de facto standard loss for binary classification tasks in neural networks. After reading this tutorial, you will...
Let's get to work! 🚀
Training a neural network with PyTorch, PyTorch Lightning or PyTorch Ignite requires that you use a loss function. This is not specific to PyTorch: loss functions are also common in TensorFlow and are, in fact, a core part of how any neural network is trained.
Choosing a loss function depends entirely on your dataset, the problem you are trying to solve and the specific variant of that problem. For binary classification problems, the most suitable loss function is called binary crossentropy loss. It compares the prediction, which is a number between 0 and 1, with the true target, which is either 0 or 1. Because the loss grows ever faster as the prediction moves further away from the target, extremely wrong predictions are punished much more aggressively than ones that are close to the target. This stabilizes the training process.
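For example, here's a quick sketch of that behavior using PyTorch's `nn.BCELoss` (introduced below); the prediction values are made up purely for illustration:

```python
import torch
from torch import nn

bce = nn.BCELoss()

# A prediction close to the target of 1.0 yields a small loss...
print(bce(torch.tensor([0.9]), torch.tensor([1.0])))   # ~0.105
# ...while an almost completely wrong prediction is punished far more heavily.
print(bce(torch.tensor([0.01]), torch.tensor([1.0])))  # ~4.605
```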
In PyTorch, binary crossentropy loss is provided by means of `nn.BCELoss`. Below, you'll see how Binary Crossentropy Loss can be implemented with classic PyTorch, PyTorch Lightning or PyTorch Ignite. Make sure to read the rest of the tutorial too if you want to understand the loss or the implementations in more detail!
Using `BCELoss` in classic PyTorch is a two-step process:
Step 1 - the criterion definition:
```python
criterion = nn.BCELoss()
```
Step 2 - using it in the custom training loop:
```python
for epoch in range(5):
    for i, data in enumerate(trainloader, 0):
        inputs, labels = data
        optimizer.zero_grad()
        # Forward pass
        outputs = net(inputs)
        # Compute loss
        loss = criterion(outputs, labels)
        # Backward pass
        loss.backward()
        # Optimization
        optimizer.step()
```
In Lightning, we can add `BCELoss` to our `training_step`, `validation_step` and `test_step` like this to start using Binary Crossentropy Loss:
```python
from torch import nn
import pytorch_lightning as pl

class NeuralNetwork(pl.LightningModule):

    def training_step(self, batch, batch_idx):
        x, y = batch
        x = x.view(x.size(0), -1)
        y_hat = self.layers(x)
        loss = self.bce(y_hat, y)
        self.log('train_loss', loss)
        return loss
```
In Ignite, we can add `BCELoss` as a `criterion` when creating the Trainer to start using Binary Crossentropy Loss. It can be added like this:
```python
from torch import nn
from ignite.engine import create_supervised_trainer

criterion = nn.BCELoss()
trainer = create_supervised_trainer(model, optimizer, criterion, device=device)
```
From our article about the various classification problems that Machine Learning engineers can encounter when tackling a supervised learning problem, we know that binary classification involves grouping input samples into one of two classes - a first and a second, often denoted as class 0 and class 1.
We also know from our article about loss functions and the high-level supervised machine learning process that when you train a neural network, the process cycles through these steps: feeding samples forward to generate predictions, comparing those predictions with the true targets and aggregating the differences into a loss value, computing gradients backwards with backpropagation, and optimizing the model.
Sounds like a straightforward process. But we didn't yet answer how the differences between predictions and the true targets are generated, nor how they are subsequently aggregated into a loss value.
In fact, there are many loss functions that we can use for this purpose - and each combination of task, variant and data distribution has its own best candidate.
For binary classification problems, the loss function of choice is the binary crossentropy loss, or the BCELoss, if you will. It can be defined as follows:

\[ BCE(t, p) = -(t \cdot \log(p) + (1 - t) \cdot \log(1 - p)) \]

Don't let the maths scare you away... just read on! 😉
Here, `t` is the target value (either `0.0` or `1.0` - recall that the classes are represented as class 0 and class 1). The prediction `p` can be any value between zero and one, as is common with the Sigmoid activation function, which is typically used to generate the output in the last layer of your neural network when performing binary classification. The `log` is the logarithm, and it is what makes the loss grow ever faster - without bound - as the prediction moves towards the opposite of the target, which is exactly the property that makes the function so useful.
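To make this concrete, here is a small sketch (the helper name `bce_manual` is just for illustration) that computes the formula by hand and compares it with PyTorch's `nn.BCELoss` for a few made-up predictions:

```python
import torch
from torch import nn

def bce_manual(t: float, p: float) -> float:
    # -(t * log(p) + (1 - t) * log(1 - p)) for a single sample
    return float(-(t * torch.log(torch.tensor(p)) + (1 - t) * torch.log(torch.tensor(1 - p))))

criterion = nn.BCELoss()

for t, p in [(1.0, 0.9), (1.0, 0.5), (1.0, 0.1), (0.0, 0.9)]:
    manual = bce_manual(t, p)
    torch_loss = criterion(torch.tensor([p]), torch.tensor([t])).item()
    print(f't={t}, p={p}: manual={manual:.4f}, nn.BCELoss={torch_loss:.4f}')
```

The further `p` drifts towards the wrong extreme, the faster the loss grows - towards infinity for a completely wrong prediction.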
Visualized for the two possible targets and any value of `p` between 0 and 1, BCE loss behaves as follows:

- When the prediction is the exact opposite of the target (`t = 0.0; p = 1.0` or `t = 1.0; p = 0.0`), loss is highest - infinite, even, for a 1.0 delta.
- Only predictions in the range \([0, 1]\) are supported.

These properties make binary crossentropy a very suitable loss function for binary classification problems. Let's now take a look at how we can implement it with PyTorch and its varieties.
In this section, we'll see a step-by-step approach to constructing Binary Crossentropy Loss using PyTorch or any of the variants (i.e. PyTorch Lightning and PyTorch Ignite). As these are the main flavors of PyTorch these days, we'll cover all three of them.
In PyTorch, Binary Crossentropy Loss is provided as [nn.BCELoss](https://pytorch.org/docs/stable/generated/torch.nn.BCELoss.html). This loss function can be used with classic PyTorch, with PyTorch Lightning and with PyTorch Ignite. It looks like this (PyTorch, n.d.):

```python
torch.nn.BCELoss(weight: Optional[torch.Tensor] = None, size_average=None, reduce=None, reduction: str = 'mean')
```
You can pass four optional arguments:

- `weight`, an optional Tensor that rescales the loss of each batch element.
- `size_average` (deprecated), which can be set to `False` in order to avoid averaging losses across each minibatch; minibatch loss is then summed together instead. It is set to `True` by default, computing the average.
- `reduce` (deprecated), which when set to `False` results in the loss per minibatch element instead of summing/averaging.
- `reduction`, which specifies how the output is reduced and can be `none`, `mean`, or `sum`:
  - With `none`, no reduction will be applied.
  - With `mean`, the average will be computed.
  - With `sum`, the sum will be computed.
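To see what these arguments do in practice, here is a small sketch with made-up predictions and targets, comparing the three reduction modes and a per-element `weight`:

```python
import torch
from torch import nn

predictions = torch.tensor([0.9, 0.2, 0.7])
targets = torch.tensor([1.0, 0.0, 1.0])

print(nn.BCELoss(reduction='none')(predictions, targets))  # loss per element
print(nn.BCELoss(reduction='mean')(predictions, targets))  # the default: averaged
print(nn.BCELoss(reduction='sum')(predictions, targets))   # summed

# An optional weight Tensor rescales the loss of each element before reduction.
weight = torch.tensor([1.0, 1.0, 2.0])
print(nn.BCELoss(weight=weight)(predictions, targets))
```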
In classic PyTorch, we must define the training, testing and validation loops ourselves. Adding `BCELoss` as a loss function is not too difficult, though. It involves specifying the loss as a `criterion` first and then manually invoking it within e.g. the training loop.
Specifying the loss as a criterion involves using `BCELoss` in the following way:

```python
criterion = nn.BCELoss()
```
Here is an example of a (very simple) training loop. It performs nothing but resetting the optimizer (so that it can be used at every iteration), making a forward pass, computing the loss, performing the backward pass with backpropagation and subsequent model optimization.
```python
for epoch in range(5):
    for i, data in enumerate(trainloader, 0):
        inputs, labels = data
        optimizer.zero_grad()
        # Forward pass
        outputs = net(inputs)
        # Compute loss
        loss = criterion(outputs, labels)
        # Backward pass
        loss.backward()
        # Optimization
        optimizer.step()
```
Indeed, that's the high-level training process that we covered at the start of this tutorial!
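For completeness, here's a minimal, self-contained sketch of such a training loop for a binary classification problem. The dataset is random noise with made-up labels, purely so the example runs end to end; the network, loss and loop follow the pattern shown above:

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Toy data: 1000 samples with 20 features and a binary target (made up for illustration).
X = torch.randn(1000, 20)
y = (X.sum(dim=1) > 0).float().unsqueeze(1)
trainloader = DataLoader(TensorDataset(X, y), batch_size=32, shuffle=True)

# A small binary classifier; the final Sigmoid produces an output between 0 and 1,
# which is exactly what nn.BCELoss expects.
net = nn.Sequential(
    nn.Linear(20, 16),
    nn.ReLU(),
    nn.Linear(16, 1),
    nn.Sigmoid()
)

criterion = nn.BCELoss()
optimizer = torch.optim.Adam(net.parameters(), lr=1e-3)

for epoch in range(5):
    for inputs, labels in trainloader:
        optimizer.zero_grad()
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
    print(f'Epoch {epoch + 1} finished, last batch loss: {loss.item():.4f}')
```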
PyTorch Lightning is a wrapper on top of native PyTorch which helps you organize code while benefiting from all the good things that PyTorch has to offer. In Lightning, the forward pass during training is split into three definitions: `training_step`, `validation_step` and `test_step`. These specify what should happen for the training process, its validation component and subsequent model evaluation, respectively.
Using native PyTorch under the hood, we can also use `nn.BCELoss` here. The first step is initializing it in the `__init__` definition:
```python
from torch import nn
import pytorch_lightning as pl

class NeuralNetwork(pl.LightningModule):

    def __init__(self):
        super().__init__()
        # Other inits, like the layers, are also here.
        self.bce = nn.BCELoss()
```
Recall that a loss function computes the aggregate error for a set of predictions by comparing them to the ground truth for the samples. In the `training_step`, we can create such functionality in the following way:

- Decompose the batch into `x` and `y`, where obviously, \(\text{x} \rightarrow \text{y}\).
- Reshape `x` so that it can be processed by our neural network.
- Generate `y_hat`, which is the set of predictions for `x`, by feeding `x` forward through our neural network defined in `self.layers`. Note that you will see the creation of `self.layers` in the full code example below.
- Compute the loss between `y_hat` (predictions) and `y` (ground truth), log the loss, and return it. Based on this loss, PyTorch Lightning will handle the gradients computation and subsequent optimization (with the optimizer defined in `configure_optimizers`, see the full code example below).

```python
    def training_step(self, batch, batch_idx):
        x, y = batch
        x = x.view(x.size(0), -1)
        y_hat = self.layers(x)
        loss = self.bce(y_hat, y)
        self.log('train_loss', loss)
        return loss
```
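The `validation_step` and `test_step` definitions follow exactly the same pattern. A minimal sketch, assuming the same `self.layers` and `self.bce` as above and that validation/test DataLoaders are passed to the Trainer:

```python
    def validation_step(self, batch, batch_idx):
        x, y = batch
        x = x.view(x.size(0), -1)
        y_hat = self.layers(x)
        loss = self.bce(y_hat, y)
        self.log('val_loss', loss)

    def test_step(self, batch, batch_idx):
        x, y = batch
        x = x.view(x.size(0), -1)
        y_hat = self.layers(x)
        loss = self.bce(y_hat, y)
        self.log('test_loss', loss)
```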
Quite easy, isn't it? When added to a regular Lightning model, i.e. to the `LightningModule`, the full code looks as follows:
```python
import os
import torch
from torch import nn
from torchvision.datasets import MNIST
from torch.utils.data import DataLoader
from torchvision import transforms
import pytorch_lightning as pl


class MNISTNetwork(pl.LightningModule):

    def __init__(self):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(28 * 28, 64),
            nn.ReLU(),
            nn.Linear(64, 32),
            nn.ReLU(),
            nn.Linear(32, 10),
            nn.Sigmoid()
        )
        self.bce = nn.BCELoss()

    def forward(self, x):
        return self.layers(x)

    def training_step(self, batch, batch_idx):
        x, y = batch
        x = x.view(x.size(0), -1)
        y_hat = self.layers(x)
        # BCELoss expects float targets with the same shape as the predictions,
        # so the MNIST class indices are one-hot encoded here.
        y = nn.functional.one_hot(y, num_classes=10).float()
        loss = self.bce(y_hat, y)
        self.log('train_loss', loss)
        return loss

    def configure_optimizers(self):
        optimizer = torch.optim.Adam(self.parameters(), lr=1e-4)
        return optimizer


if __name__ == '__main__':
    dataset = MNIST(os.getcwd(), download=True, transform=transforms.ToTensor())
    pl.seed_everything(42)
    neuralnetwork = MNISTNetwork()
    trainer = pl.Trainer(auto_scale_batch_size='power', gpus=1, deterministic=True)
    trainer.fit(neuralnetwork, DataLoader(dataset))
```
In PyTorch Ignite, we can also add Binary Crossentropy loss quite easily. Here, we have to specify it as a `criterion` in the Trainer. Like with classic PyTorch and Lightning, we can use `nn.BCELoss` for this purpose. Adding BCE loss can be done as follows:
```python
from torch import nn
from ignite.engine import create_supervised_trainer

criterion = nn.BCELoss()
trainer = create_supervised_trainer(model, optimizer, criterion, device=device)
```
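To actually start training, you attach a DataLoader and call the trainer's `run` method; a minimal usage sketch, assuming a DataLoader named `train_loader`:

```python
trainer.run(train_loader, max_epochs=5)
```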
That's it for today! Now that you have completed this tutorial, you know how to implement Binary Crossentropy Loss with PyTorch, PyTorch Lightning and PyTorch Ignite. If you have any comments, please feel free to leave a message in the comments section below 💬 Please do the same if you have any questions, or ask your question here.
Thank you for reading MachineCurve today and happy engineering! 😎
PyTorch Ignite. (n.d.). Ignite your networks! — ignite master documentation. PyTorch. https://pytorch.org/ignite/
PyTorch Lightning. (2021, January 12). https://www.pytorchlightning.ai/
PyTorch. (n.d.). https://pytorch.org
PyTorch. (n.d.). BCELoss — PyTorch 1.7.0 documentation. https://pytorch.org/docs/stable/generated/torch.nn.BCELoss.html