Multilayer Perceptrons, or MLPs, are one of the most basic types of neural network that you can create. Still, they are very important, because they also lie at the basis of more advanced models: under the name of feedforward segments, they are heavily used in Transformer models as well as in Convolutional Neural Networks.
A basic MLP. License: public domain.
In other words: basic does not mean useless. Quite the contrary, for MLPs.
Today, there are two frameworks that are heavily used for creating neural networks with Python. The first is TensorFlow. This article, however, provides a tutorial for creating an MLP with PyTorch, the second framework that is very popular these days. It also shows how to create one with PyTorch Lightning. After reading this tutorial, you will...
Let's get to work! 🚀
Multilayer Perceptrons are straightforward and simple neural networks that lie at the basis of all the Deep Learning approaches that are so common today. They emerged many years ago as an extension of the simple Rosenblatt Perceptron from the 1950s, made feasible by increases in computing power. Today, they are used in many neural networks, sometimes augmented with other layer types as well.
Being composed of layers of neurons that are stacked on top of each other, these networks - which are also called MLPs - can be used for a wide variety of purposes, such as regression and classification. In this article, we will show you how you can create MLPs with PyTorch and PyTorch Lightning, which are very prominent in today's machine learning and deep learning industry.
First, we'll show two full-fledged examples of an MLP - the first created with classic PyTorch, the second with Lightning.
Defining a Multilayer Perceptron in classic PyTorch is not difficult; it just takes quite a few lines of code. We'll explain every aspect in detail in this tutorial, but here is already a complete code example for a Multilayer Perceptron created with PyTorch. If you want to understand everything in more detail, make sure to read the rest of the tutorial as well. Best of luck! :)
```
import os
import torch
from torch import nn
from torchvision.datasets import CIFAR10
from torch.utils.data import DataLoader
from torchvision import transforms


class MLP(nn.Module):
    '''
      Multilayer Perceptron.
    '''
    def __init__(self):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 32 * 3, 64),
            nn.ReLU(),
            nn.Linear(64, 32),
            nn.ReLU(),
            nn.Linear(32, 10)
        )

    def forward(self, x):
        '''Forward pass'''
        return self.layers(x)


if __name__ == '__main__':

    # Set fixed random number seed
    torch.manual_seed(42)

    # Prepare CIFAR-10 dataset
    dataset = CIFAR10(os.getcwd(), download=True, transform=transforms.ToTensor())
    trainloader = torch.utils.data.DataLoader(dataset, batch_size=10, shuffle=True, num_workers=1)

    # Initialize the MLP
    mlp = MLP()

    # Define the loss function and optimizer
    loss_function = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(mlp.parameters(), lr=1e-4)

    # Run the training loop
    for epoch in range(0, 5):  # 5 epochs at maximum

        # Print epoch
        print(f'Starting epoch {epoch+1}')

        # Set current loss value
        current_loss = 0.0

        # Iterate over the DataLoader for training data
        for i, data in enumerate(trainloader, 0):

            # Get inputs
            inputs, targets = data

            # Zero the gradients
            optimizer.zero_grad()

            # Perform forward pass
            outputs = mlp(inputs)

            # Compute loss
            loss = loss_function(outputs, targets)

            # Perform backward pass
            loss.backward()

            # Perform optimization
            optimizer.step()

            # Print statistics
            current_loss += loss.item()
            if i % 500 == 499:
                print('Loss after mini-batch %5d: %.3f' %
                      (i + 1, current_loss / 500))
                current_loss = 0.0

    # Process is complete.
    print('Training process has finished.')
```
You can also get started with PyTorch Lightning straight away. Here, we provided a full code example for an MLP created with Lightning. Once more: if you want to understand everything in more detail, make sure to read the rest of this tutorial as well! :D
```
import os
import torch
from torch import nn
from torchvision.datasets import CIFAR10
from torch.utils.data import DataLoader
from torchvision import transforms
import pytorch_lightning as pl


class MLP(pl.LightningModule):

    def __init__(self):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(32 * 32 * 3, 64),
            nn.ReLU(),
            nn.Linear(64, 32),
            nn.ReLU(),
            nn.Linear(32, 10)
        )
        self.ce = nn.CrossEntropyLoss()

    def forward(self, x):
        return self.layers(x)

    def training_step(self, batch, batch_idx):
        x, y = batch
        x = x.view(x.size(0), -1)
        y_hat = self.layers(x)
        loss = self.ce(y_hat, y)
        self.log('train_loss', loss)
        return loss

    def configure_optimizers(self):
        optimizer = torch.optim.Adam(self.parameters(), lr=1e-4)
        return optimizer


if __name__ == '__main__':
    dataset = CIFAR10(os.getcwd(), download=True, transform=transforms.ToTensor())
    pl.seed_everything(42)
    mlp = MLP()
    trainer = pl.Trainer(auto_scale_batch_size='power', gpus=0, deterministic=True, max_epochs=5)
    trainer.fit(mlp, DataLoader(dataset))
```
Created by Wiso at Wikipedia. License: public domain.
I always tend to think that it is good practice if you understand some concepts before you write some code. That's why we'll take a look at the basics of Multilayer Perceptrons, abbreviated as MLPs, in this section. Once completed, we move on and start writing some code with PyTorch and Lightning.
Back in the 1950s, in the era where people had just started using computing technology after discovering how useful it was, there was a psychologist named Frank Rosenblatt. He imagined what it would be like to add intelligence to machines - in other words, to make a machine that can think. The result was the Rosenblatt Perceptron - a mathematical operation where some input is passed through a neuron, where weights are learned and stored, and where the end result is used to adjust those weights. While it can learn a binary classifier, it fell short of learning the massively complex functions that thinking requires.
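To make this a bit more concrete, below is a minimal sketch of the classic perceptron learning rule: a step activation over a weighted sum, with the weights nudged whenever a prediction is wrong. The toy AND-gate data and the variable names are made up purely for illustration:

```
import torch

# Toy data: the AND gate, which is linearly separable and thus learnable by a perceptron
x = torch.tensor([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = torch.tensor([0., 0., 0., 1.])

w = torch.zeros(2)  # one weight per input feature
b = torch.zeros(1)  # bias
lr = 1.0            # learning rate

for epoch in range(10):
    for xi, yi in zip(x, y):
        # Weighted sum followed by a step activation
        prediction = 1.0 if torch.dot(w, xi) + b > 0 else 0.0
        # Perceptron learning rule: adjust the weights only when the prediction is wrong
        error = yi - prediction
        w += lr * error * xi
        b += lr * error

print(w, b)  # learned weights and bias
print([1.0 if torch.dot(w, xi) + b > 0 else 0.0 for xi in x])  # [0.0, 0.0, 0.0, 1.0]
```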
Besides theoretical issues, the absence of sufficient computing power also meant that neural networks could not be used at scale. Decades later, technological progress made the step towards multilayer perceptrons, or MLPs, possible. In these networks, more than one neuron is used for generating predictions. In addition, neurons are stacked in layers of increasing abstraction, where each layer learns more abstract patterns. That is, while one layer can learn to detect lines, another can learn to detect noses.
In MLPs, the input data is fed to an input layer that shares the dimensionality of the input space. For example, if you feed input samples with 8 features per sample, you'll also have 8 neurons in the input layer. After being processed by the input layer, the results are passed to the next layer, which is called a hidden layer. The final layer is the output layer. Its neuron structure depends on the problem you are trying to solve (i.e. one neuron in the case of regression and binary classification problems; multiple neurons in a multiclass classification problem).
If you look closely, you can see that each neuron passes its output to all neurons in the subsequent (or downstream) layer. This is why such layers are also called densely-connected, or Dense. In TensorFlow and Keras they are available as tensorflow.keras.layers.Dense; PyTorch provides them as torch.nn.Linear.
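To make the relation between input features and layers concrete, here is a minimal sketch of such an MLP in PyTorch for samples with 8 features, one hidden layer, and a single output neuron for regression. The layer sizes are arbitrary and only serve as an illustration:

```
import torch
from torch import nn

# Hypothetical MLP for tabular data: 8 input features -> 16 hidden neurons -> 1 output neuron (regression)
model = nn.Sequential(
    nn.Linear(8, 16),   # densely connects the 8 input features to 16 hidden neurons
    nn.ReLU(),
    nn.Linear(16, 1)    # hidden layer to the single output neuron
)

samples = torch.randn(4, 8)   # a batch of 4 samples with 8 features each
print(model(samples).shape)   # torch.Size([4, 1])
```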
Now that we understand what an MLP looks like, it is time to build one with PyTorch. Below, we will show you how you can create your own PyTorch based MLP with step-by-step examples. In addition to that, we also show you how to build one with PyTorch Lightning. This is a library on top of PyTorch which allows you to build models with much less overhead (for example, by removing the need to write the training loop explicitly).
First, we'll show you how to build an MLP with classic PyTorch, then how to build one with Lightning.
Implementing an MLP with classic PyTorch involves six steps:

1. Importing all dependencies, being os, torch and torchvision.
2. Defining the MLP neural network class as a nn.Module.
3. Adding the runtime code, starting with a fixed random seed.
4. Preparing the CIFAR-10 dataset and wrapping it in a DataLoader.
5. Initializing the MLP, the loss function and the optimizer.
6. Defining and running the training loop.

The first step here is to add all the dependencies. We need os for file input/output functionality, as we will save the CIFAR-10 dataset to local disk later in this tutorial. We'll also import torch, which imports PyTorch. From it we import nn, which allows us to define a neural network module. We also import the DataLoader (for feeding data into the MLP during training), the CIFAR10 dataset (for obvious purposes) and transforms, which allows us to perform transformations on the data prior to feeding it to the MLP.
```
import os
import torch
from torch import nn
from torchvision.datasets import CIFAR10
from torch.utils.data import DataLoader
from torchvision import transforms
```
Next up is defining the MLP class, which extends nn.Module. This Module class structures the implementation of our neural network and is therefore really useful when creating one. It has two definitions: __init__, or the constructor, and forward, which implements the forward pass.
In the constructor, we first invoke the superclass initialization and then define the layers of our neural network. We stack all layers (three densely-connected layers with Linear and ReLU activation functions) using nn.Sequential. We also add nn.Flatten() at the start. Flatten converts the 3D image representations (width, height and channels) into 1D format, which is necessary for Linear layers. Note that with image data it is often best to use Convolutional Neural Networks; this is out of scope for this tutorial and will be covered in another one.

The forward pass allows us to process input data - for example, during the training process. In our case, it does nothing but feed the data through the neural network layers, and return the output.
```
class MLP(nn.Module):
    '''
      Multilayer Perceptron.
    '''
    def __init__(self):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 32 * 3, 64),
            nn.ReLU(),
            nn.Linear(64, 32),
            nn.ReLU(),
            nn.Linear(32, 10)
        )

    def forward(self, x):
        '''Forward pass'''
        return self.layers(x)
```
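To see concretely what nn.Flatten does, you can feed a batch of CIFAR-10 sized tensors through the model and inspect the shapes. This is just an illustrative check with made-up random data, assuming the MLP class and imports defined above are in scope:

```
import torch

mlp = MLP()
batch = torch.randn(4, 3, 32, 32)      # 4 fake images: 3 channels, 32 x 32 pixels
flattened = nn.Flatten()(batch)        # shape becomes (4, 3072), i.e. (4, 3 * 32 * 32)
outputs = mlp(batch)                   # Flatten is the first layer, so raw image tensors can be fed directly
print(flattened.shape, outputs.shape)  # torch.Size([4, 3072]) torch.Size([4, 10])
```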
After defining the class, we can move on and write the runtime code. This code is actually executed at runtime, i.e. when you call the Python script from the terminal with e.g. python mlp.py. The class itself is then not yet used, but we will do so shortly.
The first thing we define in the runtime code is setting the seed of the random number generator. Using a fixed seed ensures that this generator is initialized with the same starting value. This benefits reproducibility of your ML findings.
```
if __name__ == '__main__':

    # Set fixed random number seed
    torch.manual_seed(42)
```
The next code we add involves preparing the CIFAR-10 dataset. Some samples from this dataset are visualized in the image on the right. The dataset contains 10 classes and has 60,000 32 by 32 pixel images, with 6,000 images per class.
Loading and preparing the CIFAR-10 data is a two-step process:

1. Initializing the dataset itself, by means of CIFAR10. Here, in increasing order, you specify the directory where the dataset has to be saved, that it must be downloaded, and that the samples must be converted into Tensor format.
2. Initializing the DataLoader, which takes the dataset, a batch size, a shuffle parameter (whether the data must be ordered at random) and the number of workers to load data with. In PyTorch, data loaders are used for feeding data to the model uniformly.

```
    # Prepare CIFAR-10 dataset
    dataset = CIFAR10(os.getcwd(), download=True, transform=transforms.ToTensor())
    trainloader = torch.utils.data.DataLoader(dataset, batch_size=10, shuffle=True, num_workers=1)
```
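If you want to verify what the DataLoader produces, you can pull a single batch and inspect its shapes. This quick check is not part of the original tutorial code and assumes the dataset and trainloader defined above are in scope:

```
    # Inspect one batch: with batch_size=10, CIFAR-10 yields (10, 3, 32, 32) image tensors
    # and a (10,) tensor of integer class labels.
    inputs, targets = next(iter(trainloader))
    print(inputs.shape)   # torch.Size([10, 3, 32, 32])
    print(targets.shape)  # torch.Size([10])
```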
Now, it's time to initialize the MLP - and use the class that we had not yet used before. We also specify the loss function (categorical crossentropy loss) and the Adam optimizer. The optimizer works on the parameters of the MLP and utilizes a learning rate of 1e-4. We'll use them next.
```
    # Initialize the MLP
    mlp = MLP()

    # Define the loss function and optimizer
    loss_function = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(mlp.parameters(), lr=1e-4)
```
The core part of our runtime code is the training loop. In this loop, we perform the epochs, or training iterations. In every epoch, we iterate over the training dataset, perform the forward and backward passes, and optimize the model.
Step-by-step, these are the things that happen within the loop:

- We iterate over the epochs with range(0, 5), so five epochs at maximum.
- At the start of every epoch, we set the current loss value to 0.0.
- We iterate over the DataLoader for the training data (the trainloader we defined above). Here, we do the following things:
  - We decompose the data into inputs and targets (the x and y values, respectively).
  - We zero the gradients of the optimizer, perform the forward pass with the mlp, and compute the loss between the outputs of the model and the ground truth, available in targets.
  - We perform the backward pass, let the optimizer take an optimization step, and print loss statistics every 500 mini-batches.
```
    # Run the training loop
    for epoch in range(0, 5):  # 5 epochs at maximum

        # Print epoch
        print(f'Starting epoch {epoch+1}')

        # Set current loss value
        current_loss = 0.0

        # Iterate over the DataLoader for training data
        for i, data in enumerate(trainloader, 0):

            # Get inputs
            inputs, targets = data

            # Zero the gradients
            optimizer.zero_grad()

            # Perform forward pass
            outputs = mlp(inputs)

            # Compute loss
            loss = loss_function(outputs, targets)

            # Perform backward pass
            loss.backward()

            # Perform optimization
            optimizer.step()

            # Print statistics
            current_loss += loss.item()
            if i % 500 == 499:
                print('Loss after mini-batch %5d: %.3f' %
                      (i + 1, current_loss / 500))
                current_loss = 0.0

    # Process is complete.
    print('Training process has finished.')
```
For the full model code, see the full code example at the beginning of this tutorial.
Now, when you save the code e.g. to a file called mlp.py and run python mlp.py, you'll see the following output when PyTorch has been installed successfully.
```
Starting epoch 1
Loss after mini-batch   500: 2.232
Loss after mini-batch  1000: 2.087
Loss after mini-batch  1500: 2.004
Loss after mini-batch  2000: 1.963
Loss after mini-batch  2500: 1.943
Loss after mini-batch  3000: 1.926
Loss after mini-batch  3500: 1.904
Loss after mini-batch  4000: 1.878
Loss after mini-batch  4500: 1.872
Loss after mini-batch  5000: 1.874
Starting epoch 2
Loss after mini-batch   500: 1.843
Loss after mini-batch  1000: 1.828
Loss after mini-batch  1500: 1.830
Loss after mini-batch  2000: 1.819
...
```
Great! 😎
Another approach for creating your PyTorch based MLP is using PyTorch Lightning. It is a library that is available on top of classic PyTorch (and in fact, uses classic PyTorch) that makes creating PyTorch models easier.
The reason is simple: writing even a simple PyTorch model means writing a lot of code. And in fact, writing a lot of code that does nothing more than the default training process (like our training loop above).
In Lightning, these elements are automated away as much as possible. In addition, running your code on a GPU does not require you to manually move your model and data to the device (which we haven't even done above!). And there are other benefits. Because Lightning is nothing more than classic PyTorch structured differently, it has seen significant adoption. We'll therefore also show you how to create that MLP with Lightning - and you will see that it saves a lot of lines of code.
The first step is importing all dependencies. If you have also followed the classic PyTorch example above, you can see that it is not so different from classic PyTorch. In fact, we use the same imports - os for file I/O, torch and its sub-imports for PyTorch functionality - but now also pytorch_lightning for Lightning functionality.
```
import os
import torch
from torch import nn
from torchvision.datasets import CIFAR10
from torch.utils.data import DataLoader
from torchvision import transforms
import pytorch_lightning as pl
```
In PyTorch Lightning, all functionality is defined in a LightningModule - which is a structured version of the nn.Module that is used in classic PyTorch. Here, the __init__ and forward definitions capture the definition of the model. We specify a neural network with three MLP layers and ReLU activations in self.layers. We also specify the cross entropy loss in self.ce. In forward, we perform the forward pass.
What is different in Lightning is that it also requires you to provide the training_step and configure_optimizers definitions. This is mandatory because Lightning strips away the explicit training loop. The training_step allows you to compute the loss (which is then used for optimization purposes under the hood), and for these optimization purposes you'll need an optimizer, which is specified in configure_optimizers.
That's it for the MLP!
```
class MLP(pl.LightningModule):

    def __init__(self):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(32 * 32 * 3, 64),
            nn.ReLU(),
            nn.Linear(64, 32),
            nn.ReLU(),
            nn.Linear(32, 10)
        )
        self.ce = nn.CrossEntropyLoss()

    def forward(self, x):
        return self.layers(x)

    def training_step(self, batch, batch_idx):
        x, y = batch
        x = x.view(x.size(0), -1)
        y_hat = self.layers(x)
        loss = self.ce(y_hat, y)
        self.log('train_loss', loss)
        return loss

    def configure_optimizers(self):
        optimizer = torch.optim.Adam(self.parameters(), lr=1e-4)
        return optimizer
```
Since Lightning hides much of the training loop, your runtime code becomes really small! We load the CIFAR10 dataset, just like with the original example. We then seed everything, initialize the MLP, construct the Trainer object - which is responsible for automating away much of the training loop - pass configuration options, and then fit the data available in the dataset through the DataLoader.

```
if __name__ == '__main__':
    dataset = CIFAR10(os.getcwd(), download=True, transform=transforms.ToTensor())
    pl.seed_everything(42)
    mlp = MLP()
    trainer = pl.Trainer(auto_scale_batch_size='power', gpus=1, deterministic=True, max_epochs=5)
    trainer.fit(mlp, DataLoader(dataset))
```
Please do note that automating away the training loop does not mean that you lose all control over the loop. You can still control it if you want by means of your code. This is however out of scope for this tutorial.
For the full model code, see the full code example at the beginning of this tutorial.
Now, when you save the code e.g. to a file called mlp-lightning.py and run python mlp-lightning.py, you'll see the following output when PyTorch and PyTorch Lightning have been installed successfully.
```
GPU available: True, used: True
TPU available: None, using: 0 TPU cores
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]

  | Name   | Type             | Params
...
```