Object detection is one of the areas of Deep Learning where much progress has been made. Using a variety of models, we can detect objects in photos and, by extension, in videos. Even real-time object detection using webcam images is common these days!
In this tutorial, we will build an object detection system with TensorFlow. Specifically, we will be using the TensorFlow Object Detection API. In a step-by-step fashion, you will install all the necessary dependencies, take a look at pretrained models in the TensorFlow Model Zoo, and build the object detector.
In other words, after reading this tutorial, you will...
- Understand how the TensorFlow Object Detection API can be used for detecting objects in images and videos.
- Know how to install the necessary dependencies and pick a pretrained model from the TensorFlow Model Zoo.
- Have built an actual object detection system that works on both images and videos.
And because images always say more than 1,000 words, you will create a system that can do this:
Let's take a look! 🔥
With this fully functional example of object detection with TensorFlow, you can get started quickly. If you want to understand everything in more detail, make sure to read the rest of this tutorial below. We're going to walk through every step, so that you'll understand exactly how to build such a system yourself. Good luck!
⚠️ Pay attention to the following things when running this example straight away:
- Remove the os.environ(...) call if you want to run the code on your GPU. Of course, this only works if your TensorFlow is GPU-enabled.
- Adapt the paths passed to TFObjectDetector to your environment: the tf2 config folder of your cloned tensorflow/models repository, your model checkpoint folder, and your labels file.
# Specify model imports
from object_detection.builders import model_builder
from object_detection.utils import config_util
from object_detection.utils import label_map_util
from object_detection.utils import visualization_utils as viz_utils
import cv2
import numpy as np
import os
import tensorflow as tf
# Disable GPU if necessary
os.environ['CUDA_VISIBLE_DEVICES'] = '-1'
# Create object detector
class TFObjectDetector():

    # Constructor
    def __init__(self, path_to_object_detection = './models/research/object_detection/configs/tf2',\
                 path_to_model_checkpoint = './checkpoint', path_to_labels = './labels.pbtxt',\
                 model_name = 'ssd_mobilenet_v2_fpnlite_640x640_coco17_tpu-8'):
        self.model_name = model_name
        self.pipeline_config_path = path_to_object_detection
        self.pipeline_config = os.path.join(self.pipeline_config_path, f'{self.model_name}.config')
        self.full_config = config_util.get_configs_from_pipeline_file(self.pipeline_config)
        self.path_to_model_checkpoint = path_to_model_checkpoint
        self.path_to_labels = path_to_labels
        self.setup_model()

    # Set up model for usage
    def setup_model(self):
        self.build_model()
        self.restore_checkpoint()
        self.detection_function = self.get_model_detection_function()
        self.prepare_labels()

    # Build detection model
    def build_model(self):
        model_config = self.full_config['model']
        assert model_config is not None
        self.model = model_builder.build(model_config=model_config, is_training=False)
        return self.model

    # Restore checkpoint into model
    def restore_checkpoint(self):
        assert self.model is not None
        self.checkpoint = tf.train.Checkpoint(model=self.model)
        self.checkpoint.restore(os.path.join(self.path_to_model_checkpoint, 'ckpt-0')).expect_partial()

    # Get a tf.function for detection
    def get_model_detection_function(self):
        assert self.model is not None

        @tf.function
        def detection_function(image):
            image, shapes = self.model.preprocess(image)
            prediction_dict = self.model.predict(image, shapes)
            detections = self.model.postprocess(prediction_dict, shapes)
            return detections, prediction_dict, tf.reshape(shapes, [-1])

        return detection_function

    # Prepare labels
    # Source: https://github.com/tensorflow/models/blob/master/research/object_detection/colab_tutorials/inference_tf2_colab.ipynb
    def prepare_labels(self):
        label_map = label_map_util.load_labelmap(self.path_to_labels)
        categories = label_map_util.convert_label_map_to_categories(
            label_map,
            max_num_classes=label_map_util.get_max_label_map_index(label_map),
            use_display_name=True)
        self.category_index = label_map_util.create_category_index(categories)
        self.label_map_dict = label_map_util.get_label_map_dict(label_map, use_display_name=True)

    # Get keypoint tuples
    # Source: https://github.com/tensorflow/models/blob/master/research/object_detection/colab_tutorials/inference_tf2_colab.ipynb
    def get_keypoint_tuples(self, eval_config):
        tuple_list = []
        kp_list = eval_config.keypoint_edge
        for edge in kp_list:
            tuple_list.append((edge.start, edge.end))
        return tuple_list

    # Prepare image
    def prepare_image(self, image):
        return tf.convert_to_tensor(
            np.expand_dims(image, 0), dtype=tf.float32
        )

    # Perform detection
    def detect(self, image, label_offset = 1):
        # Ensure that we have a detection function
        assert self.detection_function is not None
        # Prepare image and perform prediction
        image = image.copy()
        image_tensor = self.prepare_image(image)
        detections, predictions_dict, shapes = self.detection_function(image_tensor)
        # Use keypoints if provided
        keypoints, keypoint_scores = None, None
        if 'detection_keypoints' in detections:
            keypoints = detections['detection_keypoints'][0].numpy()
            keypoint_scores = detections['detection_keypoint_scores'][0].numpy()
        # Perform visualization on output image/frame
        viz_utils.visualize_boxes_and_labels_on_image_array(
            image,
            detections['detection_boxes'][0].numpy(),
            (detections['detection_classes'][0].numpy() + label_offset).astype(int),
            detections['detection_scores'][0].numpy(),
            self.category_index,
            use_normalized_coordinates=True,
            max_boxes_to_draw=25,
            min_score_thresh=.40,
            agnostic_mode=False,
            keypoints=keypoints,
            keypoint_scores=keypoint_scores,
            keypoint_edges=self.get_keypoint_tuples(self.full_config['eval_config']))
        # Return the image
        return image

    # Predict image from folder
    def detect_image(self, path, output_path):
        # Load image
        image = cv2.imread(path)
        # Perform object detection and add to output file
        output_file = self.detect(image)
        # Write output file to system
        cv2.imwrite(output_path, output_file)

    # Predict video from folder
    def detect_video(self, path, output_path):
        # Set output video writer with codec
        fourcc = cv2.VideoWriter_fourcc(*'mp4v')
        out = cv2.VideoWriter(output_path, fourcc, 25.0, (1920, 1080))
        # Read the video
        vidcap = cv2.VideoCapture(path)
        frame_read, image = vidcap.read()
        count = 0
        # Iterate over frames and pass each for prediction
        while frame_read:
            # Perform object detection and add to output file
            output_file = self.detect(image)
            # Write frame with predictions to video
            out.write(output_file)
            # Read next frame
            frame_read, image = vidcap.read()
            count += 1
        # Release video file when we're ready
        out.release()


if __name__ == '__main__':
    detector = TFObjectDetector('../../tf-models/research/object_detection/configs/tf2', './checkpoint', './labels.pbtxt', 'ssd_mobilenet_v2_fpnlite_640x640_coco17_tpu-8')
    detector.detect_image('./1.jpg', './1o.jpg')
    detector.detect_video('./1v.mp4', './v1o.mp4')
In order to build an object detection system with the TensorFlow Object Detection API, you will need to complete the following three steps:
1. Install the dependencies: TensorFlow, OpenCV, and the Object Detection API itself.
2. Pick a pretrained model from the TensorFlow Detection Model Zoo.
3. Build the object detector on top of the API.
The first step to complete before we actually build the object detector is installing TensorFlow and OpenCV.
Here, we assume that you have Python installed on your system already. If not, make sure to install that as well.
Installing TensorFlow is really easy these days. Run the following two commands from within a terminal that has access to Python:
# Requires the latest pip
pip install --upgrade pip
# Current stable release for CPU and GPU
pip install tensorflow
It first upgrades pip to the latest version and then installs TensorFlow. Whereas previously you had to specify manually whether you wanted the CPU or GPU version, this is no longer the case today. Simply install tensorflow and the GPU version will install itself if you have your GPU set up correctly. In fact, you'll be able to switch back and forth between GPU and CPU if you want, but we'll get back to that later.
Installing OpenCV isn't difficult either: pip install opencv-python should do the trick.
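To verify that both installs succeeded, a quick sanity check from the Python interpreter helps (a minimal sketch; the version numbers on your system will differ):
# Verify the TensorFlow and OpenCV installations
import tensorflow as tf
import cv2
print(tf.__version__)   # e.g. 2.x
print(cv2.__version__)  # e.g. 4.x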
Now that you have the base packages installed, we can take a look at the TensorFlow Object Detection API.
A dog... oh wow 🐶
On GitHub, specifically in tensorflow/models, you can find the Object Detection API:
The TensorFlow Object Detection API is an open source framework built on top of TensorFlow that makes it easy to construct, train and deploy object detection models.
TensorFlow (n.d.)
As the name suggests, it can be used for object detection purposes. In particular, it offers functionality to load pretrained models and to add bounding boxes to images and videos. This is great, as our object detection system can use these APIs meaning that we don't have to develop everything ourselves.
We'll take a look at the pretrained models later. Let's install the Object Detection API first. This assumes that you have Git installed on your system. Also ensure that you can run the protoc command, i.e. that the Protocol Buffers compiler is installed.
1. Clone the tensorflow/models repository. Make sure to clone one level deep only. Execute this command for cloning the repository: git clone --depth 1 https://github.com/tensorflow/models
2. Move into the research folder with cd models/research/ and execute protoc object_detection/protos/*.proto --python_out=.
3. Copy the setup file into the current folder: cp object_detection/packages/tf2/setup.py .
4. Install the object_detection package with pip via python -m pip install .
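To confirm that the Object Detection API was installed correctly, you can run the test script that ships with the repository (run this from within models/research/; this command comes from the API's own installation instructions):
python object_detection/builders/model_builder_tf2_test.py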
Our object detection system will be built on top of a TensorFlow model that is capable of detecting objects - so far no surprise. Training such a model involves the following steps:
1. Collecting a large set of images containing the objects you want to detect.
2. Annotating these images with bounding boxes and class labels.
3. Configuring a detection architecture and training it, which can take many hours on heavy GPUs.
4. Evaluating the trained model and exporting it for inference.
This takes quite a lot of work that you likely don't want to spend. Fortunately, the folks at TensorFlow have made available a variety of pretrained object detection models in the TensorFlow Detection Model Zoo.
We provide a collection of detection models pre-trained on the COCO 2017 dataset. These models can be useful for out-of-the-box inference if you are interested in categories already in those datasets.
TensorFlow (n.d.)
These object detectors have been pretrained and are available in the TensorFlow Object Detection API (with the underlying model architectures written in parentheses):
- CenterNet (HourGlass, ResNet and MobileNet backbones)
- EfficientDet (EfficientNet backbones, D0 through D7)
- SSD (MobileNet and ResNet backbones, with FPN/FPNLite variants)
- Faster R-CNN (ResNet and Inception ResNet backbones)
- Mask R-CNN (Inception ResNet V2)
- ExtremeNet
Of course, you can also choose to build your own - but that's a more advanced use case not covered by this tutorial.
Today, we're going to use the SSD MobileNet V2 FPNLite 640x640 model. You can literally choose any model from the Zoo, but this pretrained model is only about 20MB and is therefore quick to download, even on slower connections.
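If you haven't got the checkpoint yet, here is a minimal sketch for downloading and extracting it. The URL is the one listed in the Model Zoo at the time of writing; double-check it there, as hosting paths may change:
# Download and extract the pretrained model checkpoint (sketch; verify the URL in the Model Zoo)
import tarfile
import urllib.request
MODEL_URL = 'http://download.tensorflow.org/models/object_detection/tf2/20200711/ssd_mobilenet_v2_fpnlite_640x640_coco17_tpu-8.tar.gz'
archive_path, _ = urllib.request.urlretrieve(MODEL_URL, 'model.tar.gz')
with tarfile.open(archive_path) as archive:
    archive.extractall('.')  # yields a folder containing a checkpoint/ subfolder
After extraction, point the path_to_model_checkpoint input (covered below) at the checkpoint folder inside the extracted directory.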
Let's now build that detector of ours!
Here, we're going to take a look at building the object detection system. Doing so can be split up in three separate but sequential parts:
Make sure to open your code editor and create a Python file, e.g. objectdetector.py. Let's write some code!
Recall that the TensorFlow Object Detection API is a framework on top of TensorFlow for building object detectors. In other words, it's a layer on top of a well-known library for creating machine learning models. We're going to add another layer on top of this API, being an object detector layer that can use the Object Detection API.
Creating the foundation of this TFObjectDetector involves adding the Python imports, disabling the GPU if necessary, creating the TFObjectDetector class and initializing it, writing the setup mechanism for the object detector, and finally creating some helper functions.
The first code always involves Python imports, and today is not different:
# Specify model imports
from object_detection.builders import model_builder
from object_detection.utils import config_util
from object_detection.utils import label_map_util
from object_detection.utils import visualization_utils as viz_utils
import cv2
import numpy as np
import os
import tensorflow as tf
As you can see, we import many functions from the object_detection package, which represents the TensorFlow Object Detection API. We'll use model_builder for building the detection model (i.e. the SSD MobileNet model). With config_util, we can load the configuration which tells TensorFlow to load the correct model. The labels representing the class names can be loaded with label_map_util, and viz_utils will be useful for adding the bounding boxes to the image or the video.
OpenCV (cv2) will be used for image input/output, NumPy (np) for numerical processing, os for operating system functions, and finally we import TensorFlow as well.
The second step is to disable the GPU, but this is optional - in other words, only if you want to. This can be especially useful when you have a GPU but it is misconfigured. You then simply erase all CUDA visible devices from the environment. If you don't use the GPU version of TensorFlow, this code can be omitted.
# Disable GPU if necessary
os.environ['CUDA_VISIBLE_DEVICES'] = '-1'
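Conversely, if you want to confirm whether TensorFlow can see your GPU at all, a one-liner helps (an empty list means TensorFlow will run on the CPU):
# List the GPUs visible to TensorFlow
import tensorflow as tf
print(tf.config.list_physical_devices('GPU'))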
Now, it's time for the real work. Let's create a class called TFObjectDetector which covers all the functionality of our object detector.
# Create object detector
class TFObjectDetector():
We immediately add the __init__ definition, which represents the constructor of the class. In other words, it is executed immediately upon object creation (in plainer English, loading the TFObjectDetector). Note that it takes the following inputs:
- path_to_object_detection: the path to the configs/tf2 folder within your cloned tensorflow/models repository.
- path_to_model_checkpoint: the path to the folder containing the pretrained model checkpoint.
- path_to_labels: the path to the labels file (e.g. labels.pbtxt) that maps class identifiers to class names.
- model_name: the name of the model from the Model Zoo, which is used to find its pipeline configuration file.
⚠️ We'll cover setting the inputs to values for your situation later, when we will actually use the detector.
In the constructor, we do quite a few things. Firstly, we fill a lot of instance variables so that our inputs can be reused throughout the detector. We then load the pipeline configuration that is available in the Object Detection API folder, specifically the one for our model, as well as the full configuration, and finally call self.setup_model().
This starts the setup mechanism for our model, which we'll take a look at now.
# Create object detector
class TFObjectDetector():

    # Constructor
    def __init__(self, path_to_object_detection = './models/research/object_detection/configs/tf2',\
                 path_to_model_checkpoint = './checkpoint', path_to_labels = './labels.pbtxt',\
                 model_name = 'ssd_mobilenet_v2_fpnlite_640x640_coco17_tpu-8'):
        self.model_name = model_name
        self.pipeline_config_path = path_to_object_detection
        self.pipeline_config = os.path.join(self.pipeline_config_path, f'{self.model_name}.config')
        self.full_config = config_util.get_configs_from_pipeline_file(self.pipeline_config)
        self.path_to_model_checkpoint = path_to_model_checkpoint
        self.path_to_labels = path_to_labels
        self.setup_model()
The setup mechanism is responsible for setting up the model in the background and making our object detector ready for usage. It involves the following steps:
1. Building the model with the configuration loaded in the __init__ function.
2. Restoring the model checkpoint downloaded from the Model Zoo.
3. Creating a tf.function that can be used for generating predictions.
4. Preparing the labels.
Let's group the execution of these steps in the setup_model() definition. Recall that this definition is called in the __init__ definition specified above and thus at the creation of our object detector.
    # Set up model for usage
    def setup_model(self):
        self.build_model()
        self.restore_checkpoint()
        self.detection_function = self.get_model_detection_function()
        self.prepare_labels()
We can next create build_model():
    # Build detection model
    def build_model(self):
        model_config = self.full_config['model']
        assert model_config is not None
        self.model = model_builder.build(model_config=model_config, is_training=False)
        return self.model
This definition retrieves the model configuration, ensures that it exists, and builds the model. It assigns the model to an instance variable so that it can be reused across our object detector.
With restore_checkpoint(), we can set the model back to the checkpointed state provided by the TensorFlow Detection Model Zoo.
    # Restore checkpoint into model
    def restore_checkpoint(self):
        assert self.model is not None
        self.checkpoint = tf.train.Checkpoint(model=self.model)
        self.checkpoint.restore(os.path.join(self.path_to_model_checkpoint, 'ckpt-0')).expect_partial()
We can then generate a tf.function for detection. This function utilizes our model, preprocesses the image, generates the prediction, postprocesses the detections and returns everything.
    # Get a tf.function for detection
    def get_model_detection_function(self):
        assert self.model is not None

        @tf.function
        def detection_function(image):
            image, shapes = self.model.preprocess(image)
            prediction_dict = self.model.predict(image, shapes)
            detections = self.model.postprocess(prediction_dict, shapes)
            return detections, prediction_dict, tf.reshape(shapes, [-1])

        return detection_function
Finally, we create a definition called prepare_labels(). Note that it was created by the people at TensorFlow and that it is responsible for mapping class identifiers to textual labels. It assigns these mappings to instance variables.
    # Prepare labels
    # Source: https://github.com/tensorflow/models/blob/master/research/object_detection/colab_tutorials/inference_tf2_colab.ipynb
    def prepare_labels(self):
        label_map = label_map_util.load_labelmap(self.path_to_labels)
        categories = label_map_util.convert_label_map_to_categories(
            label_map,
            max_num_classes=label_map_util.get_max_label_map_index(label_map),
            use_display_name=True)
        self.category_index = label_map_util.create_category_index(categories)
        self.label_map_dict = label_map_util.get_label_map_dict(label_map, use_display_name=True)
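To make this more tangible, here is a hypothetical excerpt of what self.category_index looks like for a COCO label map (ids and names as in the standard mscoco labels file):
# Hypothetical excerpt of self.category_index for COCO labels
category_index = {
    1: {'id': 1, 'name': 'person'},
    2: {'id': 2, 'name': 'bicycle'},
    3: {'id': 3, 'name': 'car'},
    # ... one entry per class in labels.pbtxt
}
The visualization utilities use this dictionary to translate numeric class predictions into the readable labels drawn on the image.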
So far, we have created a foundation that is capable of preparing the object detector. We only need to create two more helper functions to finish this part. The first restructures keypoint tuples and the second one prepares the image, i.e. converts it into a Tensor.
    # Get keypoint tuples
    # Source: https://github.com/tensorflow/models/blob/master/research/object_detection/colab_tutorials/inference_tf2_colab.ipynb
    def get_keypoint_tuples(self, eval_config):
        tuple_list = []
        kp_list = eval_config.keypoint_edge
        for edge in kp_list:
            tuple_list.append((edge.start, edge.end))
        return tuple_list

    # Prepare image
    def prepare_image(self, image):
        return tf.convert_to_tensor(
            np.expand_dims(image, 0), dtype=tf.float32
        )
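For intuition about prepare_image: it adds a batch dimension and casts the frame to float32, which is the input format the detection function expects. A minimal sketch with a dummy frame:
# Sketch: what prepare_image does to an OpenCV frame
import numpy as np
import tensorflow as tf
frame = np.zeros((480, 640, 3), dtype=np.uint8)  # dummy frame, as read by OpenCV
tensor = tf.convert_to_tensor(np.expand_dims(frame, 0), dtype=tf.float32)
print(tensor.shape)  # (1, 480, 640, 3)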
Woohoo, we're at part 2 already! In this part, we'll write the detection functions. More specifically, we create three definitions:
1. detect, the general detection function shared by images and videos.
2. detect_image, for performing object detection on a single image.
3. detect_video, for performing object detection on a video, frame by frame.
The first definition is the general detection function. General here means that it contains the detection functionality shared across detecting on images and on videos. In other words, things that would be pointless to add twice! It contains the following segments:
- First, we check that the detection function is not None, meaning that it must be set or we can't perform detection.
- We then prepare the image and feed it to the detection function, giving us the detections, a dictionary with the predictions, and an object containing shapes information.
- If keypoints are present in the detections, we extract them together with their scores.
- Finally, we draw the bounding boxes and labels onto the image with the viz_utils APIs provided by the Object Detection API.
    # Perform detection
    def detect(self, image, label_offset = 1):
        # Ensure that we have a detection function
        assert self.detection_function is not None
        # Prepare image and perform prediction
        image = image.copy()
        image_tensor = self.prepare_image(image)
        detections, predictions_dict, shapes = self.detection_function(image_tensor)
        # Use keypoints if provided
        keypoints, keypoint_scores = None, None
        if 'detection_keypoints' in detections:
            keypoints = detections['detection_keypoints'][0].numpy()
            keypoint_scores = detections['detection_keypoint_scores'][0].numpy()
        # Perform visualization on output image/frame
        viz_utils.visualize_boxes_and_labels_on_image_array(
            image,
            detections['detection_boxes'][0].numpy(),
            (detections['detection_classes'][0].numpy() + label_offset).astype(int),
            detections['detection_scores'][0].numpy(),
            self.category_index,
            use_normalized_coordinates=True,
            max_boxes_to_draw=25,
            min_score_thresh=.40,
            agnostic_mode=False,
            keypoints=keypoints,
            keypoint_scores=keypoint_scores,
            keypoint_edges=self.get_keypoint_tuples(self.full_config['eval_config']))
        # Return the image
        return image
Detecting objects on any image is now easy. It simply involves reading the image from a path with OpenCV, calling the general detection definition, and writing the output to the output_path.
    # Predict image from folder
    def detect_image(self, path, output_path):
        # Load image
        image = cv2.imread(path)
        # Perform object detection and add to output file
        output_file = self.detect(image)
        # Write output file to system
        cv2.imwrite(output_path, output_file)
Detecting objects on a video is a bit more difficult, but also still pretty easy. Recall that a video is nothing more than a set of images, often with 25 frames - and thus images - per second of video. We will use that characteristic when performing object detection on videos!
This segment is composed of the following steps:
- Set up the output video writer with an mp4v codec, writing at 25 frames per second and a 1920x1080 resolution.
- Read the video from path using OpenCV's VideoCapture functionality.
- With vidcap.read(), we read the first frame (image) and indicate whether we read it successfully. We also set the frame count to zero.
- Iterate over the frames, passing each through detect() and writing the result to the output video, until no frame can be read anymore (frame_read != True).
- Release the output video with out.release().
    # Predict video from folder
    def detect_video(self, path, output_path):
        # Set output video writer with codec
        fourcc = cv2.VideoWriter_fourcc(*'mp4v')
        out = cv2.VideoWriter(output_path, fourcc, 25.0, (1920, 1080))
        # Read the video
        vidcap = cv2.VideoCapture(path)
        frame_read, image = vidcap.read()
        count = 0
        # Iterate over frames and pass each for prediction
        while frame_read:
            # Perform object detection and add to output file
            output_file = self.detect(image)
            # Write frame with predictions to video
            out.write(output_file)
            # Read next frame
            frame_read, image = vidcap.read()
            count += 1
        # Release video file when we're ready
        out.release()
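One caveat with the code above: the frame rate (25 fps) and frame size (1920x1080) are hardcoded, so videos with other properties will produce broken output files. A sketch of a more flexible variant - not part of the original class, but you could add it as an extra method - that reads these properties from the input video first (it assumes the cv2 import at the top of the file):
    # Sketch: a detect_video variant that derives fps and frame size from the input
    def detect_video_flexible(self, path, output_path):
        vidcap = cv2.VideoCapture(path)
        fps = vidcap.get(cv2.CAP_PROP_FPS) or 25.0  # fall back to 25 fps if unknown
        width = int(vidcap.get(cv2.CAP_PROP_FRAME_WIDTH))
        height = int(vidcap.get(cv2.CAP_PROP_FRAME_HEIGHT))
        fourcc = cv2.VideoWriter_fourcc(*'mp4v')
        out = cv2.VideoWriter(output_path, fourcc, fps, (width, height))
        frame_read, image = vidcap.read()
        while frame_read:
            out.write(self.detect(image))
            frame_read, image = vidcap.read()
        out.release()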
Parts 1 and 2 conclude the creation of our TFObjectDetector class and hence our object detector. Now that we have finished it, it's time to call it. We can do so with the following code.
if __name__ == '__main__':
    detector = TFObjectDetector('../../tf-models/research/object_detection/configs/tf2', './checkpoint', './labels.pbtxt', 'ssd_mobilenet_v2_fpnlite_640x640_coco17_tpu-8')
    detector.detect_image('./shop.jpg', './shopout.jpg')
    detector.detect_video('./video.mp4', './videooutput.mp4')
This code does the following:
- It initializes the TFObjectDetector. Here, we pass the following information: the path to the tf2 config folder of your cloned tensorflow/models GitHub repository, the path to the model checkpoint (which we stored in the ./checkpoint folder), the path to the labels file, and the model name.
- It performs object detection on the image at ./shop.jpg, storing the output (i.e. the image with overlaying bounding boxes) at ./shopout.jpg.
- It performs object detection on the video at ./video.mp4, with output at ./videooutput.mp4.
Let's now take a look at some results of running the object detector.
These photos and videos have been downloaded and used under a Pexels License.
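And if you want the real-time webcam detection mentioned in the introduction, here is a minimal sketch using the detector instance created above (webcam index 0 is an assumption; adjust it for your system):
# Sketch: real-time object detection on webcam frames
capture = cv2.VideoCapture(0)
while True:
    frame_read, frame = capture.read()
    if not frame_read:
        break
    cv2.imshow('Object detection', detector.detect(frame))
    if cv2.waitKey(1) & 0xFF == ord('q'):  # press q to quit
        break
capture.release()
cv2.destroyAllWindows()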
There are many use cases for object detection in Machine Learning. In this tutorial, you have learned how you can build an object detection system yourself. Using the TensorFlow Object Detection API and a pretrained model, you have been able to perform object detection on images and on videos.
I hope that you have learned something from this article. 💬 If you did, please feel free to leave a message in the comments section below. Please do the same if you have any questions.
Thank you for reading MachineCurve today and happy engineering!
TensorFlow, the TensorFlow logo and any related marks are trademarks of Google Inc.
TensorFlow. (2020, September 9).Β TensorFlow/models. GitHub.Β https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/tf2_detection_zoo.md
TensorFlow. (2020, November). TensorFlow/models. GitHub. https://github.com/tensorflow/models/blob/master/research/object_detection/colab_tutorials/inference_tf2_colab.ipynb
TensorFlow. (n.d.).Β TensorFlow/models. GitHub.Β https://github.com/tensorflow/models/tree/master/research/object_detection
"Speed/accuracy trade-offs for modern convolutional object detectors." Huang J, Rathod V, Sun C, Zhu M, Korattikara A, Fathi A, Fischer I, Wojna Z, Song Y, Guadarrama S, Murphy K, CVPR 2017