Computer Vision with TensorFlow: Object Classification & Detection

In this course, you will learn to use pre-trained models to predict which object a given image contains and, towards the end of the course, to localize objects within an image. This is the first course in my computer vision series, and it covers everything you need to get started with computer vision using TensorFlow and Keras.

Available On Coursera



3.7 / 5


Task List

We will cover the following tasks in 1 hour and 4 minutes:


We will get to know the Rhyme interface and our learning environment. You will get a virtual machine on which Jupyter Notebook and TensorFlow, both required for this course, are already installed. Jupyter Notebooks are very popular with data scientists and machine learning engineers because code can be written in some cells while other cells hold documentation. Computer vision is a complicated subject, but its primary aim is to help machines gain a high-level understanding of images and videos. As the name implies, computer vision is about helping computers achieve a human-like visual system.

Importing the Libraries

We will be using a pre-trained neural network model called ResNet50 in this course. ResNet50 is a neural network with 50 layers, which is what is normally known as a very deep neural network. The model is already trained on a dataset called ImageNet, a very large dataset with millions of images labeled with the objects they contain. The full dataset has thousands of these classes, or categories; the pre-trained model we will use predicts 1,000 of them.

Not only can we download this trained ResNet50 model, but Keras also comes with an existing method to preprocess input data so that it can be fed to the trained model. We will import this method along with two more methods from Keras' image preprocessing library. These two methods will help us load images and convert them to NumPy arrays. And of course, we will also need NumPy, which is the fundamental package for scientific computing in Python.

Importing the ResNet50 Model

Importing the ResNet50 model is also really straightforward, and Keras provides a simple way of doing it. Once we have imported ResNet50, we will need to create an instance of this model. Weights are the internal parameter values that the neural network learned when it was trained on the ImageNet dataset. Setting include_top to True means that the final, fully connected classification layer at the top of the model will be included.
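A minimal sketch of the instantiation described above, assuming TensorFlow 2.x with its bundled Keras. The wrapper name build_classifier and the two constants are hypothetical and not taken from the course; ResNet50, include_top, and weights are the real Keras names. The import sits inside the function so the definition can be loaded without TensorFlow, and the pre-trained weights are downloaded the first time the function is called.

```python
# Properties of the stock Keras ResNet50 when its classification head is attached.
INPUT_SHAPE = (224, 224, 3)   # height, width, and RGB channels the model expects
NUM_CLASSES = 1000            # size of the output vector when include_top=True


def build_classifier():
    """Return ResNet50 with ImageNet weights and its final classification layer."""
    # Imported here so that TensorFlow is only needed when a model is actually built.
    from tensorflow.keras.applications.resnet50 import ResNet50

    # weights="imagenet" loads the parameters learned on ImageNet;
    # include_top=True keeps the final fully connected layer.
    return ResNet50(include_top=True, weights="imagenet")
```

Calling model = build_classifier() followed by model.summary() lists the layers, which is a quick way to check that the final dense layer is present.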

Preparing the Images

We will need to preprocess our images so that the ResNet50 model can make predictions on them. We already imported the methods needed for this preprocessing in a previous chapter. We will define a function called prepare_images that takes the image paths as a parameter. Essentially, what we need to do here is build a list of images by iterating through all the image paths and loading each one. Then, we will convert the list into a NumPy array, because the preprocess_input function from ResNet50 expects the images to be laid out in a NumPy array. Finally, we simply return the output of preprocess_input. This is the output that we will feed into our model.
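The steps above can be sketched as follows, assuming TensorFlow 2.x. The function name follows the text, but the body is an illustration rather than the course's exact code, and the target_size parameter is an assumption based on ResNet50's default 224x224 input. The third-party imports are placed inside the function so the definition loads even where those packages are missing.

```python
def prepare_images(image_paths, target_size=(224, 224)):
    """Load images from disk and return a preprocessed batch for ResNet50."""
    # Third-party imports live here so the definition loads without them installed.
    import numpy as np
    from tensorflow.keras.applications.resnet50 import preprocess_input
    from tensorflow.keras.preprocessing.image import img_to_array, load_img

    images = []
    for path in image_paths:
        image = load_img(path, target_size=target_size)  # PIL image, resized on load
        images.append(img_to_array(image))               # array of shape (224, 224, 3)

    batch = np.array(images)        # shape (n_images, 224, 224, 3)
    return preprocess_input(batch)  # ResNet50-specific channel reordering and centering
```

A call such as prepare_images(["cat.jpg", "dog.jpg"]) would return one batch holding both images; the file names here are placeholders.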

Making Predictions

We have already defined the prepare_images function, so we just need to use it now. We will use the predict method available on our model, which returns an array of predictions for the data that we feed the model. However, the predictions are encoded as probability values for each of the 1,000 classes, which makes them a bit confusing to use directly. But as you'd expect by now, Keras gives you a method to decode these predictions. From the ImageNet utilities, let's import the decode_predictions method and apply it to the predictions that we just got. We are interested in only the topmost prediction for each image. That is, we want to know what our model thinks is the most likely object in a given image.
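A short sketch of predicting and decoding, assuming TensorFlow 2.x. The helper names classify and top_labels are hypothetical; predict and decode_predictions are the real Keras calls. With top=1, decode_predictions returns one list per image containing a single (class_id, class_name, score) tuple.

```python
def top_labels(decoded):
    """Reduce decode_predictions output to one (class_name, score) pair per image."""
    return [(entries[0][1], float(entries[0][2])) for entries in decoded]


def classify(model, batch):
    """Run the model on a preprocessed batch and return the top label per image."""
    # Imported here so that TensorFlow is only needed when predictions are made.
    from tensorflow.keras.applications.imagenet_utils import decode_predictions

    predictions = model.predict(batch)                # shape (n_images, 1000)
    decoded = decode_predictions(predictions, top=1)  # keep only the best class
    return top_labels(decoded)
```

Keeping the reduction in its own small function makes it easy to switch to the top three or five classes later by changing top and the indexing.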

Object Detection and YOLO

Object detection is a very interesting task. With our neural network models, not only can we classify which objects are present in a given image, but we can also ask the models to localize the objects that they find. This is called object detection. In this chapter, we will use a high-level API to perform object detection using the very popular YOLO algorithm.

Object Detection in Images

We will use the ImageAI library to perform YOLO object detection on an image with two objects in it. The model is going to be Tiny YOLO version 3.
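A hedged sketch of what this can look like with ImageAI's ObjectDetection class, assuming the library is installed and a Tiny YOLOv3 weights file has been downloaded separately. The function names and all three path parameters are placeholders, not values from the course. Each detection is expected to be a dictionary with name, percentage_probability, and box_points entries.

```python
def detect_objects(image_path, output_path, model_path):
    """Run Tiny YOLOv3 on one image and return ImageAI's list of detections."""
    # Imported here so that ImageAI is only needed when detection is actually run.
    from imageai.Detection import ObjectDetection

    detector = ObjectDetection()
    detector.setModelTypeAsTinyYOLOv3()
    detector.setModelPath(model_path)  # path to the downloaded Tiny YOLOv3 weights
    detector.loadModel()
    # Also writes a copy of the image with boxes drawn to output_path.
    return detector.detectObjectsFromImage(
        input_image=image_path,
        output_image_path=output_path,
    )


def summarize(detections):
    """Turn detections into (name, rounded probability in percent, box) tuples."""
    return [
        (d["name"], round(d["percentage_probability"], 1), tuple(d["box_points"]))
        for d in detections
    ]
```

For an image with two objects, summarize(detect_objects(...)) would be expected to yield two tuples, each with a label, a confidence in percent, and the corner coordinates of its bounding box.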

Object Detection in Video

In this chapter, we will learn how to use the ImageAI library to perform object detection in videos. We have a short 10-second clip of cars moving on a road, and we will use Tiny YOLO v3 to perform object detection on this clip.
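The video case can be sketched in much the same way with ImageAI's VideoObjectDetection class. This is an illustration under assumptions, not the course's code: the function name, the paths, and the default of 20 frames per second are all placeholders. To my understanding, ImageAI appends the video file extension to the output path itself and returns the path of the written file.

```python
def detect_objects_in_video(video_path, output_path, model_path, fps=20):
    """Run Tiny YOLOv3 over a video clip and return the annotated video's path."""
    # Imported here so that ImageAI is only needed when detection is actually run.
    from imageai.Detection import VideoObjectDetection

    detector = VideoObjectDetection()
    detector.setModelTypeAsTinyYOLOv3()
    detector.setModelPath(model_path)  # path to the downloaded Tiny YOLOv3 weights
    detector.loadModel()
    return detector.detectObjectsFromVideo(
        input_file_path=video_path,
        output_file_path=output_path,  # given without a file extension
        frames_per_second=fps,         # frame rate of the saved video
        log_progress=True,             # print progress as frames are processed
    )
```

Even a 10-second clip involves a few hundred frames, so detection on video takes noticeably longer than on a single image.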


Amit Yadav

About the Host (Amit Yadav)

I am a machine learning engineer with a focus on computer vision and sequence modelling for automated signal processing using deep learning techniques. My previous experience includes leading chatbot development for a large corporation.

Frequently Asked Questions

In Rhyme, all projects are completely hands-on. You don't just passively watch someone else. You use the software directly while following the host's (Amit Yadav) instructions. Using the software is the only way to achieve mastery. With the "Live Guide" option, you can ask for help and get an immediate response.
Nothing! Just join through your web browser. Your host (Amit Yadav) has already installed all required software and configured all data.
You can go to, sign up for free, and follow this visual guide How to use Rhyme to create your own projects. If you have custom needs or company-specific environment, please email us at
Absolutely. We offer Rhyme for workgroups as well as larger departments and companies. Universities, academies, and bootcamps can also buy Rhyme for their settings. You can select projects and trainings that are mission critical for you and also author your own that reflect your own needs and tech environments. Please email us at
Rhyme strives to ensure that visual instructions are helpful for reading impairments. The Rhyme interface has features like resolution and zoom that will be helpful for visual impairments. We are also currently developing closed-caption functionality to help with hearing impairments. Most of the accessibility options of the cloud desktop's operating system or the specific application can also be used in Rhyme. If you have questions related to accessibility, please email us at
We started with Windows and Linux cloud desktops because they have the most flexibility in teaching any software (desktop or web). However, web applications like Salesforce can run directly through a virtual browser, and others like Jupyter and RStudio can run in containers and be accessed through virtual browsers. We are currently working on features that let such web applications run without cloud desktops. The rest of the Rhyme learning, authoring, and monitoring interfaces will remain the same.
Please email us at and we'll respond to you within one business day.


More Projects by Amit Yadav

Linear Regression with Python
1 hour and 4 minutes
Your First Python Program
1 hour and 26 minutes