TensorFlow (Beginner) - Basic Image Classification

In this project, we will learn the basics of using Keras, with TensorFlow as its backend, and use the framework to solve a basic image classification problem. By the end of the project, you will have created and trained a Neural Network model that can predict digits from images of handwritten digits with a high degree of accuracy. Along the way, you will develop a basic understanding of how Neural Networks work and of TensorFlow syntax, with Keras as its front end.

Task List

We will cover the following tasks in 1 hour and 10 minutes:


TensorFlow is an open-source machine learning library and one of the most popular and widely used machine learning libraries at the moment. However, working with TensorFlow can seem challenging at first, because you need to understand many of the underlying ideas about how computations in Neural Networks work. While it's great to know those details, getting started with Neural Networks by writing computational graphs directly can be intimidating.

This is where Keras comes in. Keras is a high-level API that can use TensorFlow as its backend and provides users with a simple-to-use interface. Keras does the heavy lifting behind the scenes, leaving developers to focus on the high-level details. Whether or not developers choose Keras for a production setting, it's an excellent tool for testing out ideas quickly.

The Dataset

To understand our problem better, we will first import the data we'll be working with and take a closer look at it. We are going to use the popular MNIST dataset, which contains images of handwritten digits along with their labels.

The dataset has 60000 examples in the training set and 10000 examples in the test set. Each input x has the shape (28, 28), which means that each example is an image with 28 rows and 28 columns of pixel values. We can display these examples as images using the pyplot module from Matplotlib.

One Hot Encoding

We will change the way each label is represented: instead of a single class number, it becomes a list of all possible classes, with every entry set to 0 except the one for the class the example belongs to, which is set to 1. For example, the label 5 becomes [0, 0, 0, 0, 0, 1, 0, 0, 0, 0].

Now it's as if our Neural Network predicts which one of 10 switches is ON, instead of predicting an actual numeric value. This makes it a classification problem. If we tried to predict an actual number, like 5 or 7, it would be a regression problem instead; in our case, we are classifying the image examples.
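As a sketch of what one-hot encoding does, here is a small numpy version. The function name `one_hot` is our own illustration; Keras also provides a ready-made helper, `tf.keras.utils.to_categorical`, that produces the same kind of encoding, and we don't assume which of the two the project itself uses.

```python
import numpy as np

def one_hot(labels, num_classes=10):
    """Encode integer class labels as one-hot rows: all zeros except a 1
    in the column given by the label."""
    labels = np.asarray(labels, dtype=int)
    encoded = np.zeros((labels.size, num_classes), dtype=np.float32)
    # Row i gets a 1 in column labels[i].
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded

encoded = one_hot([5, 0, 7])
print(encoded[0])  # [0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]
```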

Neural Networks

Consider an example network with two hidden layers. The first layer, containing all the X features, is called the input layer, and the output y is called the output layer. In this example, the output layer has only one “node”. A hidden layer can have many nodes or very few, depending on how complex the problem is; here, both hidden layers have 2 nodes each. Each node is the output of a linear function (usually followed by a non-linear activation function) that takes its inputs from the nodes of the preceding layer. All the Ws and bs associated with these linear functions have to be “learned” by our algorithm as it optimises their values to best fit the given data. In the handwritten digit classification problem, each of the two hidden layers will have 128 nodes and, as we already know, the input is a 784-dimensional vector.
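To make "each node is a linear function of the preceding layer" concrete, here is a minimal numpy forward pass through the small example network: two hidden layers of 2 nodes and one output node. The 3 input features, all weight and bias values, and the choice of ReLU as the hidden-layer activation are made up for illustration; in a real network the Ws and bs are learned.

```python
import numpy as np

def relu(z):
    """ReLU activation: negative values become 0."""
    return np.maximum(0.0, z)

def dense(x, W, b, activation=None):
    """One densely connected layer: each output node computes W @ x + b over
    all nodes of the preceding layer, optionally followed by an activation."""
    z = W @ x + b
    return activation(z) if activation is not None else z

# Hypothetical input with 3 features.
x = np.array([1.0, 2.0, 3.0])

# Hypothetical weights and biases (normally learned during training).
W1, b1 = np.array([[0.1, 0.2, 0.3], [-0.3, 0.1, 0.0]]), np.array([0.0, 0.5])
W2, b2 = np.array([[1.0, -1.0], [0.5, 0.5]]), np.array([0.0, 0.0])
W3, b3 = np.array([[1.0, 1.0]]), np.array([0.1])

h1 = dense(x, W1, b1, relu)   # first hidden layer, 2 nodes
h2 = dense(h1, W2, b2, relu)  # second hidden layer, 2 nodes
y = dense(h2, W3, b3)         # output layer, 1 node
print(h1, h2, y)
```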

Preprocessing the Examples

We will create a Neural Network that takes 784-dimensional vectors as inputs (28 rows * 28 columns) and outputs a 10-dimensional vector (one entry for each of the 10 classes). We have already converted the outputs to 10-dimensional, one-hot encoded vectors. Now, let's convert the inputs to the required format as well. We will use numpy to unroll each example from a (28, 28) array into a 784-dimensional vector.
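The unrolling step can be sketched with `np.reshape`. To keep the example self-contained, it uses random stand-in images with the MNIST shape instead of the real dataset; the variable names are our own.

```python
import numpy as np

# Random stand-in for MNIST images: 100 examples of 28 x 28 pixels
# (the real training set has 60000 examples of the same shape).
rng = np.random.default_rng(0)
x_train = rng.integers(0, 256, size=(100, 28, 28)).astype(np.uint8)

# Unroll every (28, 28) image into a 784-dimensional vector.
x_train_reshaped = np.reshape(x_train, (x_train.shape[0], 28 * 28))
print(x_train_reshaped.shape)  # (100, 784)

# Rows are concatenated in order, so pixel (row r, column c)
# ends up at position r * 28 + c of the vector.
r, c = 3, 17
print(x_train_reshaped[0, r * 28 + c] == x_train[0, r, c])  # True
```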

Pixel values in this dataset range from 0 to 255. While that's fine for displaying the images, our Neural Network will learn the weights and biases of its layers faster and more reliably if we normalise these values first. In a future project, we will take a look at how this normalisation affects the speed of learning.
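The text doesn't say which normalisation is used, so here is one common choice as an assumption: standardise the pixels to zero mean and unit standard deviation, computing the statistics on the training set only and applying the same values to the test set. Dividing by 255.0 to scale pixels into [0, 1] is a simpler alternative. The function name `standardize` and the small `eps` term are our own.

```python
import numpy as np

def standardize(x_train, x_test, eps=1e-10):
    """Scale pixel values to zero mean and unit standard deviation, using
    statistics from the training set only; eps guards against division by zero."""
    mean = x_train.mean()
    std = x_train.std()
    return (x_train - mean) / (std + eps), (x_test - mean) / (std + eps)

# Random stand-ins for the reshaped pixel arrays (values 0..255).
rng = np.random.default_rng(0)
x_train = rng.integers(0, 256, size=(1000, 784)).astype(np.float32)
x_test = rng.integers(0, 256, size=(200, 784)).astype(np.float32)

x_train_norm, x_test_norm = standardize(x_train, x_test)
print(float(x_train_norm.mean()), float(x_train_norm.std()))
```

Reusing the training mean and standard deviation for the test set keeps both sets on the same scale without leaking information from the test data.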

Creating a Model

Creating a Neural Network model with Keras is simple: we use the Sequential class defined in Keras and add layers to it. As discussed before, we will use two hidden layers with 128 nodes each and one output layer with 10 nodes for the 10 classes. All the layers are going to be Dense layers. This means that, as in the example network described earlier, every node of a layer is connected to every node of the preceding layer, i.e. the layers are densely connected.

We instantiate a Sequential model and pass it a list of the layers we want in our model, in the order we want them. We set the input shape on the first hidden layer to match the shape of a single example from our reshaped training and test sets: each example is a 784-dimensional vector, one value for each of the 784 pixels of the image.
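A minimal sketch of the model described above, using `tf.keras`. The ReLU activation on the hidden layers and the compile settings (SGD optimiser, categorical cross-entropy loss, accuracy metric) are our assumptions; the text only specifies the layer sizes and the softmax output. `tf.keras.Input(shape=(784,))` has the same effect as passing `input_shape=(784,)` to the first Dense layer.

```python
import tensorflow as tf

# Two hidden Dense layers with 128 nodes each and a 10-node softmax output.
# tf.keras.Input declares the shape of a single example: a 784-dimensional vector.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(784,)),
    tf.keras.layers.Dense(128, activation="relu"),    # activation assumed
    tf.keras.layers.Dense(128, activation="relu"),    # activation assumed
    tf.keras.layers.Dense(10, activation="softmax"),  # one probability per class
])

# Assumed training settings: plain SGD with cross-entropy on one-hot labels.
model.compile(
    optimizer="sgd",
    loss="categorical_crossentropy",
    metrics=["accuracy"],
)

model.summary()
```

Each Dense layer has one weight per (input, node) pair plus one bias per node, so this model has 784*128 + 128 + 128*128 + 128 + 128*10 + 10 = 118282 learnable parameters.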

Training the Model

Let's train the model now, using our normalised and reshaped training set. We are going to train the model for 5 epochs. Think of an epoch as one pass of all the training examples through the model, so setting the number of epochs to 5 means we go through all the training examples 5 times.
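Within each epoch, Keras updates the weights once per mini-batch rather than once per epoch; `model.fit` uses a batch size of 32 when `batch_size` is not given. The text doesn't state a batch size, so the sketch below assumes that default and simply counts how many weight updates 5 epochs over 60000 examples amount to. The helper name `updates_per_epoch` is our own.

```python
import math

def updates_per_epoch(num_examples, batch_size=32):
    """Number of mini-batch weight updates in one pass over the data.
    The last batch may be smaller, hence the ceiling."""
    return math.ceil(num_examples / batch_size)

steps = updates_per_epoch(60000)  # assumes Keras' default batch size of 32
print(steps)      # 1875
print(steps * 5)  # 9375 updates over 5 epochs
```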

We get a training set accuracy of over 98%. While this is probably not as good as human-level performance, it still seems quite good for a machine. However, to make sure the model has not simply “memorised” the training examples, we should evaluate its performance on the test set, which the model never saw during training. This is easy to do: we simply use the evaluate method on our model.
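The train-then-evaluate workflow can be sketched end to end as follows. To stay self-contained, the sketch uses small random stand-in arrays with the same shapes as the preprocessed MNIST data, so the reported accuracy will be near chance (about 10%) rather than the 98% mentioned above. The ReLU activations and compile settings are the same assumptions as in the model sketch.

```python
import numpy as np
import tensorflow as tf

# Random stand-in data shaped like the preprocessed MNIST arrays;
# the model cannot learn anything meaningful from it.
rng = np.random.default_rng(0)
x_train = rng.normal(size=(256, 784)).astype("float32")
y_train = tf.keras.utils.to_categorical(rng.integers(0, 10, size=256), num_classes=10)
x_test = rng.normal(size=(64, 784)).astype("float32")
y_test = tf.keras.utils.to_categorical(rng.integers(0, 10, size=64), num_classes=10)

model = tf.keras.Sequential([
    tf.keras.Input(shape=(784,)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="sgd", loss="categorical_crossentropy", metrics=["accuracy"])

# 5 epochs = 5 full passes over the training examples.
history = model.fit(x_train, y_train, epochs=5, verbose=0)

# evaluate returns the loss followed by each compiled metric.
loss, accuracy = model.evaluate(x_test, y_test, verbose=0)
print(f"test loss={loss:.3f}, test accuracy={accuracy:.3f}")
```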


Let's get our model's predictions on the test dataset. Each prediction is a list of 10 probability scores, as expected from the softmax output layer. What we are interested in is the index of the highest probability score in each prediction, which we can get with numpy's argmax function.
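Here is a small sketch of the argmax step on three made-up prediction rows (each sums to 1, as softmax outputs do). The arrays are hypothetical; with the real model they would come from calling its predict method on the test set. The last lines compare against one-hot true labels, which is how test accuracy follows from the predicted indices.

```python
import numpy as np

# Hypothetical softmax outputs for 3 test images (10 classes each).
predictions = np.array([
    [0.01, 0.02, 0.90, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01],
    [0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.55, 0.05, 0.05],
    [0.60, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.03, 0.02],
])

# Index of the highest probability in each row = predicted digit.
pred_labels = np.argmax(predictions, axis=1)
print(pred_labels)  # [2 7 0]

# Hypothetical one-hot true labels for the same images: 2, 7 and 1.
y_true = np.eye(10)[[2, 7, 1]]
true_labels = np.argmax(y_true, axis=1)
accuracy = np.mean(pred_labels == true_labels)  # 2 of 3 correct
```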

We have a total of 10000 predictions. We can't reasonably go through all of them by hand, but we can take a look at the first few. Let's plot the first few test set images along with their predicted and actual labels to see how our trained model actually performed.
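One way to plot the first few test images with their labels is a grid of Matplotlib subplots, titled "predicted / actual" and coloured green for correct predictions and red for mistakes. The function name `plot_predictions`, the grid layout, and the colour scheme are our own choices; the demo call uses random stand-in images and labels rather than real model output.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; remove this line in a notebook to see plots inline
import matplotlib.pyplot as plt
import numpy as np

def plot_predictions(images, predicted, actual, n=25, cols=5):
    """Show the first n images (784-dimensional vectors) in a grid, each titled
    'predicted / actual': green if the prediction is correct, red otherwise."""
    rows = int(np.ceil(n / cols))
    fig = plt.figure(figsize=(2 * cols, 2 * rows))
    for i in range(n):
        ax = fig.add_subplot(rows, cols, i + 1)
        ax.imshow(np.reshape(images[i], (28, 28)), cmap="binary")
        ax.set_xticks([])
        ax.set_yticks([])
        color = "g" if predicted[i] == actual[i] else "r"
        ax.set_title(f"{predicted[i]} / {actual[i]}", color=color)
    fig.tight_layout()
    return fig

# Demo with random stand-in data instead of real test images and predictions.
rng = np.random.default_rng(0)
images = rng.random((10, 784))
predicted = rng.integers(0, 10, size=10)
actual = rng.integers(0, 10, size=10)
fig = plot_predictions(images, predicted, actual, n=10)
```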


John Sutherland

Some significant lag and errors coded due to remote desktop and video issues. Might have also been caused by wifi connection (or lack thereof).

Chris Barfuss

Excellent. Very easy to do exercise.

Thet oo zin

You are the best. Rhyme


Kind of laggy and doesn't accept keyboard shortcuts.

Shahzeb Lakhani

A hybrid approach might be best. Watching videos might be better when a new concept is introduced, compared to static images in the notebook. But when it comes to hands-on, this approach is definitely revolutionary, a genuine breakthrough in code-related learning.

Christos Glymidakis

Both windows don't fit into the screen, have to reduce the size to the degree at which it was difficult to read. Another problem, after maximizing one of the window, the previous setting is lost, and I had to re-arrange again. Finally, the video stops playing if I open another app, e.g. terminal window or email. Annoying.

Irina Gruzinov

Great! Scaling down the icons would give more real estate for fonts. Add note ability. Notetaking similar to and inegrated into Coursera would be very useful.

Paul Kubicz

It went great, hope the results will successfully transfer to Coursera.

Alina-Oana Gârlea

I found this is and ride, though it is my first time here.

Kasi Ponnapalli

I liked it. Will take more classes.

Igor Tur

About the Host (Amit Yadav)

I am a machine learning engineer with a focus on computer vision and sequence modelling for automated signal processing using deep learning techniques. My previous experience includes leading chatbot development for a large corporation.

Frequently Asked Questions

In Rhyme, all projects are completely hands-on. You don't just passively watch someone else; you use the software directly while following the host's (Amit Yadav's) instructions. Using the software is the only way to achieve mastery. With the "Live Guide" option, you can ask for help and get an immediate response.
Nothing! Just join through your web browser. Your host (Amit Yadav) has already installed all required software and configured all data.
Absolutely! Your host (Amit Yadav) has provided this session completely free of cost!
You can go to https://rhyme.com, sign up for free, and follow the visual guide "How to use Rhyme" to create your own projects. If you have custom needs or a company-specific environment, please email us at help@rhyme.com
Absolutely. We offer Rhyme for workgroups as well as larger departments and companies. Universities, academies, and bootcamps can also buy Rhyme for their settings. You can select projects and trainings that are mission-critical for you and also author your own that reflect your needs and tech environments. Please email us at help@rhyme.com
Rhyme strives to ensure that visual instructions are helpful for people with reading impairments. The Rhyme interface has features like resolution and zoom that will be helpful for visual impairments, and we are currently developing closed-caption functionality to help with hearing impairments. Most of the accessibility options of the cloud desktop's operating system or of the specific application can also be used in Rhyme. If you have questions related to accessibility, please email us at accessibility@rhyme.com
We started with Windows and Linux cloud desktops because they offer the most flexibility for teaching any software (desktop or web). However, web applications like Salesforce can run directly through a virtual browser, and others like Jupyter and RStudio can run in containers and be accessed through virtual browsers. We are currently working on features that will let such web applications run without cloud desktops. The rest of the Rhyme learning, authoring, and monitoring interfaces will remain the same.
Please email us at help@rhyme.com and we'll respond to you within one business day.
