TensorFlow (Beginner): Avoid Overfitting Using Regularization

In this course, we will learn how to avoid overfitting by using two common regularization techniques: Weight Regularization and Dropout. We will see how to apply these techniques with TensorFlow. The idea of regularization is to put constraints on the quantity and type of information that the model can store. If a network can only afford to memorize a small number of patterns, the optimization process will force it to focus on the most prominent patterns, which simplifies the model.


Task List


We will cover the following tasks in 1 hour and 13 minutes:


Introduction

We will understand the Rhyme interface and our learning environment. You will get a virtual machine; Jupyter Notebook and TensorFlow, which you will need for this course, are already installed on it. Jupyter Notebooks are very popular with Data Scientists and Machine Learning Engineers, as one can write code in some cells and use other cells for documentation.


What is Overfitting?

Overfitting occurs when the model's accuracy on the training data keeps increasing (or stays constant) with more epochs, while its accuracy on the validation data peaks after a certain number of epochs and then starts to decrease. Let's import the libraries that we will need. We will use TensorFlow and Keras, along with the fundamental package for scientific computing in Python, NumPy.
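A minimal sketch of the imports, assuming TensorFlow 2.x with the bundled Keras API:

    import tensorflow as tf
    from tensorflow import keras
    import numpy as np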


Dataset

We are going to use IMDB movie reviews from the Internet Movie Database as our dataset, split into 25,000 reviews for training and 25,000 reviews for testing. The training and testing sets consist of equal numbers of positive and negative reviews. The dataset comes pre-processed: each example is an array of integers representing the words of the movie review, and each label is an integer of either 0 (negative sentiment) or 1 (positive sentiment). We will multi-hot encode our data, turning each example's list of word indices into a fixed-length vector of 0s and 1s. We are going to work with only the 10,000 most common words in the vocabulary.
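A sketch of loading and multi-hot encoding the data, assuming the keras.datasets.imdb loader; the NUM_WORDS constant and the multi_hot_encode helper are illustrative names:

    NUM_WORDS = 10000  # keep only the 10,000 most common words

    (train_data, train_labels), (test_data, test_labels) = \
        keras.datasets.imdb.load_data(num_words=NUM_WORDS)

    def multi_hot_encode(sequences, dimension):
        # One row per review, with a 1 at every word index that appears in it
        results = np.zeros((len(sequences), dimension))
        for i, word_indices in enumerate(sequences):
            results[i, word_indices] = 1.0
        return results

    train_data = multi_hot_encode(train_data, NUM_WORDS)
    test_data = multi_hot_encode(test_data, NUM_WORDS)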


Creating the Baseline Model

To find an appropriate size for our model, we normally start with relatively few layers and parameters, then increase the size of the layers or add new layers until we see diminishing returns on the validation loss. We'll create three models - a baseline model, a smaller version, and a larger version - and then compare them.
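A sketch of a baseline model on the multi-hot input from the previous step; the two 16-unit hidden layers are an illustrative assumption and the course's exact sizes may differ:

    baseline_model = keras.Sequential([
        keras.layers.Dense(16, activation='relu', input_shape=(NUM_WORDS,)),
        keras.layers.Dense(16, activation='relu'),
        keras.layers.Dense(1, activation='sigmoid')  # positive vs. negative review
    ])
    baseline_model.compile(optimizer='adam',
                           loss='binary_crossentropy',
                           metrics=['accuracy', 'binary_crossentropy'])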


Creating Model Variants

We will create three models with the same number of layers but different numbers of nodes in the first two layers. We will train all three models using the fit method: we will pass in the training data, run the training for 20 epochs, and set the batch size to 512, using our test set as the validation set.
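A sketch of the two variants and the training calls; the 4-unit and 512-unit layer sizes are illustrative assumptions:

    smaller_model = keras.Sequential([
        keras.layers.Dense(4, activation='relu', input_shape=(NUM_WORDS,)),
        keras.layers.Dense(4, activation='relu'),
        keras.layers.Dense(1, activation='sigmoid')
    ])

    bigger_model = keras.Sequential([
        keras.layers.Dense(512, activation='relu', input_shape=(NUM_WORDS,)),
        keras.layers.Dense(512, activation='relu'),
        keras.layers.Dense(1, activation='sigmoid')
    ])

    for model in (smaller_model, bigger_model):
        model.compile(optimizer='adam', loss='binary_crossentropy',
                      metrics=['accuracy', 'binary_crossentropy'])

    # Train all three models with the same settings and keep their histories
    fit_args = dict(epochs=20, batch_size=512,
                    validation_data=(test_data, test_labels), verbose=2)
    baseline_history = baseline_model.fit(train_data, train_labels, **fit_args)
    smaller_history = smaller_model.fit(train_data, train_labels, **fit_args)
    bigger_history = bigger_model.fit(train_data, train_labels, **fit_args)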


Plot History Function

We will define a function that plots the cross entropy against epochs given a history parameter. As usual, we are using Matplotlib's pyplot module. Our function will create a plot of binary cross entropy against epochs from the three history objects that we got from training the three models.
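A sketch of such a plotting function (the plot_history name is illustrative), assuming the history objects from the previous step:

    import matplotlib.pyplot as plt

    def plot_history(histories, key='binary_crossentropy'):
        plt.figure(figsize=(12, 6))
        for name, history in histories:
            # Validation loss as a dashed line, training loss as a solid line
            val = plt.plot(history.epoch, history.history['val_' + key],
                           '--', label=name + ' val')
            plt.plot(history.epoch, history.history[key],
                     color=val[0].get_color(), label=name + ' train')
        plt.xlabel('Epochs')
        plt.ylabel(key.replace('_', ' ').title())
        plt.legend()
        plt.show()

    plot_history([('baseline', baseline_history),
                  ('smaller', smaller_history),
                  ('bigger', bigger_history)])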


Plotting the Training and Validation Loss

The more capacity a neural network has, the quicker it will be able to model the training data (resulting in a low training loss), but the more susceptible it will be to overfitting (resulting in a large difference between the training and validation loss). Out of the three models, the smaller one seems to be doing the best job!


Weight Regularization

A common way to mitigate overfitting is to put constraints on the complexity of a network by forcing its weights to take only small values, which makes the distribution of weight values more regular. This is called weight regularization, and is done by adding a cost associated with having large weights to the loss function of the network.
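In Keras this is typically done with the kernel_regularizer argument; a sketch of an L2-regularized version of the baseline, where the 0.001 coefficient is an illustrative assumption:

    l2_model = keras.Sequential([
        keras.layers.Dense(16, activation='relu', input_shape=(NUM_WORDS,),
                           kernel_regularizer=keras.regularizers.l2(0.001)),
        keras.layers.Dense(16, activation='relu',
                           kernel_regularizer=keras.regularizers.l2(0.001)),
        keras.layers.Dense(1, activation='sigmoid')
    ])
    l2_model.compile(optimizer='adam', loss='binary_crossentropy',
                     metrics=['accuracy', 'binary_crossentropy'])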


L2 Model vs Baseline

We will look at the impact of regularization on our models. We have two models with the same network architecture. Despite the same architecture, the regularized model is much more resistant to overfitting as we can see from this plot. The L2 model is definitely an improvement over the base model.


Dropouts

Dropout is one of the most common regularization techniques used in neural networks. Dropout consists of setting a random selection of a layer's output features to 0 during training. Let's say we apply dropout to a layer that returns a 10-dimensional vector; some of these 10 outputs will be randomly set to 0, or dropped out of the vector. The fraction of the values that are dropped out (set to 0) is called the dropout rate. Dropout is only applied during training, not during testing. At test time, a layer's output values are instead scaled down by a factor equal to the dropout rate, to balance for the fact that more units are active at test time than during training.
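A sketch of a dropout version of the baseline using the Dropout layer; the 0.5 dropout rate is an illustrative assumption:

    dropout_model = keras.Sequential([
        keras.layers.Dense(16, activation='relu', input_shape=(NUM_WORDS,)),
        keras.layers.Dropout(0.5),  # randomly zero 50% of these outputs during training
        keras.layers.Dense(16, activation='relu'),
        keras.layers.Dropout(0.5),
        keras.layers.Dense(1, activation='sigmoid')
    ])
    dropout_model.compile(optimizer='adam', loss='binary_crossentropy',
                          metrics=['accuracy', 'binary_crossentropy'])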


Dropout Model vs Baseline

Finally, we plot the training and validation loss for the models using Dropout and Weight Regularization. Just like Weight Regularization, Dropout also reduces overfitting. We went through a lot of concepts related to overfitting, and hopefully you now have a good understanding of what it is and how to mitigate it.

Watch Preview

Preview the instructions that you will follow along with in a hands-on session in your browser.


About the Host (Amit Yadav)


I have been writing code since 1993, when I was 11, and my first passion project was database management software that I wrote for a local hospital. More recently, I wrote an award-winning education chatbot for a multi-billion-revenue company. I solved a recurring problem for my client: they wanted to make basic cyber safety and privacy education accessible to their users. This bot enabled my client to reach their customers with personalised, real-time education. In the last year, I've continued my interest in this field by constantly learning and growing in Machine Learning, NLP and Deep Learning. I'm very excited to share my experience and learnings with you with the help of Rhyme.com.



Frequently Asked Questions


In Rhyme, all sessions are completely hands-on. You don't just passively watch someone else; you use the software directly while following the host's (Amit Yadav) instructions. Using the software is the only way to achieve mastery. With the "Live Guide" option, you can ask for help and get an immediate response.
Nothing! Just join through your web browser. Your host (Amit Yadav) has already installed all required software and configured all data.
You can go to https://rhyme.com/for-companies, sign up for free, and follow this visual guide How to use Rhyme to create your own sessions. If you have custom needs or company-specific environment, please email us at help@rhyme.com
Absolutely. We offer Rhyme for workgroups as well as larger departments and companies. Universities, academies, and bootcamps can also buy Rhyme for their settings. You can select sessions and trainings that are mission critical for you, as well as author your own that reflect your needs and tech environments. Please email us at help@rhyme.com
Please email us at help@rhyme.com and we'll respond to you within one business day.

Ready to join this 1 hour and 13 minute session?