TensorFlow (Beginner): Avoid Overfitting Using Regularization

In this course, we will learn how to avoid overfitting by using two common regularization techniques: Weight Regularization and Dropout. We will see how to apply these techniques with TensorFlow. The idea of regularization is to put constraints on the quantity and type of information that the model can store. If a network can only afford to memorize a small number of patterns, the optimization process will force it to focus on the most prominent ones, simplifying the model.

Rating: 5.0 / 5

We will cover the following tasks in 1 hour and 13 minutes:

Introduction

We will get familiar with the Rhyme interface and our learning environment. You will get a virtual machine; Jupyter Notebook and TensorFlow, which you will need for this course, are already installed on it. Jupyter Notebooks are very popular with data scientists and machine learning engineers, as one can write code in some cells and use other cells for documentation.

What is Overfitting?

Overfitting occurs when the accuracy of the model on the training data keeps increasing (or remains constant) with more epochs, while the accuracy on the validation data peaks after a certain number of epochs and then starts decreasing. Let's import the libraries that we will need: TensorFlow and Keras, along with NumPy, the fundamental package for scientific computing in Python.
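The imports for the notebook might look like the following (a minimal sketch, assuming TensorFlow 2.x with its bundled Keras API):

```python
# Core libraries used throughout the course.
import numpy as np              # numerical arrays and multi-hot encoding
import tensorflow as tf         # model building and training
from tensorflow import keras    # high-level Keras API bundled with TF 2.x

print(tf.__version__)
```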

Dataset

We are going to use the IMDB movie reviews from the Internet Movie Database as our dataset. The reviews are split into 25,000 for training and 25,000 for testing, and both sets consist of equal numbers of positive and negative reviews. The dataset is pre-processed: each example is an array of integers representing the words of the movie review, and each label is either `0` (negative sentiment) or `1` (positive sentiment). We will multi-hot-encode our data: every example becomes a fixed-length vector that marks which words occur in that review. We are going to work with only the 10,000 most common words in the entire vocabulary.
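The multi-hot encoding described above can be sketched as follows. This is an illustrative helper, not the course's exact code; in the actual notebook, the sequences would come from `keras.datasets.imdb` loaded with `num_words=10000`:

```python
import numpy as np

def multi_hot_encode(sequences, dimension=10000):
    """Turn lists of word indices into 0/1 vectors of length `dimension`."""
    results = np.zeros((len(sequences), dimension))
    for i, word_indices in enumerate(sequences):
        results[i, word_indices] = 1.0  # mark each word that appears in the review
    return results

# Tiny illustrative example (not real IMDB data):
encoded = multi_hot_encode([[0, 3, 5], [1, 3]], dimension=8)
print(encoded)
```

Note that the encoding discards word order and counts: a word that appears ten times in a review is marked the same as a word that appears once.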

Creating the Baseline Model

To find an appropriate size for our model, we normally start with relatively few layers and parameters, then increase the size of the layers or add new layers until we see diminishing returns on the validation loss. We'll create three models - a baseline model, a smaller version, and a larger version - and then compare them.
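A sketch of a baseline model for this task is shown below. The exact layer sizes (two hidden layers of 16 units) are an assumption based on the standard version of this IMDB example, not stated in the course text:

```python
from tensorflow import keras

# Baseline: two small Dense hidden layers, sigmoid output for binary sentiment.
baseline_model = keras.Sequential([
    keras.layers.Dense(16, activation="relu", input_shape=(10000,)),
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])
baseline_model.compile(optimizer="adam",
                       loss="binary_crossentropy",
                       metrics=["accuracy", "binary_crossentropy"])
baseline_model.summary()
```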

Creating Model Variants

We will create three models with the same number of layers but different numbers of nodes in the first two layers. We will train all three models using the `fit` method, passing in the training data and running the training for 20 epochs with a batch size of 512. We will use our test set as the validation set.
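The variants and the training call might look like this. The widths (4 and 512 units) are assumptions based on the standard version of this example, and random data stands in for the multi-hot-encoded reviews so the sketch runs on its own; the course trains for 20 epochs on the real IMDB data with the test set as validation data:

```python
import numpy as np
from tensorflow import keras

def build_model(units):
    # Same depth as the baseline; only the width of the first two layers changes.
    model = keras.Sequential([
        keras.layers.Dense(units, activation="relu", input_shape=(10000,)),
        keras.layers.Dense(units, activation="relu"),
        keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy", "binary_crossentropy"])
    return model

smaller, baseline, bigger = build_model(4), build_model(16), build_model(512)

# Stand-in random data so this sketch runs without downloading IMDB.
x_train = np.random.randint(0, 2, size=(64, 10000)).astype("float32")
y_train = np.random.randint(0, 2, size=(64,)).astype("float32")

history = baseline.fit(x_train, y_train,
                       epochs=2,               # the course uses 20
                       batch_size=512,
                       validation_split=0.25,  # the course validates on the test set
                       verbose=0)
```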

Plot History Function

We will define a function that plots binary cross entropy against epochs, given a history parameter. As usual, we are using Matplotlib's PyPlot module. Our function will draw one curve per history object, for the three histories that we got from training the three models.
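One way such a helper could look (a sketch, not the course's exact code; the `(name, history)` pair format is an assumption):

```python
import matplotlib
matplotlib.use("Agg")  # headless-safe backend for this sketch
import matplotlib.pyplot as plt

def plot_history(histories, key="binary_crossentropy"):
    """Plot training (solid) and validation (dashed) curves for several models.

    `histories` is a list of (name, history) pairs, where each history
    comes from a call to model.fit.
    """
    plt.figure(figsize=(12, 6))
    for name, history in histories:
        # Dashed line for validation, solid line in the same color for training.
        val = plt.plot(history.epoch, history.history["val_" + key],
                       "--", label=name + " val")
        plt.plot(history.epoch, history.history[key],
                 color=val[0].get_color(), label=name + " train")
    plt.xlabel("Epochs")
    plt.ylabel(key.replace("_", " ").title())
    plt.legend()
```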

Plotting the Training and Validation Loss

The more capacity a neural network has, the quicker it will be able to model the training data (resulting in a low training loss), but the more susceptible it will be to overfitting (resulting in a large gap between the training and validation loss). Out of the three models, the smaller one seems to be doing the best job!

Weight Regularization

A common way to mitigate overfitting is to put constraints on the complexity of a network by forcing its weights to take only small values, which makes the distribution of weight values more regular. This is called weight regularization, and is done by adding a cost associated with having large weights to the loss function of the network.
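In Keras, the weight cost is added per layer via a `kernel_regularizer`. The coefficient below (an L2 penalty of 0.001, meaning `0.001 * sum(w ** 2)` is added to the loss for each layer's kernel) is an assumption taken from the standard version of this example:

```python
from tensorflow import keras

# Baseline architecture plus an L2 penalty on each hidden layer's weights.
l2_model = keras.Sequential([
    keras.layers.Dense(16, activation="relu",
                       kernel_regularizer=keras.regularizers.l2(0.001),
                       input_shape=(10000,)),
    keras.layers.Dense(16, activation="relu",
                       kernel_regularizer=keras.regularizers.l2(0.001)),
    keras.layers.Dense(1, activation="sigmoid"),
])
l2_model.compile(optimizer="adam", loss="binary_crossentropy",
                 metrics=["accuracy", "binary_crossentropy"])
```

The penalty changes the loss surface, not the architecture: the regularized model has exactly the same number of parameters as the baseline.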

L2 Model vs Baseline

We will look at the impact of regularization on our models. The baseline and L2 models share the same network architecture, yet the regularized model is much more resistant to overfitting, as we can see from the plot. The L2 model is definitely an improvement over the baseline.

Dropouts

Dropout is a very common regularization technique used in neural networks. It consists of setting randomly selected output features of a layer to 0 during training. Say we apply dropout to a layer that returns a 10-dimensional vector: some of those 10 outputs may be randomly set to 0, or "dropped out", of the vector. The fraction of the values that are set to 0 is called the dropout rate. Dropout is applied only during training, not during testing. At test time, the layer's output values are instead scaled down by a factor equal to the dropout rate, to balance for the fact that more units are active at test time than during training.
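In Keras this is a `Dropout` layer placed after each hidden layer. The rate of 0.5 below is an assumption based on the standard version of this example:

```python
from tensorflow import keras

# Baseline architecture with a Dropout layer after each hidden layer;
# rate=0.5 zeroes out half of each hidden layer's outputs during training.
dropout_model = keras.Sequential([
    keras.layers.Dense(16, activation="relu", input_shape=(10000,)),
    keras.layers.Dropout(0.5),
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dropout(0.5),
    keras.layers.Dense(1, activation="sigmoid"),
])
dropout_model.compile(optimizer="adam", loss="binary_crossentropy",
                      metrics=["accuracy", "binary_crossentropy"])
```

Note that `Dropout` layers add no trainable parameters and are automatically disabled at inference time. In practice, Keras implements "inverted dropout": it scales the surviving activations up during training instead of scaling outputs down at test time, which has the same balancing effect as the description above.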

Dropout Model vs Baseline

Finally, we plot the training and validation loss for the models using Dropout and Weight Regularization. Just like Weight Regularization, Dropout also reduces overfitting. We went through a lot of concepts related to overfitting, and hopefully you now have a good understanding of what it is and how to mitigate it.

Watch Preview

Preview the instructions that you will follow along in a hands-on session in your browser.

I am a Software Engineer with many years of experience in writing commercial software. My current areas of interest include computer vision and sequence modelling for automated signal processing using deep learning as well as developing chatbots.

How is this different from YouTube, PluralSight, Udemy, etc.?
In Rhyme, all projects are completely hands-on. You don't just passively watch someone else. You use the software directly while following the host's (Amit Yadav) instructions. Using the software is the only way to achieve mastery. With the "Live Guide" option, you can ask for help and get an immediate response.
Can I buy Rhyme sessions for my company or learning institution?
Absolutely. We offer Rhyme for workgroups as well as larger departments and companies. Universities, academies, and bootcamps can also buy Rhyme for their settings. You can select sessions and trainings that are mission critical for you and, as well, author your own that reflect your own needs and tech environments. Please email us at help@rhyme.com.
I have a different question
Please email us at help@rhyme.com and we'll respond to you within one business day.

TensorFlow (Beginner): Predicting House Prices (1 hour and 8 minutes)
Computer Vision with TensorFlow: Object Classification & Detection (1 hour and 4 minutes)
TensorFlow (Beginner): Basic Image Classification (1 hour and 12 minutes)