TensorFlow (Beginner): Avoid Overfitting Using Regularization

Welcome to this project on how to avoid overfitting with regularization. We will take a look at two regularization techniques: one is called weight regularization and the other is dropout. We will use these techniques to reduce overfitting in our neural network models.

Available Through Coursera

Rating: 5.0 / 5

Task List


We will cover the following tasks in 56 minutes:


Introduction

When we train neural network models, we may notice the model performing significantly better on the training data than on data it has not seen or trained on before. This means that while we expect the model to learn the underlying patterns from a given dataset, often the model will also memorize the training examples: it will learn to recognise patterns which may be anomalous, or learn the peculiarities of the dataset. This phenomenon is called overfitting, and it's a problem because a model which is overfit to the training data will not be able to generalise well to data it has not seen before, which defeats the whole point of making the model learn anything at all. We want models which give predictions on new data as accurately as they do on the training data.


Importing the Data

We will be working with the popular Fashion MNIST dataset in this project. It is readily available with Keras. We can use a handy helper function called load_data to unpack all the examples into two tuples, each with two arrays: the first tuple holds the training set and the second holds the test set. The Fashion MNIST dataset has 28 by 28 pixel images, and the labels are simply digits from 0 to 9 for the 10 classes in this dataset.
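A minimal sketch of what this looks like (the variable names here are my own choice, not prescribed by the project):

```python
import tensorflow as tf

# load_data returns two tuples: (train images, train labels), (test images, test labels)
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.fashion_mnist.load_data()

print(x_train.shape, y_train.shape)  # (60000, 28, 28) (60000,)
print(x_test.shape, y_test.shape)    # (10000, 28, 28) (10000,)
```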


Processing the Data

In this task, we will process our data. First, let's convert the labels to their one-hot encoded representations. Currently the labels are digits from 0 to 9 for the 10 classes, with each digit representing a unique class. We will convert the labels so that each one becomes a 10-dimensional vector in which each dimension represents a class. We also need to reshape the examples from 28 x 28 arrays into 784-dimensional vectors. This unrolling is done simply to make it easy for us to feed the examples to the neural network models.
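A short sketch of this preprocessing, assuming the x_train/y_train arrays loaded above; to_categorical is the standard Keras helper for one-hot encoding:

```python
from tensorflow.keras.utils import to_categorical

# One-hot encode: label 3 becomes [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
y_train = to_categorical(y_train, num_classes=10)
y_test = to_categorical(y_test, num_classes=10)

# Unroll each 28 x 28 image into a 784-dimensional vector
x_train = x_train.reshape(-1, 784)
x_test = x_test.reshape(-1, 784)
```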


Regularization and Dropout

One of the reasons for overfitting is that some parameter values can become quite large, and therefore too influential on the linear outputs of various hidden units, and subsequently too influential on the non-linear outputs of the activation functions as well. It can be observed that by regularizing the weights so that their values don't become too large, we can reduce overfitting. With dropout, by randomly removing certain nodes in the model during training, we force the model NOT to rely on any particular weight too much, and therefore NOT to assign large values to any particular weight. The result, much like with weight regularization, is that the weight values are kept from growing too large, thereby reducing overfitting.
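As an illustration of what the two techniques look like in Keras (the layer width, the L2 factor 0.001, and the dropout rate 0.2 are placeholder values, not the project's prescribed settings):

```python
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.regularizers import l2

# Weight regularization: an L2 penalty on this layer's weights is
# added to the loss, discouraging large weight values
regularized_dense = Dense(128, activation='relu',
                          kernel_regularizer=l2(0.001))

# Dropout: during training, randomly sets 20% of the incoming units
# to zero, so the model cannot rely too heavily on any one weight
dropout = Dropout(0.2)
```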


Creating the Experiment Part 1

In this task and the next one, we will set up a simple experiment. We will create a model, train it on the Fashion MNIST dataset, and then display its training performance on the training set and on the test set, which we will use for validation. We will write some functions to help us run this experiment. Writing these functions first makes it easy to run the whole experiment in one go, since we need to run it a few times: first for a model without any regularization, then for a model with the two regularization techniques applied. We will write a function to create a Sequential model with a couple of hidden layers and one output layer.
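A possible sketch of such a create_model function; the layer sizes, optimizer, and loss here are my assumptions, since the text only specifies a couple of hidden layers and one output layer:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

def create_model():
    # Two hidden layers plus a 10-way softmax output layer
    model = Sequential([
        Dense(128, activation='relu', input_shape=(784,)),
        Dense(128, activation='relu'),
        Dense(10, activation='softmax'),
    ])
    model.compile(optimizer='adam',
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])
    return model
```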


Creating the Experiment Part 2

Let's create a function that will use the history object returned by model.fit to display the training accuracy and the validation accuracy. Then let's write one function to actually run the whole experiment. When we run this function, it should do a few things, as shown in the sketch after this list:

  1. Create a model using the create_model function.
  2. Train the model.
  3. Display the training and validation accuracies by calling the show_acc function.

Additionally, I will set up a simple logger callback because I don't want to see the full console log as the model trains; that takes too much space, so I'll set the verbose parameter to False and use the simple logger to display JUST the epoch number for each epoch.
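Here is one way these helpers could look; show_acc and create_model are named in the text, while run_experiment, the plotting details, and the epoch count are my own assumptions (note that on older TensorFlow versions the history keys are 'acc'/'val_acc' rather than 'accuracy'/'val_accuracy'):

```python
import matplotlib.pyplot as plt
from tensorflow.keras.callbacks import LambdaCallback

def show_acc(history):
    # Plot training vs. validation accuracy from the History object
    plt.plot(history.history['accuracy'], label='training')
    plt.plot(history.history['val_accuracy'], label='validation')
    plt.xlabel('epoch')
    plt.ylabel('accuracy')
    plt.legend()
    plt.show()

# Simple logger: print just the epoch number instead of the full log
simple_logger = LambdaCallback(
    on_epoch_end=lambda epoch, logs: print(epoch + 1, end=' ')
)

def run_experiment(epochs=20):
    model = create_model()
    history = model.fit(
        x_train, y_train,
        validation_data=(x_test, y_test),
        epochs=epochs,
        verbose=False,             # suppress the default console log
        callbacks=[simple_logger],
    )
    show_acc(history)
```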


Results

Now that training is complete, you should be able to see the training accuracy and the validation accuracy. Your results will differ from mine, but the overall picture should be similar: the training accuracy keeps increasing as we train for more epochs and reaches well over 90%, while the validation accuracy stays roughly flat at about 86%. This is a clear case of overfitting, which we then reduce by applying our two regularization techniques.

Watch Preview

Preview the instructions that you will follow along with in a hands-on session in your browser.


About the Host (Amit Yadav)


I am a machine learning engineer with a focus on computer vision and sequence modelling for automated signal processing using deep learning techniques. My previous experience includes leading chatbot development for a large corporation.



Frequently Asked Questions


Q: Do I just watch videos, or do I actually use the software?
A: In Rhyme, all projects are completely hands-on. You don't just passively watch someone else; you use the software directly while following the host's (Amit Yadav) instructions. Using the software is the only way to achieve mastery. With the "Live Guide" option, you can ask for help and get an immediate response.

Q: Do I need to install any software or set up any data?
A: Nothing! Just join through your web browser. Your host (Amit Yadav) has already installed all required software and configured all data.

Q: How can I create my own projects on Rhyme?
A: You can go to https://rhyme.com, sign up for free, and follow the visual guide "How to use Rhyme" to create your own projects. If you have custom needs or a company-specific environment, please email us at help@rhyme.com.

Q: Can organizations use Rhyme?
A: Absolutely. We offer Rhyme for workgroups as well as larger departments and companies. Universities, academies, and bootcamps can also buy Rhyme for their settings. You can select projects and trainings that are mission critical for you, and also author your own that reflect your needs and tech environments. Please email us at help@rhyme.com.

Q: What accessibility features does Rhyme offer?
A: Rhyme strives to ensure that its visual instructions are helpful for learners with reading impairments. The Rhyme interface has features like resolution and zoom that will be helpful for visual impairments, and we are currently developing closed-caption functionality to help with hearing impairments. Most of the accessibility options of the cloud desktop's operating system, or of the specific application, can also be used in Rhyme. If you have questions related to accessibility, please email us at accessibility@rhyme.com.

Q: Why do projects run on cloud desktops?
A: We started with Windows and Linux cloud desktops because they have the most flexibility for teaching any software, desktop or web. However, web applications like Salesforce can run directly through a virtual browser, and others like Jupyter and RStudio can run on containers and be accessed by virtual browsers. We are currently working on features so that such web applications won't need to run through cloud desktops, but the rest of the Rhyme learning, authoring, and monitoring interfaces will remain the same.

Q: What if I have other questions?
A: Please email us at help@rhyme.com and we'll respond to you within one business day.
