We will cover the following tasks in 56 minutes:
When we train neural network models, we often notice the model performing significantly better on the training data than on data it has not seen before. This means that while we expect the model to learn the underlying patterns in a given dataset, the model will often also memorize the training examples. It may learn to recognise patterns that are anomalous, or learn peculiarities specific to that dataset. This phenomenon is called overfitting. It is a problem because a model that is overfit to the training data will not generalise well to data it has not seen before, which defeats the point of training the model in the first place. We want models that give predictions on new data that are as accurate as their predictions on the training data.
Importing the Data
We will be working with the popular Fashion MNIST dataset in this project. It is readily available with Keras. We can use a handy helper function called load_data to unpack all the examples into two tuples, each with two arrays. The first tuple holds the training set and the second holds the test set; within each tuple, the first array contains the images and the second contains the labels. The Fashion MNIST images are 28 by 28 pixels. The labels are simply digits from 0 to 9 for the 10 classes in this dataset.
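As a rough sketch of this step, assuming the TensorFlow-bundled Keras is installed, the helper below wraps load_data. The names load_fashion_mnist and summarize_split are hypothetical and not part of Keras; only fashion_mnist.load_data is the library call.

```python
import numpy as np


def load_fashion_mnist():
    """Return ((x_train, y_train), (x_test, y_test)) as NumPy arrays."""
    # Imported lazily so that defining this helper does not require TensorFlow.
    # The first call downloads the dataset and caches it locally.
    from tensorflow.keras.datasets import fashion_mnist

    return fashion_mnist.load_data()


def summarize_split(x, y):
    """Summarize one (images, labels) tuple (hypothetical helper)."""
    return {
        "num_examples": x.shape[0],
        "image_shape": x.shape[1:],
        "classes": np.unique(y).tolist(),
    }
```

Calling `(x_train, y_train), (x_test, y_test) = load_fashion_mnist()` and then `summarize_split(x_train, y_train)` should report 28 by 28 images and the class labels 0 to 9.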
Processing the Data
In this task, we will process our data. First, let’s convert the labels to their one-hot-encoded representations. Currently the labels are digits from 0 to 9, with each digit representing a unique class. We will convert them so that each label is a 10-dimensional vector. In this one-hot-encoded representation, each dimension of the label vector represents a class: the entry for the example’s class is 1 and all other entries are 0. We also need to reshape the examples from 28 x 28 arrays to 784-dimensional vectors. This unrolling is done simply to make it easy to feed the examples to the neural network models.
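The two transformations can be sketched in plain NumPy as follows. The function names one_hot and flatten_images are illustrative only; Keras also provides `tensorflow.keras.utils.to_categorical`, which does the same job as one_hot here.

```python
import numpy as np


def one_hot(labels, num_classes=10):
    """Turn integer labels of shape (n,) into one-hot vectors of shape (n, num_classes)."""
    labels = np.asarray(labels, dtype=np.int64)
    encoded = np.zeros((labels.shape[0], num_classes), dtype=np.float32)
    # Set a single 1 per row, at the column given by that row's label.
    encoded[np.arange(labels.shape[0]), labels] = 1.0
    return encoded


def flatten_images(images):
    """Unroll (n, 28, 28) image arrays into (n, 784) vectors."""
    images = np.asarray(images)
    return images.reshape(images.shape[0], -1)
```

For example, `one_hot([3])` gives a single row with a 1 at index 3, and `flatten_images` applied to an array of shape (60000, 28, 28) gives an array of shape (60000, 784).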
Regularization and Dropout
One of the reasons for overfitting is that some of the model’s parameter values (its weights) can become quite large. Large weights become too influential on the linear outputs of the hidden units and, subsequently, on the non-linear outputs of the activation functions as well. By regularizing the weights so that their values don’t become too large, we can reduce overfitting. Dropout works differently: by randomly removing certain nodes from the model during training, we force the model NOT to rely on any particular weight too much, which discourages it from assigning large values to individual weights. The result is similar to that of weight regularization: the weight values are kept from becoming too large, which reduces overfitting.
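To make the two ideas concrete, here is a minimal NumPy sketch. The text does not say which weight penalty is used, so an L2 penalty (sum of squared weights) is assumed, and the lam value is arbitrary. The dropout function uses the "inverted" variant, which rescales the surviving units so the expected activation is unchanged; the Keras Dropout layer behaves the same way.

```python
import numpy as np


def l2_penalty(weights, lam=0.001):
    """L2 weight penalty (assumed form): lam times the sum of squared weights.

    Adding this term to the loss makes large weights costly.
    """
    return lam * float(np.sum(np.square(weights)))


def dropout(activations, rate, rng):
    """Inverted dropout: zero each unit with probability `rate`, rescale the rest."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    keep_prob = 1.0 - rate
    # True where a unit is kept, False where it is dropped.
    mask = rng.random(activations.shape) < keep_prob
    return activations * mask / keep_prob
```

With `rng = np.random.default_rng(0)`, calling `dropout(hidden, 0.5, rng)` zeroes roughly half of the hidden activations and doubles the others. Dropout is applied only during training; at prediction time all units are kept.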
Creating the Experiment Part 1
In this task and the next one, we will set up a simple experiment. We will create a model, train it on the Fashion MNIST dataset, and then display its performance on the training set and on the test set, which we will use for validation. We will write some helper functions so that the whole experiment can be run in one go, since we need to run it a few times: first for a model without any regularization, then for a model with the two regularization techniques applied. We will start with a function that creates a Sequential model with a couple of hidden layers and one output layer.
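One possible shape for such a function is sketched below. The parameter names, the 64-unit layer width, the L2 factor of 0.001, the dropout rate of 0.2 and the adam optimizer are all assumptions made for illustration; only the 784-dimensional input, the two hidden layers and the 10-class output come from the description above.

```python
from tensorflow.keras import Input
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.models import Sequential
from tensorflow.keras.regularizers import l2


def create_model(weight_reg=False, dropout_reg=False):
    """Build and compile a small classifier, optionally with regularization."""
    # Assumed hyperparameters; tune them for real experiments.
    kernel_regularizer = l2(0.001) if weight_reg else None
    dropout_rate = 0.2

    model = Sequential()
    model.add(Input(shape=(784,)))
    for _ in range(2):  # a couple of hidden layers
        model.add(Dense(64, activation="relu",
                        kernel_regularizer=kernel_regularizer))
        if dropout_reg:
            model.add(Dropout(dropout_rate))
    model.add(Dense(10, activation="softmax"))  # one unit per class

    model.compile(loss="categorical_crossentropy",
                  optimizer="adam",
                  metrics=["accuracy"])
    return model
```

Calling `create_model()` gives the unregularized baseline, while `create_model(weight_reg=True, dropout_reg=True)` gives the variant with both techniques applied.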
Creating the Experiment Part 2
Training a Keras model returns a history object that records the metrics for each epoch. Let’s create a function that uses this history object to display the training accuracy and the validation accuracy. Then let’s write one function to actually run the whole experiment. When we run this function, it should do a few things:
- Create a model using the create_model function.
- Train the model.
- Display the training and validation accuracies by calling the show_acc function.
Additionally, I will set up a simple logger callback because I don’t want to see the full console log output as the model trains, which takes too much space. I’ll set the verbose parameter to False and use the simple logger to display JUST the epoch number for each epoch.
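The steps above could be sketched roughly as follows. This is an illustration under several assumptions: show_acc prints the accuracies instead of plotting them, the model factory is passed in as create_model_fn (standing in for create_model), the data arrays are passed in explicitly, the default of 20 epochs is arbitrary, and the history keys "accuracy" and "val_accuracy" assume the model was compiled with metrics=["accuracy"] in TensorFlow 2.

```python
from tensorflow.keras.callbacks import LambdaCallback


def show_acc(history):
    """Print per-epoch training and validation accuracy from a Keras History."""
    train_acc = history.history["accuracy"]
    val_acc = history.history["val_accuracy"]
    for epoch, (t, v) in enumerate(zip(train_acc, val_acc), start=1):
        print(f"epoch {epoch:3d}  train acc {t:.4f}  val acc {v:.4f}")
    # Return the final values so callers can compare runs.
    return train_acc[-1], val_acc[-1]


# Prints only the (1-based) epoch number at the end of each epoch.
simple_logger = LambdaCallback(
    on_epoch_end=lambda epoch, logs: print(epoch + 1, end=" ")
)


def run_experiment(create_model_fn, x_train, y_train, x_test, y_test, epochs=20):
    """Create a model, train it quietly, then display its accuracies."""
    model = create_model_fn()
    history = model.fit(
        x_train, y_train,
        validation_data=(x_test, y_test),  # test set doubles as validation set
        epochs=epochs,
        verbose=0,  # suppress the default per-epoch console output
        callbacks=[simple_logger],
    )
    print()  # end the line of epoch numbers
    return show_acc(history)
```

Running this once with an unregularized model and once with a regularized one allows the gap between training and validation accuracy to be compared across the two runs.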
Now that your training is complete, you should be able to see the training accuracy and the validation accuracy. Your results will be different from mine, but the overall picture should still be similar. The training accuracy keeps increasing as we train for more epochs and reaches well over 90%, but the validation accuracy stays pretty much the same at about 86%. This is a clear case of overfitting. We address this overfitting by applying our two regularization techniques: weight regularization and dropout.
About the Host (Amit Yadav)
I am a machine learning engineer with a focus on computer vision and sequence modelling for automated signal processing using deep learning techniques. My previous experience includes leading chatbot development for a large corporation.