We will cover the following tasks in 1 hour and 12 minutes:
High-level APIs are usually preferred because they make creating and training models quite straightforward. Once you are familiar with them, however, you should also explore the low-level TensorFlow core API. Understanding TensorFlow core can make you a better TensorFlow programmer and give you a much better mental model of how things work internally when you use a higher-level API.
The Neural Network Model
There are four steps when you train a neural network model. We will need to implement all four of these:
First, we initialize trainable parameters. These are the weights and biases associated with each layer in the neural network. These are the parameters that our model learns when it is trained.
Then, forward propagation is performed to compute the overall cost of the model. Of course, initially this will be very high and our model’s objective, as it trains, is to reduce this overall cost. Lower cost will mean that the model’s prediction, as computed by forward propagation, will be similar to the actual labels or ground truth given to us in the dataset.
Next, to minimize this cost, we perform backward propagation and compute the gradients of the cost with respect to all the parameters. Each gradient tells us how the cost changes as the corresponding parameter changes.
Finally, we update each parameter by subtracting its newly computed gradient, typically scaled by a learning rate, from its previous value. This means that the parameter values all take one step in the direction of a potential minimum. There are a few more details, but these are the important ideas in any gradient descent algorithm, and we will use a variant of gradient descent in this course.
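The update step can be sketched in a few lines of NumPy. This is an illustration only, not the course's TensorFlow code: the function name gradient_descent_step, the learning rate of 0.1, and the toy cost are all assumptions made for the example.

```python
import numpy as np

def gradient_descent_step(params, grads, learning_rate=0.1):
    """Return new parameters after one plain gradient descent update."""
    return {name: value - learning_rate * grads[name]
            for name, value in params.items()}

# Toy cost (an assumption for illustration): J(w) = sum(w ** 2).
# Its gradient is 2 * w and its minimum is at w = 0.
params = {"w": np.array([1.0, -2.0])}
for _ in range(50):
    grads = {"w": 2.0 * params["w"]}
    params = gradient_descent_step(params, grads)
```

Each step shrinks w toward the minimum at zero; a variant of gradient descent would change how the step is computed, not this overall loop.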
Instantiate with Layers
Let’s say that when we instantiate our model, we pass a list of node counts for the various layers, and that list populates some of these parameters. So, let’s add a few more lines to the initializer method. Now we just need to ensure that when we instantiate a model, we pass a list whose first element is the number of features in each example of the dataset (essentially the input to the neural network) and whose last element is the number of classes for our multi-class classification. The elements in between represent the number of nodes in each hidden layer. For simplicity, our model will only have fully connected layers and will use the ReLU activation for all the hidden layers.
In order to initialize our weights and biases for all the layers, we loop over the layers and use a random normal distribution for the weights and zeros for the biases. This is a very common approach to initialization. A more appropriate choice might be Xavier initialization, but for our network a simple approach will do just fine.
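A minimal NumPy sketch of this initialization follows. The function name init_params, the dictionary layout, the seed, and the example layer list [4, 8, 8, 3] are assumptions for illustration; the course builds the equivalent with TensorFlow variables.

```python
import numpy as np

def init_params(layers, seed=0):
    """Create weights and biases for a fully connected network.

    layers[0] is the number of input features, layers[-1] the number of
    classes, and everything in between is a hidden layer size.
    """
    rng = np.random.default_rng(seed)
    W, b = {}, {}
    for i in range(1, len(layers)):
        # Weights: samples from a standard normal distribution.
        W[i] = rng.normal(size=(layers[i], layers[i - 1]))
        # Biases: zeros, one per node in the layer.
        b[i] = np.zeros((layers[i], 1))
    return W, b

# Hypothetical network: 4 features, two hidden layers of 8 nodes, 3 classes.
W, b = init_params([4, 8, 8, 3])
```

Each weight matrix has shape (nodes in this layer, nodes in the previous layer), so it can multiply a column of inputs from the left.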
In forward propagation, for each layer, we first calculate a linear output Z. For the first layer, this is Z = Wx + b. If this looks familiar, that’s because it is the exact same model we used for linear regression. Here, however, it is only the linear output, which then goes through an activation, and the same process repeats for every layer.
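As a rough NumPy sketch of this layer-by-layer computation (not the course's TensorFlow implementation): the function names, the dictionary layout for W and b, and the tiny hand-picked values below are assumptions chosen so the arithmetic is easy to follow.

```python
import numpy as np

def relu(z):
    """ReLU activation: negative values become zero."""
    return np.maximum(z, 0.0)

def forward(X, W, b):
    """Return the last layer's linear output.

    X has shape (features, examples). ReLU is applied after every layer
    except the last, whose output is left linear.
    """
    A = X
    num_layers = len(W)
    for i in range(1, num_layers + 1):
        Z = W[i] @ A + b[i]      # linear output for layer i
        if i < num_layers:
            A = relu(Z)          # activation feeds the next layer
    return Z

# Tiny hand-picked network: 2 inputs, 2 hidden nodes, 1 output.
W = {1: np.array([[1.0, -1.0], [-1.0, 1.0]]), 2: np.array([[1.0, 2.0]])}
b = {1: np.zeros((2, 1)), 2: np.array([[0.5]])}
X = np.array([[2.0], [1.0]])
Z_out = forward(X, W, b)
```

Here the first layer gives Z = [1, -1], ReLU turns that into [1, 0], and the second layer outputs 1 * 1 + 2 * 0 + 0.5 = 1.5.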
Let’s say your loss function is called L. The forward prop that we did in the previous chapter gives you the linear outputs for the 3 classes. These aren’t our predictions just yet because we haven’t applied a softmax activation to them; the softmax turns these linear outputs into probability scores for the 3 classes.
If we were to calculate the cost, it would be the sum of the losses across all the examples used in the forward prop to get our predictions.
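To make the relationship between linear outputs, softmax and cost concrete, here is a hedged NumPy sketch. It assumes cross-entropy as the loss L, which the text above does not name explicitly, and the function names and sample values are made up for illustration.

```python
import numpy as np

def softmax(Z):
    """Column-wise softmax; Z has shape (classes, examples)."""
    shifted = Z - Z.max(axis=0, keepdims=True)  # improves numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=0, keepdims=True)

def cost(Z, Y):
    """Sum of per-example cross-entropy losses; Y is one-hot, shaped like Z."""
    probs = softmax(Z)
    losses = -np.sum(Y * np.log(probs), axis=0)  # one loss per example
    return losses.sum()

# Made-up linear outputs for 3 classes and 2 examples, with one-hot labels.
Z = np.array([[2.0, 0.0], [0.0, 0.0], [0.0, 2.0]])
Y = np.array([[1.0, 0.0], [0.0, 0.0], [0.0, 1.0]])
total_cost = cost(Z, Y)
```

Many implementations average the losses instead of summing them; that only rescales the cost and its gradients by the number of examples.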
The Train Method
Let’s get started with writing the train method. This is the function on our model that runs a training loop and updates our parameters as the model learns to fit the given dataset. In addition to the training set, we will also pass the number of epochs and the batch size. The training loop runs within the Session context, and we can run operations within this context using the session.
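The role of the epochs and batch size arguments can be sketched without TensorFlow. The helper iterate_batches and the placeholder data below are hypothetical; in the course, the body of the inner loop would run the update operation through the session. Note that examples are stored as rows here, as in most datasets.

```python
import numpy as np

def iterate_batches(X, Y, batch_size):
    """Yield consecutive (X_batch, Y_batch) slices; each row is one example."""
    for start in range(0, X.shape[0], batch_size):
        yield X[start:start + batch_size], Y[start:start + batch_size]

# Placeholder data: 10 examples with 2 features each.
X = np.arange(20.0).reshape(10, 2)
Y = np.arange(10)

epochs, batch_size = 2, 4
batch_sizes_seen = []
for epoch in range(epochs):
    for X_batch, Y_batch in iterate_batches(X, Y, batch_size):
        # A real training loop would compute the cost and update parameters here.
        batch_sizes_seen.append(len(X_batch))
```

With 10 examples and a batch size of 4, each epoch sees batches of 4, 4 and 2 examples; the last batch is simply smaller.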
The Iris Dataset
We will need to normalize our data by computing the mean and standard deviation of each feature and then using them to rescale that feature’s values. We will also one-hot encode the labels for the three classes.
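Both preprocessing steps are short in NumPy. The sketch below uses a few made-up measurements rather than the real Iris data, and the function names normalize and one_hot are assumptions for illustration.

```python
import numpy as np

def normalize(X):
    """Rescale each feature (column) to zero mean and unit standard deviation."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)  # assumes no feature is constant (std > 0)
    return (X - mean) / std

def one_hot(labels, num_classes):
    """Turn integer class labels into one-hot rows."""
    return np.eye(num_classes)[labels]

# Made-up values: 3 examples with 2 features, one example per class.
X = np.array([[5.1, 3.5], [6.2, 2.9], [7.0, 3.2]])
labels = np.array([0, 1, 2])

X_norm = normalize(X)
Y = one_hot(labels, num_classes=3)
```

In practice the mean and standard deviation are usually computed on the training set only and then reused for any held-out data.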
Training the Model
We have a neural network model class, so we just need to instantiate it and then call the train method to start the training process. We need to pass the training set, the number of epochs and the batch size to our train method. At this point, it feels a bit similar to what we’d do in a high-level API like Keras. Let the training continue; it might take a couple of minutes, and after every epoch the model will display the cost for that epoch.
Evaluating the Performance
While looking at the training costs definitely helps us figure out if our model is doing well, we have no idea whether the trained model will actually perform well on new data. To evaluate the performance of our model, let’s write a method in our NNModel class to calculate accuracy on a batch of data. We will make it so we can run this within the session context as the training loop goes on.
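The accuracy calculation itself is small. Here is a NumPy sketch under the assumption that model outputs and one-hot labels are both shaped (classes, examples); the function name and the sample values are made up, and the course version would be a method on NNModel evaluated through the session.

```python
import numpy as np

def accuracy(Z, Y):
    """Fraction of examples whose highest-scoring class matches the label."""
    predicted = np.argmax(Z, axis=0)  # predicted class index per example
    actual = np.argmax(Y, axis=0)     # true class index from one-hot labels
    return float(np.mean(predicted == actual))

# Made-up outputs for 3 classes and 4 examples; the last one is misclassified.
Z = np.array([[2.0, 0.1, 0.3, 1.0],
              [0.5, 1.5, 0.2, 2.0],
              [0.1, 0.2, 0.9, 0.5]])
Y = np.array([[1, 0, 0, 1],
              [0, 1, 0, 0],
              [0, 0, 1, 0]])
acc = accuracy(Z, Y)
```

Because softmax preserves the ordering of its inputs, taking the argmax of the linear outputs gives the same predictions as taking the argmax of the probabilities.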
About the Host (Amit)
I am a Software Engineer with many years of experience in writing commercial software. My current areas of interest include computer vision and sequence modelling for automated signal processing using deep learning as well as developing chatbots.