We will cover the following tasks in 1 hour and 8 minutes:
In this project, we are going to create and train a neural network to perform a regression task. In a regression task, we train the network to predict a continuous value from a set of input features. For this project, we are using a dataset called the Real Estate Valuation data set, which contains a total of 414 examples. The dataset is taken from the paper: Yeh, I. C., & Hsu, T. K. (2018). Building real estate valuation models with comparative approach through case-based reasoning.
We are not using any of the approaches from the paper; we are simply using the dataset to get started with solving regression problems with the help of neural networks and, in the process, hopefully learn a few concepts related to neural networks, Keras, and TensorFlow.
Importing the Data
First of all, we will import the dataset. We will use the popular Pandas library to import and process the data. The head() function gives us the first 5 rows by default, so we can see typical values for the features in the dataset. There are 414 examples in the dataset, and it may not be practical for us to go through all of them manually, so we want to ensure that there aren't any missing values before we start using the data. Pandas makes this really simple: we can use the isna() function to find any missing or not-available values in the dataset.
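As a sketch, the import and missing-value check could look like the following. The inline DataFrame is a tiny stand-in so the snippet runs on its own, and the file name in the comment is only a placeholder, not the project's actual path.

```python
import pandas as pd

# In the project the data comes from a file, e.g.:
#   df = pd.read_csv('real_estate_valuation.csv')   # hypothetical path
# Here we build a tiny stand-in frame so the snippet is self-contained.
df = pd.DataFrame({
    'No': [1, 2, 3],                   # serial-number column
    'house age': [32.0, 19.5, 13.3],   # one of the feature columns
    'price': [37.9, 42.2, 47.3],       # the label column
})

print(df.head())        # first 5 rows by default
print(df.isna().sum())  # per-column count of missing values; all zeros here
```

df.isna() returns a DataFrame of booleans; summing it gives a quick per-column count of missing entries.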
We will get rid of the first column, which only holds serial numbers for the rows and is not a feature. Next, we will use the helper functions available in Pandas to find the mean, maximum, and minimum values for the various columns. This makes it really simple to normalise the data! If you have done the previous project on basic image classification, you already know how to perform normalisation. However, we used the standard deviation in that project, whereas here we are using the difference between the maximum and minimum values. Both approaches are perfectly acceptable: the first is actually called standardisation, and the second, the one we are using now, is called normalisation. You will often see examples using standardisation but calling it normalisation, so the two terms are frequently used interchangeably.
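The column drop and min-max normalisation described above can be sketched as follows; the stand-in frame and column names are illustrative ('No' is the serial-number column in the original dataset).

```python
import pandas as pd

# Small stand-in frame; in the project, df is the imported dataset.
df = pd.DataFrame({
    'No': [1, 2, 3, 4],
    'house age': [32.0, 19.5, 13.3, 5.0],
    'price': [37.9, 42.2, 47.3, 54.8],
})

# Drop the serial-number column, which is not a feature.
df = df.drop(columns=['No'])

# Min-max normalisation: rescale every column to the [0, 1] range
# using the difference between the maximum and minimum values.
df_norm = (df - df.min()) / (df.max() - df.min())

# For comparison, standardisation would use the standard deviation:
#   df_std = (df - df.mean()) / df.std()
```

After this step, every column of df_norm lies between 0 and 1, which keeps all features on a comparable scale for the network.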
Creating Training and Test Sets
In this task, we will split our DataFrame into features and labels, and then split that data into two sets: one for training and one for testing. In Pandas, we can select columns by position using iloc. Let's look at our DataFrame's head, which we printed out in the last task after normalisation. Only the first 6 columns of the DataFrame are features, so we store them separately as X. We will use a helper function called train_test_split from scikit-learn to do the split. The reason we are using this helper function is that not only will it very conveniently split our data into the two sets we need, it will also randomly shuffle the input data before splitting it. This random shuffle is a pretty good idea: if the observations were recorded or arranged in some order, that pattern might push our neural network into finding correlations that depend on how the data was observed or stored, rather than meaningful correlations that generalise well.
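A sketch of the split, using a randomly generated stand-in for the normalised DataFrame; the test_size fraction here is an assumption for illustration, not necessarily the project's exact value.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Stand-in normalised frame: 6 feature columns followed by the price column.
rng = np.random.default_rng(0)
df_norm = pd.DataFrame(rng.random((414, 7)),
                       columns=[f'X{i}' for i in range(1, 7)] + ['price'])

# iloc selects by position: the first 6 columns are features,
# the last column is the label.
X = df_norm.iloc[:, :6].values
y = df_norm.iloc[:, -1].values

# train_test_split shuffles the data by default before splitting.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.05, random_state=0)  # test_size is illustrative

print(X_train.shape, X_test.shape)
```

Passing random_state makes the shuffle reproducible, which is convenient when comparing runs.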
Create the Model
We will use Keras' Sequential model class to create our model. Creating a neural network with the Sequential class is quite straightforward: you pass in a list of all the layers that you want in your network, in the order that you want them. Keras provides a variety of layers that can be used here. The one we are going to use is called a Dense layer. This is a densely connected layer in which every node is connected to all the nodes of the preceding layer. We are using two hidden layers with 16 nodes each, both with the ReLU activation function. ReLU is short for rectified linear unit, and this activation is applied to the weighted sum computed at each node of the layer it is used with. The output is a single node. Since we are working on a regression problem, we don't need multiple output nodes; the one output node will give us the normalised price predictions as we train the model.
Activations and Network Architecture
One of the details that we glossed over in our previous explanation of neural networks was the use of activation functions. We treated neural networks as computational graphs made up of a bunch of linear functions put together. While this is an acceptable way to build basic intuition, especially when it comes to visualising them, the reality is that if they really were just a bunch of linear functions composed together, the resulting function would also be linear. In that case, why would we need neural networks at all? This is where activation functions come in: they introduce non-linearity into the network.
Training the Model
If, given a training set, we just keep training our model, the model keeps getting better and better on that training data. However, it may not actually generalise well to data it has not seen before. This is why it's good practice to use a validation set. The training is done using the training set, and the training metrics are then calculated after each epoch on both the training set and the validation set. If the difference between a training metric and the corresponding validation metric becomes too large, we have overfit the model to our training data, and it will not generalise well to data it has not seen before. Note that this validation set is not a replacement for the test set; it is used in addition to it.
We just trained our model for 500 epochs, which was unnecessary. Fortunately, we can take advantage of Keras' callbacks. While training, we can specify a list of functions for our model to call back during the training process. We can use these functions to perform tasks like custom logging, creating checkpoints, or deciding when to stop the training.
One of the pre-made callbacks we have available is called EarlyStopping. Like the name suggests, it is useful when we don’t really know how long to train the model for. So, we just use this EarlyStopping callback and specify the metric it should be paying attention to. In our case, it will be validation loss. If the value for validation loss doesn’t change much for a few epochs, then we know there’s no point in continuing the training and the process can stop.
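The EarlyStopping callback can be sketched as follows; the random data and the patience value are assumptions for illustration, and the epoch cap is reduced from the project's 500 for brevity.

```python
import numpy as np
from tensorflow.keras import Input
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Sequential

rng = np.random.default_rng(1)
X = rng.random((400, 6)).astype('float32')
y = rng.random(400).astype('float32')

model = Sequential([Input(shape=(6,)),
                    Dense(16, activation='relu'),
                    Dense(16, activation='relu'),
                    Dense(1)])
model.compile(loss='mse', optimizer='adam')

# Stop when val_loss hasn't improved for `patience` consecutive epochs.
early_stop = EarlyStopping(monitor='val_loss', patience=5)

# The epochs value is only an upper bound; the callback usually
# stops the run well before it is reached.
history = model.fit(X, y, validation_split=0.2, epochs=100,
                    callbacks=[early_stop], verbose=0)
print(len(history.history['loss']))  # number of epochs actually run
```

With this in place, we no longer need to guess a good epoch count up front; the validation loss decides when to stop.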
Let's also take a look at the predictions on the test set and see if they seem reasonably accurate. We will use the predict method on our model to make predictions on the test set. Notice that this is completely new data for our model; it wasn't part of the training set or the validation set. A plot of the predictions against the true values should, ideally, be a straight line, so there are obviously some errors. But given the limited number of examples, and considering that we did no feature engineering, the model does reasonably well: the overall pattern is still roughly a straight line.
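A sketch of the prediction step; the model and test data are stand-ins (in the project you would use the trained model and the real X_test), and the plotting code in the comment assumes matplotlib.

```python
import numpy as np
from tensorflow.keras import Input
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Sequential

# Stand-in test set and model; in the project these come from earlier steps.
rng = np.random.default_rng(2)
X_test = rng.random((21, 6)).astype('float32')

model = Sequential([Input(shape=(6,)),
                    Dense(16, activation='relu'),
                    Dense(16, activation='relu'),
                    Dense(1)])
model.compile(loss='mse', optimizer='adam')

preds = model.predict(X_test, verbose=0)
print(preds.shape)  # one normalised price prediction per test example

# To visualise: scatter the predictions against the true prices. Points on
# the diagonal y = x would mean perfect predictions.
#   import matplotlib.pyplot as plt
#   plt.scatter(y_test, preds)
#   plt.plot([0, 1], [0, 1])
```

The closer the scatter hugs the diagonal, the better the model generalises to unseen examples.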
About the Host (Amit Yadav)
I am a machine learning engineer with a focus on computer vision and sequence modelling for automated signal processing using deep learning techniques. My previous experience includes leading chatbot development for a large corporation.