We will cover the following tasks in 1 hour and 40 minutes:
In this task, you will get a sense of your virtual machine, the Jupyter Notebook software, and the course material.
Beyond Linear Discriminative Classifiers
Understand the scope and limitations of linear classifiers, and how SVMs offer a way to overcome them. We will use Scikit-Learn to generate a random dataset with two linearly separable classes. Next, we will try to find the best decision boundary for our data.
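As a rough illustration of the dataset step, the sketch below uses Scikit-Learn's `make_blobs`. The sample count, cluster centers, spread, and seed are assumptions chosen so the two classes are clearly separable; the course may use different values.

```python
from sklearn.datasets import make_blobs

# Assumed settings: 50 points drawn around two well-separated centers.
X, y = make_blobs(
    n_samples=50,
    centers=[[-3, -3], [3, 3]],
    cluster_std=0.6,
    random_state=0,
)

print(X.shape)                  # one row per point, two features per row
print(sorted(set(y.tolist())))  # the two class labels
```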
Many Possible Separators
In this chapter, we will plot multiple decision boundaries that give us perfect in-sample classification. We will learn why picking one of these linear separators arbitrarily can lead to poor generalization performance, and how SVMs provide a way to overcome this.
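To make the point about multiple perfect separators concrete, here is a minimal sketch that checks three hand-picked lines against the same synthetic data. The dataset settings and the three `(w, b)` pairs are hypothetical choices for illustration, not the boundaries used in the course.

```python
import numpy as np
from sklearn.datasets import make_blobs

# Assumed dataset: two well-separated clusters, labels 0 and 1.
X, y = make_blobs(
    n_samples=50,
    centers=[[-3, -3], [3, 3]],
    cluster_std=0.6,
    random_state=0,
)

# Three hypothetical separators, each written as (w, b) for the line w . x + b = 0.
candidates = [
    (np.array([1.0, 1.0]), 0.0),
    (np.array([1.0, 0.5]), 0.0),
    (np.array([0.5, 1.0]), 0.0),
]


def training_accuracy(w, b):
    # Predict class 1 where w . x + b is positive, class 0 otherwise.
    predictions = (X @ w + b > 0).astype(int)
    return float(np.mean(predictions == y))


accuracies = [training_accuracy(w, b) for w, b in candidates]
print(accuracies)
```

All three lines classify the training points equally well, so in-sample accuracy alone gives no reason to prefer one of them. A new point that falls between the lines would be labeled differently depending on which one was chosen.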
Plotting the Margins
In this chapter, we will plot margins around our three decision boundaries. We will also look at some of the mathematics and frame our constrained optimization problem in the language of quadratic programming.
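The source does not spell out the optimization problem. A common way to write it, assuming labels $y_i \in \{-1, +1\}$ and linearly separable data, is the hard-margin primal below. The symbols $w$, $b$, $x_i$, and $n$ are notation assumed here rather than taken from the course.

```latex
\min_{w,\,b} \; \frac{1}{2}\lVert w \rVert^2
\quad \text{subject to} \quad
y_i \left( w^\top x_i + b \right) \ge 1, \qquad i = 1, \dots, n
```

The objective is quadratic in $w$ and the constraints are linear, which is what makes this a quadratic program. The two margin lines are $2 / \lVert w \rVert$ apart, so minimizing $\lVert w \rVert$ maximizes the margin.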
Training an SVM Model
In this chapter, we will train an SVM model with Scikit-Learn’s support vector classifier (SVC) and fit the model to our data.
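A minimal sketch of this step might look like the following. The linear kernel, the large value of `C` (used here to approximate a hard margin), and the dataset settings are all assumptions; the course may configure `SVC` differently.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Assumed dataset: two well-separated clusters.
X, y = make_blobs(
    n_samples=50,
    centers=[[-3, -3], [3, 3]],
    cluster_std=0.6,
    random_state=0,
)

# A large C penalizes margin violations heavily, approximating a hard margin.
model = SVC(kernel="linear", C=1e6)
model.fit(X, y)

# Only the support vectors determine the fitted boundary.
print(model.support_vectors_)

# For a linear kernel, the margin width is 2 / ||w||.
margin_width = 2 / np.linalg.norm(model.coef_)
print(margin_width)
print(model.score(X, y))
```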
Facial Recognition with SVMs
As our first application of SVMs to real-world tasks, we begin to tackle the domain of facial recognition. In this task, we load the Labeled Faces in the Wild dataset. Accessible through Scikit-Learn, this dataset consists of thousands of labeled photographs of various public figures.
Exploring the Dataset
We plot some of the faces from our data to get a sense of what we are working with.
Preprocessing the Dataset
In the previous task, we observed that each image consists of nearly 3000 pixel values. Instead of simply using each pixel as a feature, we use Principal Component Analysis (PCA) to extract a smaller set of more meaningful features, which we will feed to our Support Vector Classifier.
After preprocessing our data, we split it into training and test sets.
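The PCA-then-classify idea can be sketched as a Scikit-Learn pipeline. To keep the example runnable offline, it uses the small bundled digits images as a stand-in for the faces data, and the number of components, the RBF kernel, and the seeds are assumptions rather than the course's settings.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Stand-in data: 8x8 digit images (64 pixel features each), not the faces.
digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, random_state=42
)

# PCA compresses the pixels into fewer components before the classifier sees them.
pca = PCA(n_components=20, whiten=True, random_state=42)
svc = SVC(kernel="rbf", class_weight="balanced")
model = make_pipeline(pca, svc)
model.fit(X_train, y_train)

reduced = model.named_steps["pca"].transform(X_train)
print(X_train.shape, "->", reduced.shape)
print(model.score(X_test, y_test))
```

Placing PCA inside the pipeline means its components are learned from the training split only, so no information from the test images reaches the model.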
Grid-Search Cross Validation
In this task, we will determine the best model. To do so, we will use grid search cross-validation, which evaluates every combination of hyperparameter values in a grid we specify and keeps the combination with the best cross-validated score.
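A small sketch of grid search over a PCA and SVC pipeline follows. The digits data is again an offline stand-in for the faces, and the grid values, fold count, and pipeline settings are illustrative assumptions.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Stand-in data: bundled digits images rather than the faces dataset.
digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, random_state=42
)

model = make_pipeline(
    PCA(n_components=20, whiten=True, random_state=42),
    SVC(kernel="rbf", class_weight="balanced"),
)

# Pipeline parameters are addressed as <step name>__<parameter name>.
param_grid = {"svc__C": [1, 10], "svc__gamma": [0.01, 0.05]}

# Every combination in the grid is scored with 3-fold cross-validation.
grid = GridSearchCV(model, param_grid, cv=3)
grid.fit(X_train, y_train)

print(grid.best_params_)
print(grid.best_score_)
```

If the best values land on the edge of the grid, it is usually worth widening the grid in that direction and searching again.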
Visualize Test Images
With our cross-validated model, we plot a few of the test images with their predicted labels. Remember, our model predicts labels for data that it hasn’t encountered previously during the training process.
Evaluating the Support Vector Classifier
We evaluate our classifier’s out-of-sample performance and get per-label recovery statistics (precision, recall, and F1-score) using the classification report. We also visualize the confusion matrix to understand which labels our classifier tends to confuse with one another.
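The evaluation step can be sketched with `classification_report` and `confusion_matrix` from `sklearn.metrics`. As before, the digits data and the pipeline settings are stand-in assumptions so the example runs without downloading the faces dataset.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Stand-in data: bundled digits images rather than the faces dataset.
digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, random_state=42
)

model = make_pipeline(
    PCA(n_components=20, whiten=True, random_state=42),
    SVC(kernel="rbf", class_weight="balanced"),
)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Precision, recall, and F1-score for each label on held-out data.
print(classification_report(y_test, y_pred))

# Rows are true labels, columns are predicted labels.
cm = confusion_matrix(y_test, y_pred)
print(cm)
```

Large off-diagonal entries in the matrix point to pairs of labels that the classifier mixes up.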
About the Host (Snehan Kekre)
Snehan hosts Machine Learning and Data Sciences projects at Rhyme. He is in his senior year of university at the Minerva Schools at KGI, studying Computer Science and Artificial Intelligence. When not applying computational and quantitative methods to identify the structures shaping the world around him, he can sometimes be seen trekking in the mountains of Nepal.