We will cover the following tasks in 47 minutes:
ROCAUC curves give an overall picture of how well your classifier is performing and generalizing. The Receiver Operating Characteristic (ROC) curve is a measure of a classifier’s predictive quality that visualizes the tradeoff between the model’s sensitivity and specificity.
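To make the sensitivity/specificity tradeoff concrete, here is a minimal sketch that computes ROC points and the area under the curve by hand, using only the Python standard library. It is not Yellowbrick's implementation; the function names `roc_points` and `auc` and the toy labels and scores are assumptions made for illustration, and the sketch assumes both classes are present in `y_true`.

```python
def roc_points(y_true, scores):
    """Return (false positive rate, true positive rate) pairs.

    The threshold is swept from the highest score to the lowest, and tied
    scores are processed together so they produce a single point.
    """
    pos = sum(y_true)
    neg = len(y_true) - pos
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(order):
        current = scores[order[i]]
        while i < len(order) and scores[order[i]] == current:
            if y_true[order[i]] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under a piecewise-linear curve, using the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical toy data: 1 marks the positive class.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

curve = roc_points(y_true, scores)
print(curve)
print(auc(curve))
```

An area of 1.0 corresponds to a perfect ranking of positives above negatives, while 0.5 is what random scores would give on average.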
We will also touch on another important topic that is often glossed over: data snooping. Data snooping can significantly harm out-of-sample performance by contaminating our analysis. We will learn to recognize, account for, and avoid it.
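One common form of data snooping is computing preprocessing statistics on the full dataset before splitting it. The sketch below contrasts that with fitting the statistics on the training portion only. It is an illustration under stated assumptions, not the course's own code: the helper names `fit_scaler` and `transform` and the toy values are hypothetical.

```python
from statistics import mean, pstdev


def fit_scaler(values):
    """Learn the standardization parameters (mean, standard deviation)."""
    return mean(values), pstdev(values)


def transform(values, params):
    """Standardize values using previously learned parameters."""
    mu, sigma = params
    return [(v - mu) / sigma for v in values]


# Hypothetical feature column; the last two values form the held-out test set.
data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 50.0]
train, test = data[:6], data[6:]

# Snooping: the parameters are influenced by the test values.
snooped_params = fit_scaler(data)

# Clean: the parameters come from the training values only and are then
# reused, unchanged, on the test values.
clean_params = fit_scaler(train)
train_scaled = transform(train, clean_params)
test_scaled = transform(test, clean_params)

print(snooped_params)
print(clean_params)
```

The two parameter sets differ because the snooped version has already seen the test values, which is exactly the contamination to avoid.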
Classification Report and Confusion Matrix
We have a few good options for visual classification scoring. In the last task, we explored the classic ROCAUC, which is a good way to see how well our classifier is performing overall. But we can also use Yellowbrick to start diagnosing some of the problems. So, in this task, we’re going to create a visual classification report heatmap and a visual confusion matrix. They are helpful not just for judging overall accuracy with the F1 score, but also for getting a sense of where our model is performing better or worse.
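The numbers behind a classification report and a confusion matrix can be sketched in plain Python. This is not Yellowbrick's code; `confusion_counts`, `per_class_report`, and the toy predictions are hypothetical names and data chosen for illustration.

```python
from collections import Counter


def confusion_counts(y_true, y_pred):
    """Count how often each (true label, predicted label) pair occurs."""
    return Counter(zip(y_true, y_pred))


def per_class_report(y_true, y_pred):
    """Compute precision, recall, and F1 for every class."""
    counts = confusion_counts(y_true, y_pred)
    labels = sorted(set(y_true) | set(y_pred))
    report = {}
    for label in labels:
        tp = counts[(label, label)]
        fp = sum(counts[(other, label)] for other in labels if other != label)
        fn = sum(counts[(label, other)] for other in labels if other != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            f1 = 2 * precision * recall / (precision + recall)
        else:
            f1 = 0.0
        report[label] = {"precision": precision, "recall": recall, "f1": f1}
    return report


# Hypothetical labels and predictions for a binary problem.
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 1, 0, 0, 1, 1]

matrix = confusion_counts(y_true, y_pred)
report = per_class_report(y_true, y_pred)
for label, metrics in report.items():
    print(label, metrics)
```

Reading the per-class rows side by side shows where the model is weaker: in this toy data, class 1 is recalled only half the time even though the overall accuracy looks moderate.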
Real-world data is often distributed somewhat unevenly, meaning that the fitted model is likely to perform better on some sections of the data than on others.
In this task, we will first visualize our model’s performance on different train/test splits. We will use Yellowbrick’s CVScores visualizer to explore how performance varies under different cross-validation strategies.
Next, we use a StratifiedKFold cross-validation strategy to ensure that every class is represented in each split in the same proportion.
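The idea of stratification can be sketched without any library: group the sample indices by class, then deal each class's indices across the folds in turn. This is a simplified illustration, not scikit-learn's `StratifiedKFold` algorithm (it does no shuffling, for example), and `stratified_folds` is a hypothetical helper name.

```python
from collections import defaultdict


def stratified_folds(labels, n_splits):
    """Assign sample indices to folds so each fold keeps the class mix."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(n_splits)]
    for indices in by_class.values():
        # Deal this class's samples round-robin across the folds.
        for position, idx in enumerate(indices):
            folds[position % n_splits].append(idx)
    return [sorted(fold) for fold in folds]


# Hypothetical imbalanced labels: eight samples of class 0, four of class 1.
labels = [0] * 8 + [1] * 4
folds = stratified_folds(labels, n_splits=4)

for fold in folds:
    print(fold, [labels[i] for i in fold])
```

Each fold ends up with two samples of class 0 and one of class 1, matching the 2:1 ratio of the full dataset.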
Evaluating Class Balance
A very common question during model evaluation is, “Why isn’t the model I’ve picked predictive?” After completing this task, you will have a good answer to this question. The idea centers on the imbalance between classes within your data. We will also learn best practices for accommodating such imbalances so that they do not adversely affect model performance.
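A quick numeric check shows why imbalance matters. The sketch below measures class proportions and derives inverse-frequency class weights, one common way to compensate. The source does not say which practices the task covers, so treat the weighting scheme here as an assumption; `class_balance` and `balanced_weights` are hypothetical helper names.

```python
from collections import Counter


def class_balance(labels):
    """Return the fraction of samples that belongs to each class."""
    counts = Counter(labels)
    total = len(labels)
    return {label: count / total for label, count in counts.items()}


def balanced_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {
        label: n_samples / (n_classes * count)
        for label, count in counts.items()
    }


# Hypothetical dataset: 90 negatives and 10 positives.
labels = [0] * 90 + [1] * 10

proportions = class_balance(labels)
weights = balanced_weights(labels)

# Accuracy of a model that always predicts the majority class.
majority_accuracy = max(proportions.values())

print(proportions)
print(weights)
print(majority_accuracy)
```

A model that always predicts the majority class reaches 90% accuracy on this data without learning anything, which is one reason a seemingly accurate model can fail to be predictive.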
Discrimination Threshold for Logistic Regression
Our Logistic Regression model is a binary classifier. We can use a discrimination threshold plot to evaluate how our classifier performs on metrics such as F1 score, recall, precision, and queue rate as the decision threshold changes.
More generally, based on your application or business needs, you can use this visualization to quickly home in on where you want to set that threshold.
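The quantities on a discrimination threshold plot can be computed directly for any single threshold. The following is a minimal sketch, not Yellowbrick's implementation; `threshold_metrics` and the toy probabilities are hypothetical, and queue rate is taken here to mean the fraction of instances predicted positive.

```python
def threshold_metrics(y_true, probs, threshold):
    """Compute precision, recall, F1, and queue rate at one threshold."""
    preds = [1 if p >= threshold else 0 for p in probs]
    tp = sum(1 for t, p in zip(y_true, preds) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, preds) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, preds) if t == 1 and p == 0)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    queue_rate = sum(preds) / len(preds)
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "queue_rate": queue_rate,
    }


# Hypothetical predicted probabilities for the positive class.
y_true = [0, 0, 0, 1, 0, 1, 1, 1]
probs = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]

at_half = threshold_metrics(y_true, probs, 0.5)
at_lower = threshold_metrics(y_true, probs, 0.35)

for threshold in (0.35, 0.5, 0.75):
    print(threshold, threshold_metrics(y_true, probs, threshold))
```

Lowering the threshold from 0.5 to 0.35 raises recall to 1.0 in this toy data, at the cost of a higher queue rate, which is the kind of tradeoff the plot is meant to expose.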
About the Host (Snehan Kekre)
Snehan hosts Machine Learning courses at Rhyme. He is in his senior year of university at the Minerva Schools at KGI, pursuing a double major in the Natural Sciences and Computational Sciences, with a focus on physics and machine learning. When not applying computational and quantitative methods to identify the structures shaping the world around him, he can sometimes be seen trekking in the mountains of Nepal.