We will cover the following tasks in 1 hour and 7 minutes:
Introduction to the Project and Dataset
We will get familiar with the Rhyme interface and our learning environment. You will be provided with a cloud desktop with Jupyter Notebooks and all the software you need to complete the project. Jupyter Notebooks are popular with data scientists and machine learning engineers because code can be written in some cells while other cells hold documentation.
We’ll also learn about the Poker Hand dataset from the UCI Machine Learning Repository. The premise is that given some features of a hand of cards in a poker game, we should be able to predict the type of hand.
Next, we read the data from disk into a
Separate the Data into Features and Targets
In this task, we will first manually label the columns and classes based on the dataset description from the UCI Repository.
Finally, we’ll separate the data into features X and targets y for further analysis.
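As a rough sketch of this step, the snippet below labels the columns and splits a small frame into X and y. The column labels (S1, C1, ..., hand) and the three inline rows are illustrative assumptions standing in for the real file, which is read from disk in the project.

```python
import pandas as pd

# Assumed labels: suit (S) and rank (C) of each of the five cards,
# followed by the hand class. The exact strings are hypothetical.
columns = ["S1", "C1", "S2", "C2", "S3", "C3", "S4", "C4", "S5", "C5", "hand"]

# Three illustrative rows in place of the real data file.
rows = [
    [1, 10, 1, 11, 1, 13, 1, 12, 1, 1, 9],   # royal flush
    [2, 2, 3, 2, 1, 7, 4, 9, 2, 13, 1],      # one pair
    [4, 5, 1, 8, 3, 12, 2, 3, 1, 10, 0],     # nothing in hand
]
data = pd.DataFrame(rows, columns=columns)

# Features are every column except the class; the target is the class column.
X = data.drop(columns="hand")
y = data["hand"]

print(X.shape, y.tolist())
```

Keeping y as a separate Series makes it easy to inspect the class counts before any model is trained.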
Evaluating Class Balance
A very common question during model evaluation is, “Why isn’t the model I’ve picked predictive?” After completing this task, you will have a good answer to this question. The idea centers on the imbalance between classes within your data. We will also learn best practices for accommodating such imbalances so that they do not adversely affect model performance.
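To make the imbalance idea concrete, here is a minimal sketch that counts samples per class and computes the accuracy of always guessing the majority class. The label list is made up for illustration and does not reflect the real dataset's proportions.

```python
from collections import Counter

# Hypothetical targets: 0 = nothing, 1 = one pair, 5 = flush (made-up counts).
y = [0] * 50 + [1] * 42 + [5] * 8

counts = Counter(y)
total = len(y)
shares = {label: n / total for label, n in counts.items()}

# Always predicting the majority class already reaches this accuracy,
# without ever identifying a single flush.
baseline_accuracy = max(counts.values()) / total

for label in sorted(shares):
    print(f"class {label}: {counts[label]:3d} samples ({shares[label]:.0%})")
print(f"majority-class baseline accuracy: {baseline_accuracy:.2f}")
```

A model whose accuracy is close to this baseline may have learned little beyond the class frequencies.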
Up-sampling from Minority Classes
Because the class imbalance is severe, we’ll use Pandas to merge the rare classes into a single class that covers Flush or better.
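One possible way to do this merge in Pandas is sketched below. It assumes the integer class codes from the UCI description, where 5 is Flush and 6 through 9 are the stronger hands; the sample Series is made up, and the project itself may use a different call.

```python
import pandas as pd

# Made-up targets using the assumed UCI class codes (0-9).
y = pd.Series([0, 1, 5, 9, 2, 6, 4, 8], name="hand")

# Keep classes 0-4 unchanged; replace every class from 5 upward with 5,
# which now stands for "Flush or better".
y_merged = y.where(y < 5, 5)

print(y_merged.value_counts().sort_index())
```

After the merge there are six classes instead of ten, and the combined class has more support than any of the rare hands had alone.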
Training a Random Forests Classifier
Now we’ll partition our poker hand data into training and test splits, so that we evaluate our fitted model on data that it wasn’t trained on. This will allow us to see how well our random forests model is balancing the bias/variance trade-off.
In this short task, we will compute the classification accuracy score of our random forests model on the test data.
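The split, fit, and score steps might look roughly like this with scikit-learn. The data here is synthetic (from make_classification) rather than the poker hands, and the split ratio, number of trees, and random seeds are assumptions chosen for the sketch.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the poker features X and targets y.
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=5, n_classes=3, random_state=0
)

# Hold out 20% of the rows; stratify so each split keeps the class proportions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# Comparing the two scores hints at how much the model overfits.
train_accuracy = accuracy_score(y_train, model.predict(X_train))
test_accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"train accuracy: {train_accuracy:.3f}")
print(f"test accuracy:  {test_accuracy:.3f}")
```

A large gap between training and test accuracy suggests high variance; low scores on both suggest high bias.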
ROC Curve and AUC
Now that our model is fitted, we evaluate its performance using some of Yellowbrick’s visualizers for classification. With Yellowbrick’s implementation of ROCAUC we can evaluate a multi-class classifier. Yellowbrick does this by plotting the ROC curve for each class as though it were its own binary classifier, all on one plot.
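The one-vs-rest idea can be sketched directly with scikit-learn, without the plot. This is not Yellowbrick's own code; the labels and predicted probabilities below are made up so the per-class AUC values are easy to check by hand.

```python
import numpy as np
from sklearn.metrics import auc, roc_curve
from sklearn.preprocessing import label_binarize

classes = [0, 1, 2]
y_true = np.array([0, 0, 1, 1, 2, 2])

# Made-up predicted probabilities, one column per class.
y_score = np.array([
    [0.7, 0.2, 0.1],
    [0.5, 0.3, 0.2],
    [0.2, 0.6, 0.2],
    [0.4, 0.3, 0.3],
    [0.1, 0.2, 0.7],
    [0.2, 0.5, 0.3],
])

# One indicator column per class: 1 where the sample belongs to that class.
y_bin = label_binarize(y_true, classes=classes)

# Treat each class as its own binary problem and compute its ROC curve and AUC.
per_class_auc = {}
for i, label in enumerate(classes):
    fpr, tpr, _ = roc_curve(y_bin[:, i], y_score[:, i])
    per_class_auc[label] = auc(fpr, tpr)

for label, value in per_class_auc.items():
    print(f"class {label}: AUC = {value:.4f}")
```

Plotting each (fpr, tpr) pair on shared axes gives the kind of combined figure described above.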
Classification Report Heatmap
The classification report displays the precision, recall, and F1 scores for the model. To support easier interpretation and problem detection, Yellowbrick’s implementation of ClassificationReport integrates numerical scores with a color-coded heatmap.
The classification report shows the main classification metrics on a per-class basis. This gives a deeper intuition of the classifier's behavior than global accuracy, which can mask functional weaknesses in one class of a multiclass problem.
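A small hand-rolled example shows how global accuracy can hide a weak class. The helper per_class_report is hypothetical and written only for this sketch, and the labels are made up; it computes the same per-class numbers the report displays.

```python
def per_class_report(y_true, y_pred):
    """Return precision, recall, F1, and support for every class."""
    report = {}
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        predicted = sum(p == label for p in y_pred)
        actual = sum(t == label for t in y_true)
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        report[label] = {
            "precision": precision, "recall": recall, "f1": f1, "support": actual,
        }
    return report


# Made-up labels: class 0 is common, class 1 is rare.
y_true = [0] * 8 + [1] * 2
y_pred = [0] * 7 + [1] + [0, 1]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
report = per_class_report(y_true, y_pred)

print(f"accuracy: {accuracy:.2f}")
for label, scores in report.items():
    print(label, scores)
```

Here the overall accuracy is 0.8, yet the rare class reaches an F1 of only 0.5, which is the kind of weakness a per-class view exposes.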
Class Prediction Error
The Yellowbrick Class Prediction Error chart shows the support for each class in the fitted classification model displayed as a stacked bar. Each bar is segmented to show the distribution of predicted classes for each class. It is initialized with a fitted model and generates a class prediction error chart on draw. For my part, I find ClassPredictionError a convenient and easier-to-interpret alternative to the standard confusion matrix.
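The numbers behind such a chart can be tallied in a few lines. This sketch follows the description above (one bar per actual class, segmented by predicted class) and uses made-up labels; it is not Yellowbrick's implementation and does not draw anything.

```python
from collections import Counter, defaultdict

# Made-up actual and predicted hand types.
y_true = ["pair", "pair", "pair", "nothing", "nothing", "flush"]
y_pred = ["pair", "nothing", "pair", "nothing", "nothing", "nothing"]

# bars[actual][predicted] is the height of one segment in the stacked bar.
bars = defaultdict(Counter)
for actual, predicted in zip(y_true, y_pred):
    bars[actual][predicted] += 1

for actual, segments in bars.items():
    support = sum(segments.values())  # total bar height for this class
    print(f"{actual} (support {support}): {dict(segments)}")
```

These counts are the same ones a confusion matrix holds row by row; the stacked bars simply present each row as a single column of segments.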
About the Host (Snehan Kekre)
Snehan hosts Machine Learning courses at Rhyme. He is in his senior year of university at the Minerva Schools at KGI, pursuing a double major in the Natural Sciences and Computational Sciences, with a focus on physics and machine learning. When not applying computational and quantitative methods to identify the structures shaping the world around him, he can sometimes be seen trekking in the mountains of Nepal.