Project: Visual Machine Learning with Yellowbrick

In this project, we’ll explore how to evaluate the performance of a random forest classifier from the scikit-learn library on the Poker Hand dataset using visual diagnostic tools from Scikit-Yellowbrick. With an emphasis on visual steering of our analysis, we will cover the following topics in our machine learning workflow:

  • Feature analysis
  • Feature importance
  • Algorithm selection
  • Model evaluation using regression
  • Cross-validation
  • Hyperparameter tuning


Task List

We will cover the following tasks in 48 minutes:

Introduction to the Project and Dataset

We will get familiar with the Rhyme interface and our learning environment. You will be provided with a cloud desktop with Jupyter Notebooks and all the software you need to complete the project. Jupyter Notebooks are very popular with data scientists and machine learning engineers because code goes in some cells while other cells hold the documentation.

We’ll also learn about the Poker Hand dataset from the UCI Machine Learning Repository. The premise is that given some features of a hand of cards in a poker game, we should be able to predict the type of hand.

Next, we read the data from disk into a Pandas dataframe.
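
As a rough sketch of this step, the snippet below reads comma-separated poker hands into a Pandas dataframe. The three rows are hand-made stand-ins that follow the UCI layout (five suit and rank pairs, then the hand class), and the in-memory buffer is an assumption used so the example runs on its own; in the project you would pass the path of the data file on disk instead.

```python
import io

import pandas as pd

# Hand-made sample rows (not taken from the real file): five (suit, rank)
# pairs followed by the hand class.
sample = io.StringIO(
    "1,10,1,11,1,13,1,12,1,1,9\n"
    "1,2,2,5,3,9,4,11,1,13,0\n"
    "1,3,2,3,3,7,4,10,1,12,1\n"
)

# The raw data has no header row, so header=None keeps the first hand as data
# and gives the columns integer names 0..10.
data = pd.read_csv(sample, header=None)
print(data.shape)
print(data.head())
```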

Separate the Data into Features and Targets

In this task, we will first manually label the columns and classes based on the dataset description from the UCI Repository.

Finally, we’ll separate the data into features X and targets y for further analysis.
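
A minimal sketch of labeling and splitting, assuming the column and class names given in the UCI dataset description (S1 to S5 for suits, C1 to C5 for ranks, CLASS for the hand). The three rows are hand-made examples, not real records.

```python
import pandas as pd

# Column names following the UCI description: suit and rank of each card,
# then the hand class.
columns = ["S1", "C1", "S2", "C2", "S3", "C3", "S4", "C4", "S5", "C5", "CLASS"]

# Class labels indexed by the integer stored in the CLASS column (0..9).
classes = [
    "Nothing in hand", "One pair", "Two pairs", "Three of a kind", "Straight",
    "Flush", "Full house", "Four of a kind", "Straight flush", "Royal flush",
]

# Hand-made rows for illustration only.
rows = [
    [1, 10, 1, 11, 1, 13, 1, 12, 1, 1, 9],
    [1, 2, 2, 5, 3, 9, 4, 11, 1, 13, 0],
    [1, 3, 2, 3, 3, 7, 4, 10, 1, 12, 1],
]
data = pd.DataFrame(rows, columns=columns)

# Features are the ten card columns; the target is the hand class.
X = data.drop(columns="CLASS")
y = data["CLASS"]

print(X.shape, y.shape)
print([classes[label] for label in y])
```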

Evaluating Class Balance

A very common question during model evaluation is, “Why isn’t the model I’ve picked predictive?” After completing this task, you will have a good answer to this question. The idea centers on the imbalance between classes within your data. We will also learn best practices for accommodating such imbalances so that they do not adversely affect model performance.
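
To make the idea of class imbalance concrete, here is a small sketch that counts how often each class occurs. The labels are made up for illustration and do not reflect the real Poker Hand distribution.

```python
import pandas as pd

# Made-up target labels: two common classes and three rare ones.
y = pd.Series([0] * 50 + [1] * 42 + [2] * 5 + [3] * 2 + [5] * 1, name="CLASS")

# Number of samples per class, ordered by class label.
counts = y.value_counts().sort_index()

# Share of the data that each class represents.
share = counts / counts.sum()

print(counts)
print(share.round(2))
```

With a distribution like this, a model that always predicts class 0 is already right half of the time, which is why accuracy alone can look acceptable while the rare classes are never predicted.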

Up-sampling from Minority Classes

Because of the severe class imbalance, we’ll use Pandas to merge the rare classes into a single class that covers Flush or better.
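
One way this merge could look in Pandas, assuming the UCI class codes in which Flush is 5 and every stronger hand has a larger code. The labels below are made up, and the merged class name is only a suggested label.

```python
import pandas as pd

# Made-up hand classes, including several rare ones (5 and above).
y = pd.Series([0, 1, 2, 5, 6, 9, 3, 8], name="CLASS")

# Keep labels below 5 unchanged and replace everything else with 5,
# so Flush and all stronger hands share one class.
merged = y.where(y < 5, 5)

# Class names after the merge (the last name is an illustrative choice).
classes = [
    "Nothing in hand", "One pair", "Two pairs",
    "Three of a kind", "Straight", "Flush or better",
]

print(merged.tolist())
```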

Training a Random Forests Classifier

Now we’ll partition our poker hand data into training and test splits, so that we evaluate our fitted model on data that it wasn’t trained on. This will allow us to see how well our random forests model is balancing the bias/variance trade-off.
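
The split-then-fit pattern can be sketched with scikit-learn as follows. Synthetic data from make_classification stands in for the poker hands, and the split ratio, number of trees, and random seeds are assumptions chosen only to keep the example small and repeatable.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the poker features X and targets y.
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=5, n_classes=3, random_state=42
)

# Hold out 20% of the rows; stratify keeps class proportions similar
# in the training and test splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Fit the random forest on the training split only.
model = RandomForestClassifier(n_estimators=50, random_state=42)
model.fit(X_train, y_train)

# Comparing the two scores hints at over-fitting: a large gap between
# training and test accuracy suggests high variance.
train_score = model.score(X_train, y_train)
test_score = model.score(X_test, y_test)
print(f"train accuracy: {train_score:.3f}, test accuracy: {test_score:.3f}")
```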

Classification Accuracy

In this short task, we will compute the classification accuracy score of our random forests model on the test data.
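
Classification accuracy is the fraction of test samples whose predicted class matches the true class. The sketch below uses made-up labels and predictions in place of the model's real output and checks scikit-learn's accuracy_score against a manual count.

```python
from sklearn.metrics import accuracy_score

# Made-up true labels and predictions standing in for the test split.
y_test = [0, 1, 1, 0, 5, 2, 0, 1]
y_pred = [0, 1, 0, 0, 5, 2, 0, 0]

# Fraction of positions where prediction and truth agree.
accuracy = accuracy_score(y_test, y_pred)

# The same number computed by hand: 6 matches out of 8 samples.
manual = sum(t == p for t, p in zip(y_test, y_pred)) / len(y_test)

print(accuracy, manual)
```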

ROC Curve and AUC

Now that our model is fitted, we evaluate its performance using some of Yellowbrick’s visualizers for classification. With Yellowbrick’s implementation of ROCAUC we can evaluate a multi-class classifier. Yellowbrick does this by plotting the ROC curve for each class as though it were its own binary classifier, all on one plot.
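
The one-class-at-a-time idea can be illustrated with scikit-learn alone. This is a sketch of the underlying computation, not Yellowbrick's own code: each class is treated as the positive label against all others, and an AUC is computed from the predicted probability of that class. The synthetic data and model settings are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import label_binarize

# Synthetic three-class data standing in for the poker hands.
X, y = make_classification(
    n_samples=600, n_features=10, n_informative=5, n_classes=3, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(X_train, y_train)

# One probability column per class, in the order of model.classes_.
proba = model.predict_proba(X_test)

# One 0/1 indicator column per class: "is this sample of class k?"
y_bin = label_binarize(y_test, classes=model.classes_)

# Treat each class as its own binary problem and score it separately.
per_class_auc = {
    int(label): roc_auc_score(y_bin[:, i], proba[:, i])
    for i, label in enumerate(model.classes_)
}
print(per_class_auc)
```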

Classification Report Heatmap

The classification report displays the precision, recall, and F1 scores for the model. In order to support easier interpretation and problem detection, Yellowbrick’s implementation of ClassificationReport integrates numerical scores with a color-coded heatmap.

The classification report shows the main classification metrics on a per-class basis. This gives a deeper intuition of the classifier's behavior than global accuracy, which can mask functional weaknesses in one class of a multiclass problem.
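
The numbers behind such a report can be computed with scikit-learn, as in this sketch with made-up labels and predictions. It shows how an overall accuracy of 0.7 hides the fact that precision and recall differ from class to class.

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Made-up true labels and predictions for a three-class problem.
y_true = [0, 0, 0, 0, 1, 1, 1, 2, 2, 2]
y_pred = [0, 0, 0, 1, 1, 1, 0, 2, 2, 0]

# Per-class metrics; each returned array has one entry per label.
precision, recall, f1, support = precision_recall_fscore_support(
    y_true, y_pred, labels=[0, 1, 2]
)

print("accuracy:", accuracy_score(y_true, y_pred))
for label in range(3):
    print(
        f"class {label}: precision={precision[label]:.2f} "
        f"recall={recall[label]:.2f} f1={f1[label]:.2f} "
        f"support={support[label]}"
    )
```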

Class Prediction Error

The Yellowbrick Class Prediction Error chart shows the support for each class in the fitted classification model displayed as a stacked bar. Each bar is segmented to show the distribution of predicted classes for each class. It is initialized with a fitted model and generates a class prediction error chart on draw. For my part, I find ClassPredictionError a convenient and easier-to-interpret alternative to the standard confusion matrix.
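
The data behind such a stacked bar can be sketched with scikit-learn's confusion_matrix. This is an illustration of the counts only, using made-up labels, and it is not Yellowbrick's implementation: each row is one bar, the row total is the class support, and the entries in the row are the segments.

```python
from sklearn.metrics import confusion_matrix

# Made-up true labels and predictions for a three-class problem.
y_true = [0, 0, 0, 0, 1, 1, 1, 2, 2, 2]
y_pred = [0, 0, 0, 1, 1, 1, 0, 2, 2, 0]

# Rows are actual classes, columns are predicted classes.
cm = confusion_matrix(y_true, y_pred, labels=[0, 1, 2])

# Height of each bar: how many samples truly belong to the class.
support = cm.sum(axis=1)

for actual, row in enumerate(cm):
    segments = ", ".join(f"predicted {p}: {n}" for p, n in enumerate(row))
    print(f"actual class {actual} (support {support[actual]}): {segments}")
```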

About the Host (Snehan Kekre)

Snehan Kekre is a Machine Learning and Data Science Instructor at Coursera. He studied Computer Science and Artificial Intelligence at Minerva Schools at KGI, based in San Francisco. His interests include AI safety, EdTech, and instructional design. He recognizes that building a deep, technical understanding of machine learning and AI among students and engineers is necessary in order to grow the AI safety community. This passion drives him to design hands-on, project-based machine learning courses on Rhyme.
