Project: Perform Feature Analysis with Yellowbrick

This is the first project in the Machine Learning Visualization module. We are going to use visualizations to steer our machine learning workflow. The problem we will tackle is to predict whether rooms in apartments are occupied or unoccupied based on passive sensor data such as Temperature, Humidity, Light and CO2 levels.

Through every step of the model selection triple, namely Feature Engineering, Algorithm Selection, and Hyperparameter Tuning, we will make data-informed decisions augmented by visualizations.

While scikit-learn includes a rich selection of model diagnostic and selection tools, model evaluation is often aided by visualizations, particularly when a large number of features are involved. This project therefore introduces Yellowbrick, which extends the scikit-learn API with visual analysis and diagnostic tools. The Yellowbrick API also wraps matplotlib to generate figures and interactive data explorations while still allowing developers fine-grained control of figures.

This way we can evaluate the performance, stability, and predictive value of machine learning models and assist in diagnosing problems throughout the machine learning workflow.

Task List

We will cover the following tasks in 1 hour and 15 minutes:

Introduction and Importing Libraries

We will get familiar with the Rhyme interface and our learning environment. You will be provided with a cloud desktop with Jupyter Notebooks and all the software you will need to complete the project. Jupyter Notebooks are very popular with data scientists and machine learning engineers because code can be written in some cells while other cells hold documentation.

Lastly, we clearly define the steps of a general machine learning problem and then import libraries and helper functions that will be essential later in the project.

Anscombe's Quartet

To understand why visual diagnostics are vital to machine learning, we compute the summary statistics of four datasets and plot them. The surprising result we observe is that while the means, standard deviations, and correlation coefficients are nearly identical across all of them, the datasets appear drastically different when plotted.

This illustrative example was first conceived in 1973 by the English statistician Francis Anscombe. He wanted to dispel the ever-pervasive notion that “numerical calculations are exact, but graphs are rough”.
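As a minimal sketch of the summary-statistics step, the snippet below computes the mean, sample standard deviation, and Pearson correlation for each of the four datasets using only the Python standard library. The values are the published quartet; the helper name `pearson` and the `quartet` and `summary` dictionaries are our own choices for illustration, not code from the project notebook.

```python
from math import sqrt
from statistics import mean, stdev

# Anscombe's quartet (Anscombe, 1973). Datasets I-III share the same x values.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# For each dataset: (mean x, stdev x, mean y, stdev y, correlation).
summary = {
    name: (mean(xs), stdev(xs), mean(ys), stdev(ys), pearson(xs, ys))
    for name, (xs, ys) in quartet.items()
}

for name, stats in summary.items():
    print(name, [round(s, 2) for s in stats])
```

Rounded to two decimal places, the printed rows agree with each other, which is exactly why the statistics alone cannot tell the four datasets apart.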

Feature Analysis: Loading the Classification Data

Feature Analysis can be generalized to the following three steps:

  1. Define a bounded, high-dimensional feature space that can be effectively modeled.
  2. Transform and manipulate the space to make modeling easier.
  3. Extract a feature representation of each instance in the space.

Our goal in this task will be to load the room occupancy data, specify the features of interest, and extract the instances and target.
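The load, select, and extract steps can be sketched roughly as follows. This is an assumption-laden illustration: the rows in `SAMPLE_CSV` are made-up placeholder values rather than real sensor measurements, and the column names and the helper `load_instances` are hypothetical. The actual project may load the data differently.

```python
import csv
import io

# Made-up illustrative rows (NOT real measurements); column names are assumed.
SAMPLE_CSV = """temperature,humidity,light,co2,occupancy
23.18,27.27,426.0,721.25,1
23.15,27.27,429.5,714.0,1
21.10,25.50,0.0,450.0,0
20.90,25.10,0.0,445.5,0
"""

# Specify the features of interest and the target column.
FEATURES = ["temperature", "humidity", "light", "co2"]
TARGET = "occupancy"


def load_instances(handle, features, target):
    """Read CSV rows and split them into instances X and target y."""
    X, y = [], []
    for row in csv.DictReader(handle):
        X.append([float(row[name]) for name in features])
        y.append(int(row[target]))
    return X, y


X, y = load_instances(io.StringIO(SAMPLE_CSV), FEATURES, TARGET)
print(len(X), "instances with", len(X[0]), "features each")
```

Keeping the feature list in one place makes it easy to reuse the same selection for every visualization in the later tasks.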

Feature Analysis: Scatter Plot

In data science and machine learning we can use scatter plots to quickly graph data during analysis. Oftentimes, they are used as an informative base for more complex and higher dimensional visualizations.

In this task, we are going to simply plot instances of two features against each other to assess the relationship between the pair. Can we learn something novel that we would have otherwise missed? Let’s find out!

Feature Analysis: RadViz

Another very important feature visualization algorithm is RadViz. Machine learning engineers and data scientists often use radial visualizations in their workflow to ascertain class separability and feature importance.

In this task, we will use RadViz to plot our features on the unit circle, drop our instances as points within this circle, and let the features pull on the points according to their normalized values.
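To make the "pull" intuition concrete, here is a minimal pure-Python sketch of the usual RadViz projection: features become evenly spaced anchors on the unit circle, each feature is min-max normalized, and every instance lands at the weighted average of the anchors. The function names and the toy matrix are our own assumptions for illustration; this is not Yellowbrick's internal implementation.

```python
from math import cos, pi, sin


def min_max_columns(X):
    """Scale every column of X to the range [0, 1]."""
    cols = list(zip(*X))
    lows = [min(c) for c in cols]
    # Constant columns get a span of 1.0 to avoid division by zero.
    spans = [(max(c) - min(c)) or 1.0 for c in cols]
    return [[(v - lo) / sp for v, lo, sp in zip(row, lows, spans)] for row in X]


def radviz_points(X):
    """Return feature anchors on the unit circle and projected instances."""
    m = len(X[0])
    anchors = [(cos(2 * pi * j / m), sin(2 * pi * j / m)) for j in range(m)]
    points = []
    for row in min_max_columns(X):
        total = sum(row)
        if total == 0:
            # No feature pulls on this instance, so it stays at the center.
            points.append((0.0, 0.0))
            continue
        px = sum(w * ax for w, (ax, _) in zip(row, anchors)) / total
        py = sum(w * ay for w, (_, ay) in zip(row, anchors)) / total
        points.append((px, py))
    return anchors, points


# Toy data: three instances dominated by one feature each, one balanced.
toy = [[10, 0, 0], [0, 5, 0], [0, 0, 2], [10, 5, 2]]
anchors, points = radviz_points(toy)
for p in points:
    print(round(p[0], 3), round(p[1], 3))
```

An instance dominated by a single feature sits on that feature's anchor, while an instance pulled equally by all features ends up at the center of the circle.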

Feature Analysis: Parallel Coordinates Plot

Like RadViz, parallel coordinate plots visualize multi-dimensional features. We will use parallel coordinates to get a much better sense of the distribution of the features and whether any features are highly variable with respect to any one class in the room occupancy dataset.
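The geometry behind a parallel coordinates plot can be sketched without any plotting library: each feature gets its own vertical axis, values are rescaled so the axes are comparable, and each instance becomes a polyline grouped by class. The sketch below assumes min-max scaling, and the toy values and function name are hypothetical.

```python
def parallel_coordinates(X, y):
    """Map each instance to a polyline of (axis index, scaled value) per class."""
    cols = list(zip(*X))
    lows = [min(c) for c in cols]
    # Constant columns get a span of 1.0 to avoid division by zero.
    spans = [(max(c) - min(c)) or 1.0 for c in cols]
    lines = {}
    for row, label in zip(X, y):
        polyline = [
            (axis, (value - lows[axis]) / spans[axis])
            for axis, value in enumerate(row)
        ]
        lines.setdefault(label, []).append(polyline)
    return lines


# Made-up toy rows: [temperature, light, co2]; label 1 = occupied.
toy_X = [[20.0, 0.0, 400.0], [22.0, 450.0, 900.0], [21.0, 10.0, 500.0]]
toy_y = [0, 1, 0]

lines = parallel_coordinates(toy_X, toy_y)
for label, polylines in lines.items():
    print(label, polylines)
```

Drawing each class's polylines in its own color is what reveals features whose values spread widely within one class.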

Feature Analysis: Rank Features

Are the features predictive? What is the smallest set of features I can feed into my model to maximize predictive performance?

These questions are bound to come up in any machine learning problem. In this task, we will use Rank2D to score and visualize pairs of features according to various metrics so that we can make well-informed qualitative and quantitative decisions about which features to include and why.
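One metric commonly used for this kind of pairwise ranking is the Pearson correlation. The sketch below scores every pair of features that way in plain Python. It only illustrates the idea: the function names and toy columns are assumptions, and it does not reproduce Rank2D's own code or its plotted heatmap.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def rank_pairs(X, names):
    """Score each unordered feature pair, strongest relationship first."""
    cols = list(zip(*X))
    scores = {
        (names[i], names[j]): pearson(cols[i], cols[j])
        for i in range(len(names))
        for j in range(i + 1, len(names))
    }
    ranked = sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)
    return scores, ranked


# Toy data with hypothetical feature names "a", "b", "c".
toy = [[1, 2, 9], [2, 4, 7], [3, 6, 8], [4, 8, 2]]
scores, ranked = rank_pairs(toy, ["a", "b", "c"])
for pair, score in ranked:
    print(pair, round(score, 3))
```

Pairs with a correlation close to 1 or -1 carry largely redundant information, which is one signal for dropping a feature from the set.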

Feature Analysis: Manifold Visualization


About the Host (Snehan Kekre)

Snehan Kekre is a Machine Learning and Data Science Instructor at Coursera. He studied Computer Science and Artificial Intelligence at Minerva Schools at KGI, based in San Francisco. His interests include AI safety, EdTech, and instructional design. He recognizes that building a deep, technical understanding of machine learning and AI among students and engineers is necessary in order to grow the AI safety community. This passion drives him to design hands-on, project-based machine learning courses on Rhyme.

