Machine Learning Visualization: Room Occupancy Detection using Indoor Climate Sensor Data (Part 1)

This is the first project in the Machine Learning Visualization module. We are going to use visualizations to steer our machine learning workflow. The problem we will tackle is to predict whether rooms in apartments are occupied or unoccupied based on passive sensor data such as Temperature, Humidity, Light and CO2 levels.

Through every step of the model selection triple, namely Feature Engineering, Algorithm Selection, and Hyperparameter Tuning, we will make data-informed decisions augmented by visualizations.

The idea is that while scikit-learn includes a rich selection of model diagnostic and selection tools, model evaluation is often aided by visualizations, particularly when a large number of features are involved. So this project will introduce Yellowbrick, which extends the scikit-learn API with visual analysis and diagnostic tools. The Yellowbrick API also wraps matplotlib to generate figures and interactive data explorations while still allowing developers fine-grained control of figures.

This way we can evaluate the performance, stability, and predictive value of machine learning models and assist in diagnosing problems throughout the machine learning workflow.

First 2 tasks free. Then, decide to pay $9.99 for the rest

Task List


We will cover the following tasks in 1 hour and 15 minutes:


Introduction and Importing Libraries

We will get familiar with the Rhyme interface and our learning environment. You will be provided with a cloud desktop with Jupyter Notebooks and all the software you need to complete the project. Jupyter Notebooks are very popular with data scientists and machine learning engineers because code can be written in some cells while other cells hold documentation.

Lastly, we clearly define the steps of a general machine learning problem and then import the libraries and helper functions that will be essential later in the project.


Anscombe's Quartet

To understand why visual diagnostics are vital to machine learning, we compute the summary statistics of four datasets and plot them. The surprising result is that while the means, standard deviations, and correlation coefficients are nearly identical across all four, the datasets look drastically different when plotted.

This illustrative example was first conceived in 1973 by the English statistician Francis Anscombe. He wanted to dispel the ever-pervasive notion that “numerical calculations are exact, but graphs are rough”.
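As a rough illustration of the numerical half of this task, the sketch below computes the summary statistics of the four datasets using only the Python standard library. The values are the published quartet. The helper name pearson_r and the layout of the summary dictionary are choices made for this example, not part of the project's notebook.

```python
from math import sqrt
from statistics import mean, stdev

# Anscombe's quartet (1973). Datasets I, II and III share the same x values.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Collect the same five statistics for every dataset.
summary = {}
for name, (xs, ys) in quartet.items():
    summary[name] = {
        "mean_x": mean(xs),
        "mean_y": mean(ys),
        "std_x": stdev(xs),  # sample standard deviation
        "std_y": stdev(ys),
        "r": pearson_r(xs, ys),
    }

for name, stats in summary.items():
    print(name, {key: round(value, 2) for key, value in stats.items()})
```

Rounded to two decimal places, the four rows agree, which is why the differences only become visible once the points are plotted.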


Feature Analysis: Loading the Classification Data

Feature Analysis can be generalized to the following three steps:

  1. Define a bounded, high dimensional feature space that can be effectively modeled.
  2. Transform and manipulate the space to make modeling easier.
  3. Extract a feature representation of each instance in the space.

Our goal in this task is to load the room occupancy data, specify the features of interest, and extract the instances and target.
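A minimal sketch of that loading step is shown here, using only the standard library. The column names, the four rows of readings, and the helper load_xy are all hypothetical stand-ins chosen for illustration. The real dataset's header and values may differ, and the project itself may well use other tooling for this step.

```python
import csv
import io

# Hypothetical stand-in for the occupancy file: header names and rows are
# made up for illustration and are not taken from the real dataset.
RAW = """temperature,humidity,light,co2,occupancy
23.2,27.3,426.0,721.0,1
23.1,27.2,429.5,714.0,1
21.0,25.0,0.0,450.0,0
20.9,24.8,0.0,440.0,0
"""

FEATURES = ["temperature", "humidity", "light", "co2"]  # features of interest
TARGET = "occupancy"                                    # 1 = occupied, 0 = empty


def load_xy(handle, features, target):
    """Read CSV rows and split them into instances X and target y."""
    X, y = [], []
    for row in csv.DictReader(handle):
        X.append([float(row[name]) for name in features])
        y.append(int(row[target]))
    return X, y


X, y = load_xy(io.StringIO(RAW), FEATURES, TARGET)
print(len(X), "instances,", len(FEATURES), "features")
```

Passing an open file handle instead of the io.StringIO object would read the same structure from disk.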


Feature Analysis: Scatter Plot

In data science and machine learning we can use scatter plots to quickly graph data during analysis. Oftentimes, they are used as an informative base for more complex and higher dimensional visualizations.

In this task, we are going to simply plot instances of two features against each other to assess the relationship between the pair. Can we learn something novel that we would have otherwise missed? Let’s find out!


Feature Analysis: RadViz

Another very important feature visualization algorithm is RadViz. Machine learning engineers and data scientists often use radial visualizations in their workflow to ascertain class separability and feature importance.

In this task, we will use RadViz to plot our features on the unit circle, drop our instances as points within this circle, and let the features pull on the points according to their normalized values.
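The projection described above can be sketched without any plotting library. The function below is an illustrative reimplementation of the general RadViz idea, not Yellowbrick's own code: each feature gets an anchor on the unit circle, values are min-max normalized per feature, and each instance lands at the weighted average of the anchors. The name radviz_points and the sample readings are assumptions made for this example.

```python
import math


def radviz_points(X):
    """Place each row of X inside the unit circle, RadViz style."""
    n_features = len(X[0])
    # One anchor per feature, evenly spaced around the unit circle.
    anchors = [
        (math.cos(2 * math.pi * j / n_features),
         math.sin(2 * math.pi * j / n_features))
        for j in range(n_features)
    ]
    # Min-max normalize each feature so every pull lies in [0, 1].
    columns = list(zip(*X))
    lows = [min(col) for col in columns]
    spans = [(max(col) - min(col)) or 1.0 for col in columns]

    points = []
    for row in X:
        weights = [(v - lo) / span for v, lo, span in zip(row, lows, spans)]
        total = sum(weights)
        if total == 0:
            # No feature pulls on this instance: leave it at the center.
            points.append((0.0, 0.0))
            continue
        px = sum(w * ax for w, (ax, _) in zip(weights, anchors)) / total
        py = sum(w * ay for w, (_, ay) in zip(weights, anchors)) / total
        points.append((px, py))
    return anchors, points


# Made-up readings (temperature, humidity, light, CO2) for illustration only.
readings = [
    [23.2, 27.3, 426.0, 721.0],
    [23.1, 27.2, 429.5, 714.0],
    [21.0, 25.0, 0.0, 450.0],
    [20.9, 24.8, 0.0, 440.0],
]
anchors, points = radviz_points(readings)
for point in points:
    print(tuple(round(c, 3) for c in point))
```

Because every point is a weighted average of the anchors, it always falls inside the circle, and instances dominated by one feature drift toward that feature's anchor.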


Feature Analysis: Parallel Coordinates Plot

Like RadViz, parallel coordinate plots visualize multi-dimensional features. We will use parallel coordinates to get a much better sense of the distribution of the features and whether any features are highly variable with respect to any one class in the room occupancy dataset.
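To make the idea concrete, the sketch below builds the data behind a parallel coordinates plot without drawing it. It assumes min-max normalization so that all axes share a common scale, which is one common choice among several. The function names and the sample readings are hypothetical. Each instance becomes a polyline with one vertex per feature axis, and the per-class band on each axis gives a crude numeric hint of how variable a feature is within a class.

```python
def minmax_normalize(X):
    """Rescale every feature column of X to the range [0, 1]."""
    columns = list(zip(*X))
    lows = [min(col) for col in columns]
    spans = [(max(col) - min(col)) or 1.0 for col in columns]
    return [[(v - lo) / span for v, lo, span in zip(row, lows, spans)]
            for row in X]


def polylines(X):
    """One polyline per instance: a list of (axis index, height) vertices."""
    return [list(enumerate(row)) for row in minmax_normalize(X)]


def class_ranges(X, y):
    """For each class, the [low, high] band its lines cover on every axis."""
    ranges = {}
    for row, label in zip(minmax_normalize(X), y):
        bands = ranges.setdefault(label, [[v, v] for v in row])
        for band, v in zip(bands, row):
            band[0] = min(band[0], v)
            band[1] = max(band[1], v)
    return ranges


# Made-up readings (temperature, humidity, light, CO2) and labels.
readings = [
    [23.2, 27.3, 426.0, 721.0],
    [23.1, 27.2, 429.5, 714.0],
    [21.0, 25.0, 0.0, 450.0],
    [20.9, 24.8, 0.0, 440.0],
]
labels = [1, 1, 0, 0]

for label, bands in class_ranges(readings, labels).items():
    print(label, [[round(lo, 2), round(hi, 2)] for lo, hi in bands])
```

A wide band for one class on a given axis suggests that the feature varies a lot within that class, which is the pattern the plot is meant to expose visually.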


Feature Analysis: Rank Features

Are the features predictive? What is the smallest set of features I can feed into my model to maximize predictive performance?

These questions are bound to come up in any machine learning problem. In this task, we will use Rank2D to score and visualize pairs of features according to various metrics so that we can make well-informed qualitative and quantitative decisions about which features to include and why.
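The pairwise scoring behind this kind of ranking can be sketched in a few lines. The example below uses the Pearson correlation as the metric, which is only one of the possible choices, and ranks every feature pair by the absolute value of its score. The helper names and the sample readings are assumptions for illustration, and the visualizer used in the project additionally draws the scores as a heatmap.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either column is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    denom = sqrt(sum((x - mx) ** 2 for x in xs) *
                 sum((y - my) ** 2 for y in ys))
    return cov / denom if denom else 0.0


def rank_pairs(columns):
    """Score every pair of named columns, strongest relationship first."""
    scores = [((a, b), pearson(columns[a], columns[b]))
              for a, b in combinations(columns, 2)]
    return sorted(scores, key=lambda item: abs(item[1]), reverse=True)


# Made-up sensor columns for illustration only.
readings = {
    "temperature": [23.2, 23.1, 21.0, 20.9, 22.0],
    "humidity": [27.3, 27.2, 25.0, 24.8, 26.5],
    "light": [426.0, 429.5, 0.0, 0.0, 210.0],
    "co2": [721.0, 714.0, 450.0, 440.0, 600.0],
}

ranked = rank_pairs(readings)
for pair, score in ranked:
    print(pair, round(score, 3))
```

Pairs with scores close to 1 or -1 carry largely redundant information, so one feature of such a pair is a candidate for removal.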


Feature Analysis: Manifold Visualization


About the Host (Snehan Kekre)


Snehan hosts Machine Learning courses at Rhyme. He is in his senior year of university at the Minerva Schools at KGI, pursuing a double major in the Natural Sciences and Computational Sciences, with a focus on physics and machine learning. When not applying computational and quantitative methods to identify the structures shaping the world around him, he can sometimes be seen trekking in the mountains of Nepal.



Frequently Asked Questions


In Rhyme, all projects are completely hands-on. You don't just passively watch someone else. You use the software directly while following the host's (Snehan Kekre) instructions. Using the software is the only way to achieve mastery. With the "Live Guide" option, you can ask for help and get an immediate response.
Nothing! Just join through your web browser. Your host (Snehan Kekre) has already installed all required software and configured all data.
You can go to https://rhyme.com/for-companies, sign up for free, and follow the visual guide "How to use Rhyme" to create your own projects. If you have custom needs or a company-specific environment, please email us at help@rhyme.com.
Absolutely. We offer Rhyme for workgroups as well as larger departments and companies. Universities, academies, and bootcamps can also buy Rhyme for their settings. You can select projects and trainings that are mission critical for you and also author your own that reflect your own needs and tech environments. Please email us at help@rhyme.com.
Please email us at help@rhyme.com and we'll respond to you within one business day.
