Machine Learning Visualization: Predicting the Compressive Strength of Concrete

First 2 tasks free. Then, decide to pay $9.99 for the rest

Task List


We will cover the following tasks in 1 hour and 10 minutes:


Introduction

We will get familiar with the Rhyme interface and our learning environment. You will be provided with a cloud desktop with Jupyter Notebooks and all the software you will need to complete the project. Jupyter Notebooks are very popular with data scientists and machine learning engineers, as one can write code in some cells and use other cells for documentation.

We will also introduce the model we will be building as well as the dataset for this project.


Data Exploration

In this task, we use the pandas library to load our data file. Next, we explore its attributes as well as the descriptive summary statistics associated with each attribute.
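
As a rough sketch of what such a summary contains, the snippet below computes the count, mean, standard deviation, minimum, and maximum of each column using only the Python standard library. The column names and the four rows are made up for illustration and are not taken from the course dataset; in the project itself, pandas provides comparable figures through DataFrame.describe().

```python
import csv
import io
import statistics

# Made-up sample rows under hypothetical column names (illustration only).
RAW = """cement,water,age,strength
300,180,7,25
350,175,28,40
400,170,28,48
450,165,90,60
"""

def summarize(text):
    """Return {column: {count, mean, std, min, max}} for a numeric CSV string."""
    rows = list(csv.DictReader(io.StringIO(text)))
    summary = {}
    for name in rows[0]:
        values = [float(row[name]) for row in rows]
        summary[name] = {
            "count": len(values),
            "mean": statistics.mean(values),
            "std": statistics.stdev(values),  # sample standard deviation
            "min": min(values),
            "max": max(values),
        }
    return summary

summary = summarize(RAW)
for name, stats in summary.items():
    print(name, {key: round(value, 2) for key, value in stats.items()})
```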


Preprocessing the Data

The preprocessing steps will involve specifying the features and target of interest, followed by creating the matrix of features and the target vector.
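
A minimal sketch of that split, assuming the data is already available as a list of records. The feature names, the target name, and the values are hypothetical placeholders, not the actual columns used in the project.

```python
# Hypothetical column names and made-up rows (illustration only).
FEATURES = ["cement", "water", "age"]
TARGET = "strength"

rows = [
    {"cement": 300.0, "water": 180.0, "age": 7.0, "strength": 25.0},
    {"cement": 350.0, "water": 175.0, "age": 28.0, "strength": 40.0},
    {"cement": 400.0, "water": 170.0, "age": 28.0, "strength": 48.0},
    {"cement": 450.0, "water": 165.0, "age": 90.0, "strength": 60.0},
]

def split_features_target(rows, features, target):
    """Build the feature matrix X (one list per row) and the target vector y."""
    X = [[row[name] for name in features] for row in rows]
    y = [row[target] for row in rows]
    return X, y

X, y = split_features_target(rows, FEATURES, TARGET)
print(len(X), "rows x", len(X[0]), "features")
print("target vector:", y)
```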


Pairwise Scatterplot

In this task, we continue exploring the basic properties of our data. We leverage the fantastic plot-styling library seaborn to create a pairwise scatterplot of the attributes. We might gain some insight as to what attributes, if any, are less evenly distributed across our data.


Feature Importances

A question that crops up before any of the machine learning begins is: How do I select the right features?

In this task, we answer just that. A common approach to eliminating features is to describe their relative importance to a model, then eliminate weak features or combinations of features and re-evaluate to see if the model fares better during cross-validation.
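
The rank-then-eliminate idea can be sketched as follows. Note the substitution: instead of importances taken from a fitted model, this example uses the absolute Pearson correlation between each feature and the target as a simple stand-in. The feature names and values are made up for illustration.

```python
import math

# Made-up values under hypothetical feature names (illustration only).
features = {
    "cement": [1.0, 2.0, 3.0, 4.0, 5.0],
    "water": [5.0, 4.0, 3.0, 2.0, 2.0],
    "fine": [3.0, 1.0, 4.0, 1.0, 5.0],
}
strength = [10.0, 20.0, 30.0, 40.0, 50.0]

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

def rank_features(features, target):
    """Sort features from strongest to weakest absolute correlation."""
    scores = {name: abs(pearson(values, target)) for name, values in features.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

ranking = rank_features(features, strength)
weakest = ranking[-1][0]                      # candidate for elimination
kept = [name for name, _ in ranking[:-1]]     # features to re-evaluate with
print(ranking)
print("drop:", weakest, "keep:", kept)
```

After dropping the weakest feature, the model would be refit and re-scored with cross-validation to check whether it actually improved.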


Target Visualization

Often in real-world machine learning problems, we suffer from the curse of dimensionality. There is a problem of acquiring sufficient training data. Other times, there aren’t enough data to train regression models to the precision required. In these cases, we may be able to transform the regression problem into a classification problem by binning the target instances into dummy classes.

How do we select the optimal number of bins and ensure that our data is evenly distributed across them? This is precisely the focus of this task!
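
To illustrate why the choice of bin edges matters, the sketch below bins a small set of made-up target values in two ways: equal-width bins, and bins whose edges are quantiles of the data. This is only one simple way to get balanced classes under these assumptions, not necessarily the method used in the task.

```python
import bisect
import statistics

# Made-up target values with a long right tail (illustration only).
strengths = [12, 15, 18, 22, 25, 29, 33, 36, 41, 47, 55, 80]
N_BINS = 4

def equal_width_counts(values, n_bins):
    """Count values per bin when every bin spans the same range."""
    low, high = min(values), max(values)
    width = (high - low) / n_bins
    counts = [0] * n_bins
    for value in values:
        index = min(int((value - low) / width), n_bins - 1)  # max goes in last bin
        counts[index] += 1
    return counts

def quantile_counts(values, n_bins):
    """Count values per bin when bin edges are quantiles of the data."""
    cuts = statistics.quantiles(values, n=n_bins)  # n_bins - 1 cut points
    counts = [0] * n_bins
    for value in values:
        counts[bisect.bisect_right(cuts, value)] += 1
    return counts

width_counts = equal_width_counts(strengths, N_BINS)
balanced_counts = quantile_counts(strengths, N_BINS)
print("equal-width bins:", width_counts)
print("quantile bins:   ", balanced_counts)
```

With these values, equal-width bins leave the upper classes nearly empty, while quantile edges spread the instances evenly.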


Evaluating Lasso Regression

In this task, we first divide the data into training and test splits. Next, we fit the model on the training set and predict on the test set. A prediction error plot shows the actual targets from the dataset against the predicted values generated by our model. This allows us to see how much variance is in the model. Machine learning practitioners can diagnose regression models using this plot by comparing against the 45-degree line, where the prediction exactly matches the actual value.
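
The numbers behind such a plot can be sketched without any plotting library. Assuming we already have actual and predicted values for a test set (the values below are made up), we can compute the coefficient of determination and see on which side of the 45-degree line each point falls.

```python
# Made-up test-set targets and predictions (illustration only).
actual = [25.0, 40.0, 48.0, 60.0]
predicted = [28.0, 38.0, 50.0, 55.0]

def r2_score(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

def side_of_identity(actual, predicted):
    """Label each point relative to the line predicted == actual."""
    labels = []
    for a, p in zip(actual, predicted):
        if p > a:
            labels.append("over")    # model predicts too high
        elif p < a:
            labels.append("under")   # model predicts too low
        else:
            labels.append("exact")   # point lies on the 45-degree line
    return labels

score = r2_score(actual, predicted)
sides = side_of_identity(actual, predicted)
print("R^2:", round(score, 4))
print(sides)
```

A model whose points all lie on the 45-degree line would score exactly 1.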


Visualizing Test-set Errors

We can visualize the error on both the training and test sets to diagnose heteroscedasticity.

Residuals, in the context of regression models, are the difference between the observed value of the target variable (y) and the predicted value (ŷ), i.e. the error of the prediction. The residuals plot shows the residuals on the vertical axis against the dependent variable on the horizontal axis, allowing you to detect regions within the target that may be susceptible to more or less error.
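
A small sketch of the quantities involved, using made-up values. It computes the residuals y - ŷ and then applies a crude check for heteroscedasticity: comparing the spread of the residuals in the lower and upper halves of the predicted range. This is a rough heuristic for illustration, not a formal statistical test.

```python
import statistics

# Made-up observations and predictions (illustration only).
actual = [11.0, 19.0, 31.0, 36.0, 55.0, 52.0]
predicted = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0]

def residuals(actual, predicted):
    """Residual = observed value minus predicted value."""
    return [a - p for a, p in zip(actual, predicted)]

def spread_by_half(predicted, errors):
    """Standard deviation of residuals for low vs. high predictions."""
    pairs = sorted(zip(predicted, errors))          # order by predicted value
    middle = len(pairs) // 2
    low = [error for _, error in pairs[:middle]]
    high = [error for _, error in pairs[middle:]]
    return statistics.pstdev(low), statistics.pstdev(high)

errors = residuals(actual, predicted)
low_spread, high_spread = spread_by_half(predicted, errors)
print("residuals:", errors)
print("spread (low half, high half):", round(low_spread, 2), round(high_spread, 2))
```

Here the residuals fan out as the predicted value grows, which is the pattern a residuals plot makes visible at a glance.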


Cross Validation Scores

We generally determine whether a given model is optimal by looking at its F1, precision, recall, and accuracy scores (for classification), or its coefficient of determination (R²) and error (for regression). However, real-world data is often distributed somewhat unevenly, meaning that the fitted model is likely to perform better on some sections of the data than on others. Yellowbrick’s CVScores visualizer enables us to visually explore these variations in performance using different cross validation strategies.
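
To show where the per-fold variation comes from, here is a minimal sketch of k-fold cross validation written from scratch. It makes several simplifying assumptions: a single made-up feature, a plain least-squares line in place of the project's model, and folds formed by interleaving indices rather than by a library splitter.

```python
# Made-up data: a noisy line y = 2x + 1 (illustration only).
NOISE = [0.5, -0.3, 0.2, -0.6, 0.4, -0.1, 0.3, -0.5, 0.6, -0.2, 0.1, -0.4]
xs = [float(i) for i in range(12)]
ys = [2.0 * x + 1.0 + noise for x, noise in zip(xs, NOISE)]

def fit_line(xs, ys):
    """Least-squares slope and intercept for one feature."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x

def r2_score(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

def cross_val_scores(xs, ys, k):
    """Fit on k-1 folds, score on the held-out fold, once per fold."""
    scores = []
    for fold in range(k):
        test_idx = [i for i in range(len(xs)) if i % k == fold]
        train_idx = [i for i in range(len(xs)) if i % k != fold]
        slope, intercept = fit_line([xs[i] for i in train_idx],
                                    [ys[i] for i in train_idx])
        predicted = [slope * xs[i] + intercept for i in test_idx]
        scores.append(r2_score([ys[i] for i in test_idx], predicted))
    return scores

scores = cross_val_scores(xs, ys, k=3)
print([round(score, 4) for score in scores])
print("mean score:", round(sum(scores) / len(scores), 4))
```

Each fold yields its own score, and the spread between those scores is what a cross validation plot displays alongside the mean.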


Learning Curves

A learning curve shows the relationship of the training score vs the cross validated test score for an estimator with a varying number of training samples. It can be used to show how much the estimator benefits from more data, and if our model is more sensitive to error due to variance vs. error due to bias.
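
The data behind a learning curve can be sketched as below. This is a simplified version under stated assumptions: synthetic data from a seeded random generator, a one-feature least-squares line as the estimator, a single held-out validation set instead of full cross validation, and mean squared error (lower is better) in place of a score.

```python
import random

def fit_line(xs, ys):
    """Least-squares slope and intercept for one feature."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x

def mse(xs, ys, slope, intercept):
    """Mean squared error of the line on the given points."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Synthetic data: a noisy line, generated with a fixed seed (illustration only).
rng = random.Random(0)
xs = [float(i) for i in range(40)]
ys = [2.0 * x + 1.0 + rng.gauss(0.0, 3.0) for x in xs]
order = list(range(40))
rng.shuffle(order)
valid_idx, pool_idx = order[:10], order[10:]   # 10 validation, 30 training

def learning_curve(sizes):
    """Return (n_train, training error, validation error) for each size."""
    valid_x = [xs[i] for i in valid_idx]
    valid_y = [ys[i] for i in valid_idx]
    curve = []
    for n in sizes:
        train_x = [xs[i] for i in pool_idx[:n]]
        train_y = [ys[i] for i in pool_idx[:n]]
        slope, intercept = fit_line(train_x, train_y)
        curve.append((n,
                      mse(train_x, train_y, slope, intercept),
                      mse(valid_x, valid_y, slope, intercept)))
    return curve

curve = learning_curve([5, 10, 20, 30])
for n, train_error, valid_error in curve:
    print(n, round(train_error, 2), round(valid_error, 2))
```

A persistent gap between the two columns points to error due to variance, while two columns that converge at a high error point to error due to bias.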


Hyperparameter Tuning - Alpha Selection

Regularization is designed to penalize model complexity. Alphas that are too high increase the error due to bias (underfit), while alphas that are too low increase the error due to variance (overfit). So in this task, we are going to learn how to choose an optimal alpha such that the error is minimized in both directions.
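
The effect of alpha can be sketched with a deliberately tiny case: Lasso with a single feature and no intercept, where the minimizer of (1/(2n)) Σ (y - b·x)² + alpha·|b| has a closed form via soft-thresholding. The data is made up, and choosing alpha by the lowest error on one validation set is a simplification of a proper cross-validated search.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold, clipping at zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def fit_lasso_1d(xs, ys, alpha):
    """Closed-form Lasso coefficient for one feature, no intercept."""
    n = len(xs)
    rho = sum(x * y for x, y in zip(xs, ys)) / n
    z = sum(x * x for x in xs) / n
    return soft_threshold(rho, alpha) / z

def mse(xs, ys, coef):
    """Mean squared error of the prediction coef * x."""
    return sum((y - coef * x) ** 2 for x, y in zip(xs, ys)) / len(xs)

def select_alpha(train, valid, alphas):
    """Pick the alpha whose fitted coefficient has the lowest validation error."""
    train_x, train_y = train
    valid_x, valid_y = valid
    return min(alphas,
               key=lambda alpha: mse(valid_x, valid_y,
                                     fit_lasso_1d(train_x, train_y, alpha)))

# Made-up data: the training targets overshoot the true slope of 2.
train = ([1.0, 2.0, 3.0, 4.0], [2.3, 4.2, 6.4, 8.5])
valid = ([1.5, 2.5, 3.5], [3.0, 5.0, 7.0])
alphas = [0.0, 0.1, 1.0, 10.0]

for alpha in alphas:
    print(alpha, round(fit_lasso_1d(*train, alpha), 4))
best_alpha = select_alpha(train, valid, alphas)
print("best alpha:", best_alpha)
```

With these numbers, no penalty leaves the coefficient too large, a heavy penalty shrinks it far below the true slope, and a moderate alpha gives the lowest validation error.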


Snehan Kekre

About the Host (Snehan Kekre)


Snehan hosts Machine Learning courses at Rhyme. He is in his senior year of university at the Minerva Schools at KGI, pursuing a double major in the Natural Sciences and Computational Sciences, with a focus on physics and machine learning. When not applying computational and quantitative methods to identify the structures shaping the world around him, he can sometimes be seen trekking in the mountains of Nepal.



Frequently Asked Questions


In Rhyme, all projects are completely hands-on. You don't just passively watch someone else. You use the software directly while following the host's (Snehan Kekre) instructions. Using the software is the only way to achieve mastery. With the "Live Guide" option, you can ask for help and get an immediate response.
Nothing! Just join through your web browser. Your host (Snehan Kekre) has already installed all required software and configured all data.
You can go to https://rhyme.com/for-companies, sign up for free, and follow this visual guide How to use Rhyme to create your own sessions. If you have custom needs or a company-specific environment, please email us at help@rhyme.com
Absolutely. We offer Rhyme for workgroups as well as larger departments and companies. Universities, academies, and bootcamps can also buy Rhyme for their settings. You can select sessions and trainings that are mission critical for you and also author your own that reflect your own needs and tech environments. Please email us at help@rhyme.com
Please email us at help@rhyme.com and we'll respond to you within one business day.
