Azure ML Studio: Feature Engineering and Custom R Models

In this project, we will build a Random Forest model in Azure ML Studio using the R programming language. We will use the Bike Sharing Dataset. The dataset contains the hourly and daily count of rental bikes between years 2011 and 2012 in Capital bikeshare system with the corresponding weather and seasonal information. Using the information from the dataset, we can build a model to predict the number of bikes rented during certain weather conditions.

We will be leveraging the Execute R Script and Create R Model modules to run R scripts from our experiment while feature engineering and to create an R model using custom resources.

Available Through Coursera
Azure ML Studio: Feature Engineering and Custom R Models

Task List


We will cover the following tasks in 52 minutes:


Introduction and Overview

Azure ML Studio has a fair amount of built-in machine learning algorithms that we can use as modules. But what if we need an algorithm not included by default? Well, Azure can leverage the entire open-source R and Python communities! We can simply use the Create R Model module to use any R machine learning library and the associated algorithms.

We are going to be working with the Bike Sharing Dataset which is a great dataset to explore Azure ML Studio’s Execute R Script and Create R Model modules.

The Bike Sharing dataset has 10,886 observations, each one pertaining to a specific hour from the first 19 days of each month from 2011 to 2012. The dataset consists of 11 columns that record information about bike rentals: date-time, season, working day, weather, temp, “feels like” temp, humidity, wind speed, casual rentals, registered rentals, and total rentals.


Feature Engineering and Preprocessing

There is an untapped wealth of prediction power hidden in the “datetime” column. However, it needs to be converted from its current form. Conveniently, Azure ML has a module for running R scripts, which can take advantage of R’s built-in functionality for extracting features from the date-time data.

We now select an R-Script Module to run our feature engineering script. This module allows us to import our dataset from Azure ML, add new features, and then export our improved dataset. This module has many uses beyond our use in this project, which help with cleaning data and creating graphs.

Our goal is to convert the datatime column of strings into date-time objects in R, so we can take advantage of their built-in functionality. R has two internal implementations of date-times: POSIXlt and POSIXct. We found Azure ML had problems dealing with POSIXlt, so we recommend using POSIXct for any date-time feature engineering.


Removing Outliers

This dataset only has one observation where weather = 4. Since this is a categorical variable, R will result in an error if it ends up in the test data split. This is because R expects the number of levels for each categorical variable to equal the number of levels found in the training data split. Therefore, it must be removed. We we write a custom R-script to remove the outlier.


Creating Training and Test Sets

Before training our model, we must tell Azure ML which variables are categorical. To do this, we use the Metadata Editor. We used the column selector to choose the hour, weekday, month, year, season, weather, holiday, and workingday columns. Then we select “Make categorical” under the “Categorical” dropdown.

Before creating our random forest, we must identify columns that add little-to-no value for predictive modeling. These columns will be dropped. Since we are predicting total count, the registered bike rental and casual bike rental columns must be dropped. Together, these values add up to total count, which would lead to a successful but uninformative model because the values would simply be summed to see the total count. One could train separate models to predict casual and registered bike rentals independently. Azure ML would make it very easy to include these models in our experiment after creating one for total count.

We must now directly tell Azure ML which attribute we want our algorithm to train to predict by casting that attribute as a “label”.


Model Building and Training

Here is where we take advantage of AzureMl’s newest feature: the Create R Model module. Now we can use R’s randomForest library and take advantage of its large number of adjustable parameters directly inside AzureML studio. Then, the model can be deployed in a web service. Previously, R models were nearly impossible to deploy to the web.


Evaluating the Model

Unfortunately, AzureML’s Evaluate Model Module does not support models that use the Create R Model module, yet. We assume this feature will be added in the near future. In the meantime, we can import the results from the scored model (Score Model module) into an Execute R Script module and compute an evaluation using R. We calculate the MSE then export our result back to AzureML as a data frame.

Watch Preview

Preview the instructions that you will follow along in a hands-on session in your browser.

Snehan Kekre

About the Host (Snehan Kekre)


Snehan hosts Machine Learning and Data Sciences projects at Rhyme. He is in his senior year of university at the Minerva Schools at KGI, studying Computer Science and Artificial Intelligence. When not applying computational and quantitative methods to identify the structures shaping the world around him, he can sometimes be seen trekking in the mountains of Nepal.



Frequently Asked Questions


In Rhyme, all projects are completely hands-on. You don't just passively watch someone else. You use the software directly while following the host's (Snehan Kekre) instructions. Using the software is the only way to achieve mastery. With the "Live Guide" option, you can ask for help and get immediate response.
Nothing! Just join through your web browser. Your host (Snehan Kekre) has already installed all required software and configured all data.
You can go to https://rhyme.com, sign up for free, and follow this visual guide How to use Rhyme to create your own projects. If you have custom needs or company-specific environment, please email us at help@rhyme.com
Absolutely. We offer Rhyme for workgroups as well larger departments and companies. Universities, academies, and bootcamps can also buy Rhyme for their settings. You can select projects and trainings that are mission critical for you and, as well, author your own that reflect your own needs and tech environments. Please email us at help@rhyme.com
Rhyme strives to ensure that visual instructions are helpful for reading impairments. The Rhyme interface has features like resolution and zoom that will be helpful for visual impairments. And, we are currently developing a close-caption functionality to help with hearing impairments. Most of the accessibility options of the cloud desktop's operating system or the specific application can also be used in Rhyme. If you have questions related to accessibility, please email us at accessibility@rhyme.com
We started with windows and linux cloud desktops because they have the most flexibility in teaching any software (desktop or web). However, web applications like Salesforce can run directly through a virtual browser. And, others like Jupyter and RStudio can run on containers and be accessed by virtual browsers. We are currently working on such features where such web applications won't need to run through cloud desktops. But, the rest of the Rhyme learning, authoring, and monitoring interfaces will remain the same.
Please email us at help@rhyme.com and we'll respond to you within one business day.

No sessions available