Azure ML Studio: Predict Flight Delays Using Weather Data

In this project, we will use Azure Machine Learning Studio to build a predictive model without writing a single line of code! Specifically, we will predict flight delays using flight data provided by the US Bureau of Transportation Statistics and weather data provided by the National Oceanic and Atmospheric Administration (NOAA).

Available Through Coursera

Task List


We will cover the following tasks in 1 hour and 6 minutes:


Introduction and Setup Instructions

Azure Machine Learning Studio is a GUI-based integrated development environment for constructing and operationalizing machine learning workflows on Azure. In this task, we will sign up for an ML Studio account and claim $200 worth of free credits to create our machine learning experiments!


Importing the Data Sets

We start by signing in to our Azure ML Studio account. Next, we create a blank experiment and use the drag-and-drop modules to import our two data sets, Flight Delays Data.csv and Weather Dataset.csv.

Lastly, we convert both inputs to the internal Dataset format used by Azure ML Studio. This is done using the Convert to Dataset module.


Scrubbing Missing Values

Now that we have imported the data, it’s time to get a sense of its properties so that we can make informed decisions when pre-processing. We use the Summarize Data module to generate and visualize descriptive statistics for the columns in the Flight Delays data.

Dealing with missing data is an essential pre-processing step. There are a number of ways to impute data. We will use the Clean Missing Data module to substitute all missing values in our data with 0.
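Although the project itself needs no code, the effect of this step can be sketched in a few lines of Python. This is only an illustration of the substitute-with-0 rule, not what the Clean Missing Data module runs internally; the sample rows and the DepDelay column name are assumptions made for the example.

```python
def fill_missing_with_zero(rows):
    """Return copies of the rows with None or empty-string values replaced by 0."""
    cleaned = []
    for row in rows:
        cleaned.append({
            key: 0 if value is None or value == "" else value
            for key, value in row.items()
        })
    return cleaned

# Hypothetical sample rows; only ArrDel15 is a column named in the project.
flights = [
    {"Carrier": "DL", "DepDelay": 12, "ArrDel15": 0},
    {"Carrier": "UA", "DepDelay": None, "ArrDel15": ""},
]
cleaned_flights = fill_missing_with_zero(flights)
```

Filling with 0 is the simplest imputation rule. It keeps every row, at the cost of treating "unknown" as a real value of zero.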


Eliminating Target Leaks

ArrDel15 is the column containing the labels we are trying to predict. Its values indicate whether the flight was delayed or not. Since the data set is made available to the general public and not specifically created for our machine learning problem, it contains additional data that we don’t want going into our model.

Some of these columns are known as target leaks and need to be removed. When the model is used in real life, we will not have access to information such as the number of minutes the flight was delayed, since that is what we are trying to predict in the first place. So, in this task, we will use the Select Columns in Dataset module to exclude the target leaks from our input.
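As a rough code equivalent, excluding columns amounts to dropping keys from every row. The leak column names below (DepDelay, ArrDelay) are hypothetical stand-ins for "minutes of delay" fields; the project does not list the exact columns it excludes.

```python
# Hypothetical leak columns: both are only known once the flight has flown.
LEAK_COLUMNS = {"DepDelay", "ArrDelay"}

def drop_columns(rows, excluded):
    """Return copies of the rows without the excluded column names."""
    return [
        {key: value for key, value in row.items() if key not in excluded}
        for row in rows
    ]

flights = [
    {"Month": 4, "CRSDepTime": 1435, "DepDelay": 22, "ArrDelay": 31, "ArrDel15": 1},
]
model_input = drop_columns(flights, LEAK_COLUMNS)
```

The label column ArrDel15 stays in the data, because the model still needs it as the training target.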


Conversion to Categorical Features

In this task, we use the Edit Metadata module to convert three columns to categorical feature types.
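To see why this matters, consider an identifier such as an airport ID: it is stored as a number, but its magnitude carries no meaning. One common in-memory representation of a categorical column is a table of distinct values plus an integer code per row, sketched below. The choice of OriginAirportID as an example column is an assumption; the project does not say here which three columns are converted.

```python
def to_categorical(rows, column):
    """Replace a column's values with integer codes and return the category table."""
    categories = sorted({row[column] for row in rows})
    code_of = {value: code for code, value in enumerate(categories)}
    encoded = [{**row, column: code_of[row[column]]} for row in rows]
    return encoded, categories

# Hypothetical airport IDs, used purely as labels.
flights = [
    {"OriginAirportID": 12478},
    {"OriginAirportID": 10397},
    {"OriginAirportID": 12478},
]
encoded_flights, airport_categories = to_categorical(flights, "OriginAirportID")
```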


Preparing Features to be Joined with Weather Data

We observe that CRSDepTime and CRSArrTime, the scheduled departure and arrival times, are in hours and minutes represented by a three- or four-digit number (for example, 905 for 9:05 and 1435 for 14:35). Our strategy is to join this dataset with the Weather dataset by time. Since we don’t have weather information by the minute, we will join by the hour of the day.

To extract the hour from this value, we will use the Apply Math Operation module to divide the columns of interest by 100 in-place. Since we want just the hour, we will use another Apply Math Operation module to round the number down.
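The arithmetic behind these two modules is small enough to write out. The sketch below mirrors the divide-then-round-down sequence in Python; the function name is made up for the example.

```python
import math

def scheduled_hour(hhmm):
    """Extract the hour from an HHMM-style integer such as 1435."""
    divided = hhmm / 100        # first step: 1435 -> 14.35
    return math.floor(divided)  # second step: 14.35 -> 14

hours = [scheduled_hour(t) for t in (905, 1435, 2359)]  # [9, 14, 23]
```

Rounding down rather than to the nearest integer matters here: a 14:35 departure belongs to hour 14, and a 23:59 departure must not round up to a nonexistent hour 24.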


Preprocessing the Weather Dataset

For the Weather Data, we will have to do a lot of the same things as we did earlier. We scrub any missing data, convert a few columns to categorical feature types, and extract the hour values adjusted for timezones.


Joining Both Datasets

Once we have completed pre-processing the Flight Delay and Weather data, we can use the Join Data module to join them together. For both datasets, we need to specify the same columns which we want to use to join. Namely, we want to specify the Year, Month, DayofMonth, OriginAirportID, and CRSDepTime at the departure airport.
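Conceptually, the join matches each flight to the weather observed at the same airport in the same hour. Below is a minimal sketch of such a keyed join in plain Python, assuming an inner join (flights without a weather match are dropped). The weather-side column names and the WindSpeed field are assumptions for illustration, since the project does not list the weather columns.

```python
FLIGHT_KEYS = ("Year", "Month", "DayofMonth", "OriginAirportID", "CRSDepTime")
# Assumed weather-side names for the same five quantities.
WEATHER_KEYS = ("Year", "Month", "Day", "AirportID", "Hour")

def inner_join(flights, weather, flight_keys, weather_keys):
    """Attach weather fields to every flight whose key tuple has a match."""
    weather_index = {}
    for obs in weather:
        key = tuple(obs[name] for name in weather_keys)
        weather_index.setdefault(key, []).append(obs)

    joined = []
    for flight in flights:
        key = tuple(flight[name] for name in flight_keys)
        for obs in weather_index.get(key, []):
            extra = {k: v for k, v in obs.items() if k not in weather_keys}
            joined.append({**flight, **extra})
    return joined

# Hypothetical rows; CRSDepTime already holds the hour after pre-processing.
flights = [
    {"Year": 2013, "Month": 4, "DayofMonth": 19, "OriginAirportID": 12478,
     "CRSDepTime": 14, "ArrDel15": 1},
    {"Year": 2013, "Month": 4, "DayofMonth": 20, "OriginAirportID": 12478,
     "CRSDepTime": 9, "ArrDel15": 0},
]
weather = [
    {"Year": 2013, "Month": 4, "Day": 19, "AirportID": 12478, "Hour": 14,
     "WindSpeed": 11},
]
joined = inner_join(flights, weather, FLIGHT_KEYS, WEATHER_KEYS)
```

Only the first flight finds a weather row, so the result has a single record carrying both the flight fields and WindSpeed.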


Training and Evaluating the Model

In this final task, our goal is to train and evaluate a binary logistic regression model. Using the Split Data module, we partition our data into an 80/20 train/test split. We train the Two-Class Logistic Regression module on the train split and evaluate its performance on the test split using the Score Model and Evaluate Model modules.
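For readers curious about what happens behind these modules, the split, train, score, and evaluate sequence can be sketched in plain Python. This is a toy illustration under stated assumptions: the data is synthetic with a single made-up feature, and the model is fitted with plain stochastic gradient descent, which is swapped in for brevity and is not a description of how the Two-Class Logistic Regression module is implemented.

```python
import math
import random

def split_rows(rows, train_fraction=0.8, seed=0):
    """Shuffle a copy of the rows and cut it into train and test lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def predict_probability(features, weights, bias):
    """Logistic model: sigmoid of a weighted sum of the features."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(rows, epochs=500, learning_rate=0.1):
    """Fit weights with stochastic gradient descent on the log loss."""
    weights = [0.0] * len(rows[0][0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in rows:
            error = predict_probability(features, weights, bias) - label
            weights = [w - learning_rate * error * x
                       for w, x in zip(weights, features)]
            bias -= learning_rate * error
    return weights, bias

def accuracy(rows, weights, bias):
    """Fraction of rows whose thresholded prediction matches the label."""
    hits = sum(
        (predict_probability(features, weights, bias) >= 0.5) == bool(label)
        for features, label in rows
    )
    return hits / len(rows)

# Synthetic stand-in data: one feature in [0, 1), "delayed" when it is >= 0.5.
data = [([i / 100], 1 if i >= 50 else 0) for i in range(100)]
train, test = split_rows(data)             # 80 training rows, 20 test rows
weights, bias = train_logistic(train)      # "Train Model"
test_accuracy = accuracy(test, weights, bias)  # "Score Model" + "Evaluate Model"
```

Evaluating on the held-out 20% rather than on the training rows is what makes the reported performance an estimate of how the model behaves on flights it has not seen.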


About the Host (Snehan Kekre)


Snehan hosts Machine Learning and Data Sciences projects at Rhyme. He is in his senior year of university at the Minerva Schools at KGI, studying Computer Science and Artificial Intelligence. When not applying computational and quantitative methods to identify the structures shaping the world around him, he can sometimes be seen trekking in the mountains of Nepal.



Frequently Asked Questions


In Rhyme, all projects are completely hands-on. You don't just passively watch someone else. You use the software directly while following the host's (Snehan Kekre) instructions. Using the software is the only way to achieve mastery. With the "Live Guide" option, you can ask for help and get an immediate response.
Nothing! Just join through your web browser. Your host (Snehan Kekre) has already installed all required software and configured all data.
You can go to https://rhyme.com, sign up for free, and follow the visual guide "How to use Rhyme" to create your own projects. If you have custom needs or a company-specific environment, please email us at help@rhyme.com.
Absolutely. We offer Rhyme for workgroups as well as larger departments and companies. Universities, academies, and bootcamps can also buy Rhyme for their settings. You can select projects and trainings that are mission critical for you and also author your own that reflect your needs and tech environments. Please email us at help@rhyme.com.
Rhyme strives to ensure that visual instructions are helpful for reading impairments. The Rhyme interface has features like resolution and zoom that will be helpful for visual impairments. We are also currently developing closed-captioning functionality to help with hearing impairments. Most of the accessibility options of the cloud desktop's operating system or the specific application can also be used in Rhyme. If you have questions related to accessibility, please email us at accessibility@rhyme.com.
We started with Windows and Linux cloud desktops because they have the most flexibility in teaching any software (desktop or web). However, web applications like Salesforce can run directly through a virtual browser, and others like Jupyter and RStudio can run in containers and be accessed by virtual browsers. We are currently working on features that will let such web applications run without cloud desktops. The rest of the Rhyme learning, authoring, and monitoring interfaces will remain the same.
Please email us at help@rhyme.com and we'll respond to you within one business day.
