[OLD] Amazon SageMaker: Semantic Segmentation

In this project, we are going to perform a common computer vision task called semantic segmentation: classifying every pixel in an image with a class from a known set of labels. The output is usually represented as an image whose pixel values are class ids, that is, an integer matrix with the same height and width as the input image. This output image is called a segmentation mask.
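
To make the idea of a segmentation mask concrete, here is a minimal sketch in plain Python. The tiny 2x3 "image", the three classes, and the scores are made up for illustration; a real model would produce the per-pixel scores.

```python
# Per-pixel class scores for a tiny 2x3 "image" and 3 hypothetical classes.
# scores[row][col] is a list with one score per class.
scores = [
    [[0.9, 0.05, 0.05], [0.2, 0.7, 0.1], [0.1, 0.2, 0.7]],
    [[0.8, 0.1, 0.1], [0.3, 0.3, 0.4], [0.05, 0.9, 0.05]],
]


def to_mask(scores):
    """Return an integer matrix holding the highest-scoring class id per pixel."""
    return [
        [max(range(len(pixel)), key=pixel.__getitem__) for pixel in row]
        for row in scores
    ]


mask = to_mask(scores)
print(mask)  # one class id per pixel, same height and width as the input
```

The mask has one integer per pixel, so it can be stored and displayed as a single-channel image.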

Available On Coursera

Task List

We will cover the following tasks in 1 hour and 3 minutes:

Introduction and Setup

Hi, and welcome to this project on semantic segmentation using Amazon SageMaker. SageMaker is a set of managed services from Amazon that lets machine learning developers create datasets, build and train models, and tune and deploy them easily. SageMaker comes with built-in algorithms and pre-trained models for most common machine learning tasks, but you can also create your own custom architectures and algorithms. In this task, we will load the relevant libraries and helper functions and download the data that we will be working with.

SageMaker Setup

In order to perform training, we need to set up and authenticate the use of AWS services. We need an execution role, a SageMaker session, and an S3 bucket in which to store both our dataset and the final trained model. The session object manages interactions with Amazon SageMaker APIs and any other AWS service that the training job uses.

We are going to use the default SageMaker bucket. SageMaker packages its built-in algorithms as Docker images. We are interested in semantic segmentation, so we will need to access that Docker image as well.
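
The setup described above might look roughly like the sketch below. It assumes version 2 of the SageMaker Python SDK and an environment (such as a SageMaker notebook) where an execution role is available. The function name `setup_sagemaker` and the `sm` parameter are hypothetical; `sm` exists only so the sketch can be tried without AWS credentials.

```python
def setup_sagemaker(sm=None):
    """Collect the objects needed before training.

    `sm` is the `sagemaker` module. It is a parameter (hypothetical design
    choice) so that a stand-in object can be passed when AWS is unavailable.
    """
    if sm is None:
        import sagemaker as sm  # SageMaker Python SDK, v2 assumed

    role = sm.get_execution_role()      # IAM role the training job will assume
    session = sm.Session()              # manages calls to SageMaker and other AWS services
    bucket = session.default_bucket()   # default SageMaker bucket for this account and region
    image_uri = sm.image_uris.retrieve(  # Docker image of the built-in algorithm
        framework="semantic-segmentation",
        region=session.boto_region_name,
    )
    return role, session, bucket, image_uri
```

In a notebook with the SDK installed, calling `setup_sagemaker()` with no arguments would use the real `sagemaker` module.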

Data Preparation

We are using the PASCAL VOC 2012 dataset. This is a very popular dataset used for a variety of computer vision tasks, including semantic segmentation. Let’s move the data to S3 in the structure that the algorithm requires. This is important because SageMaker algorithms read their data from S3 buckets. We need to move the training images to the train directory, the validation images to the validation directory, and so on. Fortunately, the dataset’s annotations are already named in sync with the image names, which satisfies that requirement of the Amazon SageMaker Semantic Segmentation algorithm. Let’s first create a local directory structure mimicking the S3 bucket where the data is to be stored. Once that is done, we can simply copy the directories to S3.
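
A minimal sketch of that local directory structure follows. It assumes images are stored as `<id>.jpg` and annotation masks as `<id>.png`, and that the lists of training and validation ids have already been read from the dataset's split files. The helper name `build_layout` and its parameters are hypothetical.

```python
import shutil
from pathlib import Path

# Channel folders: images and their annotations are kept in separate folders.
CHANNELS = ("train", "validation", "train_annotation", "validation_annotation")


def build_layout(image_dir, mask_dir, train_ids, val_ids, out_dir):
    """Copy images and masks into one local folder per channel.

    An image and its annotation share the same base name, so the
    pairing survives the copy.
    """
    out = Path(out_dir)
    for channel in CHANNELS:
        (out / channel).mkdir(parents=True, exist_ok=True)

    for split, ids in (("train", train_ids), ("validation", val_ids)):
        for image_id in ids:
            shutil.copy(Path(image_dir) / f"{image_id}.jpg", out / split)
            shutil.copy(
                Path(mask_dir) / f"{image_id}.png", out / f"{split}_annotation"
            )
    return out
```

Keeping the same base name for each image and its mask is what lets the algorithm match them across the image and annotation folders.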

Upload to S3

Let us now move our prepared dataset to the S3 bucket that we decided to use in an earlier task. We upload the dataset with a key prefix: S3 uses the prefix to create a directory structure for the bucket content that it displays in the S3 console. Once the model is trained, we will need to access it for deployment, so we will store the model artifact in S3 as well. Let’s set a location where it will be written. We will use the same S3 bucket.
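
The upload step could be sketched as follows. The `session` argument is assumed to behave like `sagemaker.Session`, whose `upload_data` method copies a local path to a bucket under a key prefix. The helper name `upload_channels`, the `output` key, and the folder layout are hypothetical choices for illustration.

```python
def upload_channels(session, local_root, bucket, prefix, channels):
    """Upload each local channel folder and return the S3 URI of each one.

    `session` is assumed to expose `upload_data(path, bucket, key_prefix)`
    as `sagemaker.Session` does.
    """
    uris = {}
    for channel in channels:
        key_prefix = f"{prefix}/{channel}"
        session.upload_data(
            path=f"{local_root}/{channel}", bucket=bucket, key_prefix=key_prefix
        )
        uris[channel] = f"s3://{bucket}/{key_prefix}"

    # Location in the same bucket where the trained model artifact will go.
    uris["output"] = f"s3://{bucket}/{prefix}/output"
    return uris
```

Because every channel shares the same prefix, the S3 console shows the dataset and the model output side by side under one folder.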

Create Model

Now that we have uploaded the data to S3, we are ready to train our semantic segmentation algorithm. First, we will create an estimator. This estimator will handle the end-to-end Amazon SageMaker training and deployment tasks. Training a task like this needs a fast GPU instance, so we are going to use an ml.p3.2xlarge instance.
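
As a rough sketch, the estimator's configuration can be gathered in one place before it is created. The keyword names below follow version 2 of the SageMaker Python SDK (`sagemaker.estimator.Estimator`), which is an assumption about the SDK in use; the helper `estimator_kwargs` and the single-instance choice are illustrative.

```python
def estimator_kwargs(image_uri, role, output_path, session):
    """Keyword arguments for `sagemaker.estimator.Estimator` (SDK v2 names assumed)."""
    return {
        "image_uri": image_uri,            # Docker image of the algorithm
        "role": role,                      # execution role
        "instance_count": 1,               # illustrative: a single training instance
        "instance_type": "ml.p3.2xlarge",  # GPU instance named in the text
        "output_path": output_path,        # S3 location for the model artifact
        "sagemaker_session": session,
    }
```

With the SDK installed, the estimator would then be created with `Estimator(**estimator_kwargs(image_uri, role, output_path, session))`.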


The semantic segmentation algorithm at its core has an encoder network and a decoder network. The encoder, also called the backbone, is usually a regular convolutional neural network, typically pre-trained on ImageNet or another popular classification dataset.

The decoder is a network that picks up the outputs of one or more layers of the backbone and reconstructs the segmentation mask from them. Several papers have proposed different ideas for decoding the encoder output; the most popular ones right now are FCN and DeepLab. DeepLab version 3, at the time of creating this project, is the state-of-the-art approach to semantic segmentation.
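
The encoder and decoder choices are passed to the algorithm as hyperparameters. The sketch below uses hyperparameter names from the built-in SageMaker semantic segmentation algorithm as I understand them; all values are illustrative rather than the ones used in this project, and the helper `segmentation_hyperparameters` is hypothetical. Only the two decoders named in the text are accepted here; the algorithm may support others.

```python
def segmentation_hyperparameters(num_training_samples, algorithm="fcn"):
    """Build an illustrative hyperparameter dictionary for training."""
    if algorithm not in {"fcn", "deeplab"}:
        raise ValueError(f"unsupported decoder in this sketch: {algorithm!r}")
    return {
        "backbone": "resnet-50",         # encoder network
        "algorithm": algorithm,          # decoder network
        "use_pretrained_model": "True",  # start from a pre-trained encoder
        "num_classes": 21,               # 20 PASCAL VOC object classes + background
        "num_training_samples": num_training_samples,
        "epochs": 10,                    # illustrative value
        "learning_rate": 0.0001,         # illustrative value
        "mini_batch_size": 16,           # illustrative value
        "crop_size": 240,                # illustrative value
    }
```

With an estimator object at hand, these would typically be applied with `estimator.set_hyperparameters(**segmentation_hyperparameters(n))`, where `n` is the number of training images.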

Input Objects and Model Training

Now that the hyperparameters are set up, let us prepare the handshake between our data channels and the algorithm. To do this, we need to create the S3 input objects within SageMaker from our data channels. These objects are then put in a dictionary, which the algorithm uses to train. Training the algorithm involves a few steps. First, the instances that we requested while creating the estimator are provisioned and set up with the appropriate libraries. Then, the data from our channels is downloaded onto the instances. Once this is done, the training job begins. The provisioning and data downloading will take some time.
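
The dictionary of input objects could be built as in the sketch below. The class that wraps each S3 location is passed in as `input_cls`; in version 2 of the SageMaker Python SDK this would be `sagemaker.inputs.TrainingInput` (named `s3_input` in version 1). The helper `make_data_channels` is hypothetical, and the content types assume JPEG images with PNG annotation masks.

```python
# Content type of the files in each channel (JPEG images, PNG masks assumed).
CONTENT_TYPES = {
    "train": "image/jpeg",
    "validation": "image/jpeg",
    "train_annotation": "image/png",
    "validation_annotation": "image/png",
}


def make_data_channels(uris, input_cls):
    """Wrap each channel's S3 URI in an input object and return the dictionary."""
    return {
        channel: input_cls(
            uris[channel],
            distribution="FullyReplicated",  # every instance gets the full data
            content_type=content_type,
            s3_data_type="S3Prefix",         # the URI is a prefix, not a single file
        )
        for channel, content_type in CONTENT_TYPES.items()
    }
```

The resulting dictionary is what would be handed to the estimator's `fit` method to start the training job.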

Model Deployment

After training, we only have the model artifact stored on S3; we can’t yet use it for inference. To do that, let’s deploy our model to an endpoint backed by an EC2 instance. Inference doesn’t need a GPU, so we will use an ml.c5.xlarge instance as an example.


We are almost at the end of our project. Now that the model is deployed, let’s download a random image from the internet and see how the pixels are classified, or segmented, by our trained model. Having an endpoint running will incur some costs. Therefore, as a clean-up job, we should delete the endpoint after we are done with inference. If you no longer need them, you can also go to your AWS console and delete the S3 buckets, the SageMaker notebook, and so on. The training job, once completed, will not incur any further costs, as SageMaker already takes care of cleaning up after it.
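
One way to make sure the endpoint is always deleted is to tie the clean-up to the inference call, as in this minimal sketch. The `predictor` argument is assumed to be the object returned by deploying the estimator, already configured to send image bytes; it only needs `predict` and `delete_endpoint` methods. The helper name `predict_then_cleanup` is hypothetical.

```python
def predict_then_cleanup(predictor, image_bytes):
    """Run one inference and delete the endpoint afterwards, even on failure."""
    try:
        return predictor.predict(image_bytes)
    finally:
        # Runs whether or not predict() raised, so the endpoint stops billing.
        predictor.delete_endpoint()
```

If several images need to be segmented, the same `try`/`finally` pattern can wrap the whole loop so the endpoint is deleted once at the end.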

About the Host (Amit Yadav)

I am a machine learning engineer with a focus on computer vision and sequence modelling for automated signal processing using deep learning techniques. My previous experience includes leading chatbot development for a large corporation.
