scikit-learn: K-Means Clustering In Practice [OLD]

In this machine learning project, we apply an unsupervised clustering algorithm, k-means, to two different problems.

First, we apply k-means to a dataset of handwritten digits (the small, MNIST-style digits set bundled with scikit-learn). We use k-means to try to group similar digits without using the original label information. This resembles a first step in extracting meaning from a new dataset for which you have no prior label information.

In the second half of the project, we use k-means clustering for color compression within images. Imagine an image stored with 24-bit color, which can represent about 16 million distinct colors. In most images, a large number of these colors go unused, and many pixels have similar or even identical colors. We will reduce the 16 million possible colors to just 16, using k-means clustering across the pixel space!

Available On Coursera

Task List

We will cover the following tasks in 1 hour and 1 minute:

Loading the Data and Performing K-Means Clustering

We begin by loading the handwritten digits dataset and then finding the k-means clusters. The dataset consists of 1,797 samples with 64 features, where each of the 64 features is the brightness of one pixel in an 8×8 image.
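As a rough illustration of this task, the sketch below loads an 8×8 digits dataset and fits k-means to it. It assumes the data comes from scikit-learn's load_digits, and that 10 clusters, n_init=10 and random_state=0 are reasonable settings; none of these choices is confirmed by the project description.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_digits

# Assumption: the digits come from scikit-learn's bundled 8x8 digits dataset.
digits = load_digits()
X = digits.data  # one row per image, one column per pixel brightness

# Assumption: one cluster per digit class (10); random_state fixed for repeatability.
kmeans = KMeans(n_clusters=10, n_init=10, random_state=0)
clusters = kmeans.fit_predict(X)  # cluster index (0-9) for every sample

print(X.shape)                        # number of samples and features
print(kmeans.cluster_centers_.shape)  # 10 centers, each a 64-dimensional point
```

Each row of cluster_centers_ lives in the same 64-dimensional space as the images, which is why a center can itself be viewed as an 8×8 picture.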

Plotting the Cluster Centers

In the previous task, we noticed that each cluster center can be interpreted as a typical digit within its cluster. Here, we plot the cluster centers to see what they look like. We will find that even without any label information, k-means is able to find clusters whose centers are recognizable digits.
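One possible way to draw the centers is sketched below. It assumes the same load_digits data and KMeans settings as a plain 10-cluster fit, and the 2×5 grid layout and binary colormap are illustrative choices rather than the project's own.

```python
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import load_digits

# Assumption: scikit-learn's 8x8 digits data, clustered into 10 groups.
X = load_digits().data
kmeans = KMeans(n_clusters=10, n_init=10, random_state=0).fit(X)

# Reshape each 64-dimensional center back into an 8x8 image.
centers = kmeans.cluster_centers_.reshape(10, 8, 8)

fig, axes = plt.subplots(2, 5, figsize=(8, 3))
for ax, center in zip(axes.flat, centers):
    ax.set(xticks=[], yticks=[])  # hide axis ticks for a cleaner grid
    ax.imshow(center, interpolation="nearest", cmap=plt.cm.binary)

# In an interactive session, call plt.show() to display the figure.
```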

Model Evaluation

The k-means algorithm knows nothing about the true digit labels. The cluster indices 0-9 that it assigns are therefore an arbitrary permutation of the true classes, and taken at face value they would label the digits incorrectly.

In the first half of this task, we solve this issue by matching each learned cluster with the true labels found in it (for example, the most common label among its members).

Next, we evaluate our model using the accuracy score. This metric tells us how accurate our k-means clustering is in finding similar digits within the data. Perhaps surprisingly, running a simple k-means on the data is sufficient to discover almost 80% of the correct grouping of the input.
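The matching and scoring steps described above might look like the following sketch. It assumes a majority-vote mapping (each cluster takes the most common true label among its members), which is one plausible reading of the task rather than the confirmed method, and it again assumes the load_digits data.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_digits
from sklearn.metrics import accuracy_score

digits = load_digits()
clusters = KMeans(n_clusters=10, n_init=10, random_state=0).fit_predict(digits.data)

# Assumption: map each cluster to the most frequent true digit inside it.
labels = np.zeros_like(clusters)
for k in range(10):
    mask = clusters == k
    labels[mask] = np.bincount(digits.target[mask]).argmax()

# Fraction of samples whose mapped cluster label equals the true digit.
acc = accuracy_score(digits.target, labels)
print(f"accuracy: {acc:.3f}")
```

The true labels are used only to name the clusters after fitting; the clustering itself never sees them.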

Interpreting the Confusion Matrix

We plot the confusion matrix for the clusters whose centers we visualized before. From it, we observe that our model mainly confuses the digits 8 and 1.

Even with its limitations, we will have shown that we can build a good digit classifier, using k-means, without using any known class labels!
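A confusion matrix for the mapped cluster labels can be computed along the lines below. This is a sketch that assumes the load_digits data and the majority-vote mapping from cluster to digit; the plotting style (a plain matplotlib heat map) is also an illustrative choice.

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_digits
from sklearn.metrics import confusion_matrix

digits = load_digits()
clusters = KMeans(n_clusters=10, n_init=10, random_state=0).fit_predict(digits.data)

# Assumption: name each cluster after its most common true digit.
labels = np.zeros_like(clusters)
for k in range(10):
    mask = clusters == k
    labels[mask] = np.bincount(digits.target[mask]).argmax()

# Rows are true digits, columns are predicted digits.
cm = confusion_matrix(digits.target, labels, labels=list(range(10)))

fig, ax = plt.subplots()
ax.imshow(cm, cmap="Blues")
ax.set(xlabel="predicted label", ylabel="true label",
       xticks=range(10), yticks=range(10))
# In an interactive session, call plt.show() to display the figure.
```

Large off-diagonal entries mark pairs of digits that the clustering tends to merge.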

Loading a Sample Image for Color Compression

With this task, we begin our journey into applying k-means for color compression within images.

We use scikit-learn’s datasets module to load a sample image and explore its attributes. Through the rest of this project, we will work with the same image and compress the original 16 million colors to just 16 colors!
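Loading a sample image could look like the sketch below. The description does not say which image is used, so "china.jpg" (one of the images shipped with scikit-learn) is an assumption; load_sample_image also needs the Pillow package installed.

```python
from sklearn.datasets import load_sample_image

# Assumption: "china.jpg" stands in for whichever sample image the project uses.
image = load_sample_image("china.jpg")

print(image.shape)  # (height, width, 3): one red, green, blue triple per pixel
print(image.dtype)  # 8 bits per channel, so 256**3 possible colors
```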

From 16 Million to 16 Colors

In this task, we first normalize the data. We then use k-means across the pixel space to reduce the 16 million colors in our sample image to just 16 colors.

After visualizing these pixels in the color space, and comparing the original to the reduced representation, we find that the result is a recoloring of the original pixels, where each pixel from the sample image is assigned the color of its closest cluster center.
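The normalization and recoloring steps can be sketched as follows. This example assumes the "china.jpg" sample image and swaps in MiniBatchKMeans, a faster mini-batch variant of k-means, to keep the run time short on a few hundred thousand pixels; the project itself may use plain KMeans.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans
from sklearn.datasets import load_sample_image

image = load_sample_image("china.jpg")  # assumed sample image

# Normalize to [0, 1] and flatten to one row per pixel: (n_pixels, 3).
data = image.reshape(-1, 3) / 255.0

# Assumption: mini-batch k-means with 16 clusters, one per output color.
kmeans = MiniBatchKMeans(n_clusters=16, n_init=3, random_state=0)
labels = kmeans.fit_predict(data)

# Replace every pixel with the color of its assigned cluster center.
new_colors = kmeans.cluster_centers_[labels]
recolored = new_colors.reshape(image.shape)

n_colors = len(np.unique(new_colors, axis=0))
print(n_colors)  # number of distinct colors left in the recolored image
```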

Plotting the Results

In the last task, we visualized the pixels in the color space. Given the abstract nature of color space, let us now plot our result from k-means in the image space. This lets us compare our sample image of 16 million possible colors to our compressed image of just 16 colors, a reduction in the size of the color palette by a factor of around 1 million!
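A side-by-side comparison in image space might be produced as in the sketch below, which also spells out the palette arithmetic. As before, the "china.jpg" image and the use of MiniBatchKMeans are assumptions made to keep the example small and quick.

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.cluster import MiniBatchKMeans
from sklearn.datasets import load_sample_image

image = load_sample_image("china.jpg")  # assumed sample image
data = image.reshape(-1, 3) / 255.0

kmeans = MiniBatchKMeans(n_clusters=16, n_init=3, random_state=0)
labels = kmeans.fit_predict(data)
recolored = np.clip(kmeans.cluster_centers_[labels], 0.0, 1.0).reshape(image.shape)

fig, axes = plt.subplots(1, 2, figsize=(12, 5),
                         subplot_kw=dict(xticks=[], yticks=[]))
axes[0].imshow(image)
axes[0].set_title("Original image")
axes[1].imshow(recolored)
axes[1].set_title("16-color image")
# In an interactive session, call plt.show() to display the figure.

# Palette arithmetic: 256**3 = 16,777,216 possible colors, reduced to 16.
palette_reduction = 256**3 // 16
print(palette_reduction)  # 1048576
```

The factor describes how much smaller the palette is, not the size of a saved file, which depends on the image format used.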

Snehan Kekre

About the Host (Snehan Kekre)

Snehan Kekre is a Machine Learning and Data Science Instructor at Coursera. He studied Computer Science and Artificial Intelligence at Minerva Schools at KGI, based in San Francisco. His interests include AI safety, EdTech, and instructional design. He recognizes that building a deep, technical understanding of machine learning and AI among students and engineers is necessary in order to grow the AI safety community. This passion drives him to design hands-on, project-based machine learning courses on Rhyme.

