In this blog post, I will talk about my latest project: color quantization.

A color palette is the collection of distinct colors in an image. Color quantization reduces the number of colors in the palette, which compresses the image. This kind of compression makes it possible to render an image on devices that support only a limited number of colors.
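To put rough numbers on the saving: a 24-bit RGB image stores 3 bytes per pixel. An image quantized to a palette of k colors only needs to store, per pixel, an index of about log2(k) bits, plus the palette itself. The sketch below uses hypothetical helper names and a made-up 640x480 image with a 16-color palette. It ignores file-format headers and any extra lossless compression (such as PNG's), so treat it as a back-of-the-envelope estimate.

```python
import math

def bits_per_pixel(n_colors: int) -> int:
    """Bits needed to store one palette index for a palette of n_colors."""
    # Each pixel stores an index into the palette instead of a full RGB color.
    return max(1, math.ceil(math.log2(n_colors)))

def indexed_size_bytes(width: int, height: int, n_colors: int) -> float:
    """Approximate size of an indexed image: pixel indices plus the palette."""
    pixel_bits = width * height * bits_per_pixel(n_colors)
    palette_bits = n_colors * 24  # each palette entry is one 24-bit RGB color
    return (pixel_bits + palette_bits) / 8

width, height = 640, 480
truecolor = width * height * 3                 # 24-bit RGB: 3 bytes per pixel
indexed = indexed_size_bytes(width, height, 16)
print(truecolor, indexed)  # prints: 921600 153648.0
```

With 16 colors each pixel needs 4 bits instead of 24, so the raw pixel data shrinks by roughly a factor of 6.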

k-means clustering

How do we find the color palette?
It can be found with a simple unsupervised machine learning algorithm: k-means clustering, which is a method of vector quantization. k-means partitions n observations into k clusters, where each observation belongs to the cluster with the nearest mean (the cluster centroid), and that mean serves as the prototype of the cluster.
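To make the "nearest mean" idea concrete, here is a minimal from-scratch sketch of the standard k-means iteration (Lloyd's algorithm) in NumPy. The names `kmeans` and `_nearest` are hypothetical and this is for illustration only; the project itself uses scikit-learn's `KMeans`, which among other things uses a smarter initialisation (k-means++ by default).

```python
import numpy as np

def _nearest(points: np.ndarray, centroids: np.ndarray) -> np.ndarray:
    """Index of the closest centroid for every point (Euclidean distance)."""
    dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)

def kmeans(points, k: int, n_iter: int = 100, seed: int = 0):
    """Plain Lloyd's algorithm. Returns (centroids, labels)."""
    points = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialise the k means with k observations picked at random (no repeats).
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: every observation joins the cluster with the nearest mean.
        labels = _nearest(points, centroids)
        # Update step: every mean moves to the average of its members
        # (an empty cluster keeps its previous mean).
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # converged: the means stopped moving
        centroids = new_centroids
    # Final assignment so the labels match the returned centroids.
    return centroids, _nearest(points, centroids)

# Two obvious color groups: near-black and near-white pixels.
pts = np.array([[0, 0, 0], [10, 10, 10], [5, 5, 5],
                [250, 250, 250], [240, 240, 240], [245, 245, 245]])
centroids, labels = kmeans(pts, 2)
```

Here the two centroids end up at (5, 5, 5) and (245, 245, 245): the mean dark color and the mean light color, which is exactly a two-color palette.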

Before and after k-means clustering

In layman's terms, k-means clustering groups similar data points in the data space into clusters. Here, the data space is the set of all pixel values of the image. So each cluster contains similar colors, and the collection of cluster representatives (the centroids) forms the color palette.

A Guide to the code

Let's break the problem down. The code has four parts:

  • Extract the pixel values of a given image
  • Apply k-means clustering to the pixel data to get the color palette
  • Create a small image of the palette
  • Create a compressed image using the palette

For everything image-related, I used the Python Imaging Library (PIL). For k-means, I used the KMeans estimator from scikit-learn.

I am still new to these libraries, so to get started I did the most sensible thing: I read the PIL documentation. Don't worry! I didn't read the whole thing. I searched for the functions I needed using obvious keywords.

Once I had all the pixel values from PIL, I had to do the clustering. k-means needs two inputs: the number of clusters (the size of the palette) and the pixel values. It was actually very easy: just two lines of code and some time to run. I did not expect it to be that easy.

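from sklearn.cluster import KMeans

def quantize(dat, n_colors):
    """Apply k-means to the pixel data and return the cluster centers (the palette)."""
    model = KMeans(n_clusters=n_colors).fit(dat)
    # Centers are float means of RGB values; truncate them to integer colors.
    cntrs = model.cluster_centers_.astype(int)
    return cntrs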
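The first step in the list above, extracting the pixel values, can be sketched as follows. This is a minimal version assuming Pillow and NumPy; the helper name `get_pixels` is hypothetical. It flattens the image into one RGB row per pixel, which is the shape `KMeans.fit` expects (n_samples, n_features).

```python
import numpy as np
from PIL import Image

def get_pixels(img: Image.Image) -> np.ndarray:
    """Flatten an image into an (n_pixels, 3) array of RGB values."""
    rgb = img.convert("RGB")     # normalise modes such as RGBA, P or L to plain RGB
    arr = np.asarray(rgb)        # shape (height, width, 3), dtype uint8
    return arr.reshape(-1, 3)    # one row per pixel: the data space for k-means

# Tiny in-memory image instead of a real file, so the example runs anywhere.
img = Image.new("RGB", (4, 2), (255, 0, 0))
pixels = get_pixels(img)
print(pixels.shape)  # (8, 3)
```

With a real photo, the call would look like `quantize(get_pixels(Image.open("photo.jpg")), 8)`, where "photo.jpg" is a placeholder path.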

An example (right: input image, middle: color palette, left: compressed image)
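The palette strip in the middle of such an example ("Create a small image of the palette") could be drawn along these lines. This is a sketch, not the project's code: the helper name `palette_image` and the default swatch size of 50 pixels are arbitrary choices. Each palette color is repeated into a square block, and the blocks are placed side by side.

```python
import numpy as np
from PIL import Image

def palette_image(colors, swatch: int = 50) -> Image.Image:
    """Render a palette as a horizontal strip of square swatches, one per color."""
    colors = np.clip(np.asarray(colors), 0, 255).astype(np.uint8)  # (k, 3) RGB
    strip = np.repeat(colors[np.newaxis, :, :], swatch, axis=0)     # (swatch, k, 3)
    strip = np.repeat(strip, swatch, axis=1)                         # (swatch, k * swatch, 3)
    return Image.fromarray(strip)                                    # uint8 (H, W, 3) -> RGB image

strip = palette_image([[255, 0, 0], [0, 128, 0], [0, 0, 255]], swatch=10)
print(strip.size)  # (30, 10)
```

Passing the output of `quantize` directly, e.g. `palette_image(quantize(pixels, 8)).save("palette.png")`, would save the palette as its own small image.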

After clustering, the compressed image is obtained simply by remapping every color in the image to its cluster representative. With that, the program is complete.
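The remapping step can be sketched in NumPy as a nearest-color lookup: for each pixel, pick the palette entry with the smallest squared RGB distance. The helper name `remap` is hypothetical. If the fitted `KMeans` model were kept around, `model.labels_` (or `model.predict`) would give the cluster indices directly; the version below only needs the palette that `quantize` returns.

```python
import numpy as np

def remap(pixels: np.ndarray, palette: np.ndarray) -> np.ndarray:
    """Replace every pixel by its nearest palette color (squared Euclidean distance in RGB)."""
    pixels = np.asarray(pixels, dtype=float)
    palette = np.asarray(palette, dtype=float)
    # (n_pixels, k) matrix of squared distances from each pixel to each palette color
    d2 = ((pixels[:, None, :] - palette[None, :, :]) ** 2).sum(axis=2)
    nearest = d2.argmin(axis=1)               # index of the closest palette entry
    return palette[nearest].astype(np.uint8)  # (n_pixels, 3) quantized colors

pixels = np.array([[10, 10, 10], [250, 240, 245], [0, 0, 0]])
palette = np.array([[0, 0, 0], [255, 255, 255]])
out = remap(pixels, palette)
print(out.tolist())  # [[0, 0, 0], [255, 255, 255], [0, 0, 0]]
```

To turn the result back into a picture, reshape it to the original dimensions, e.g. `Image.fromarray(remap(pixels, palette).reshape(height, width, 3))`. For multi-megapixel photos the intermediate distance array gets large, so processing the pixels in chunks (or using `model.predict`) would be more memory-friendly.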