From theory to practice with the Otsu thresholding algorithm
Let me start with a very technical concept:
An image will be viewed, treated, analyzed, and processed as a 2D signal.
And some proper definitions:
- A signal is a quantity that changes over space or time and can be used to transmit a form of information.
- An image is nothing but a quantity of light that hits an optical system, be it a camera or the canvas you are painting on.
In this sense, an image is nothing but a 2D signal: an electromagnetic signal that carries some information and is retrieved by a physical system.
So as we have established that an image is indeed a signal, we can think of applying a signal processing technique to an image processing task. We can thus stop discussing philosophy and start with the nerd part.
Speaking of philosophy. Let’s take this image:
There is the philosopher in the image doing his job: thinking. And then there is this very white background, which we really don’t care about. Can we get rid of it? Can we get something like this?
If I’m asking you, it means that we can. 😅
Everyone who knows a little bit of Photoshop can do that, but can we do it automatically, with Python? Again, yes.
Let me show you how 🚀
So let’s take a straightforward case.
Yep. A small square inside a bigger square. This is an extremely simple case. What we want to do is to set all the values in the smaller square to 1 and everything that is outside to 0.
We can extract the two values with this code:
And then do something like:
This converts the image from two values to 1 and 0.
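The original code snippets aren't shown here, but both steps might look like the following sketch (the image size, the square position, and the variable names are my assumptions):

```python
import numpy as np

# The toy example: a small bright square inside a bigger dark square
im = np.zeros((50, 50))
im[15:35, 15:35] = 255

# Extract the two values present in the image
values = np.unique(im)  # sorted: the dark value first, the bright one second

# Map them so the small square becomes 1 and everything outside becomes 0
binary = (im == values[1]).astype(int)
```

Since the image contains exactly two values, a simple equality test against the brighter one is enough to binarize it.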
This is extremely simple, right? Let’s make it a little bit harder.
Now we will do the little square inside the bigger square but both the squares have some noise.
What I mean is that we don’t have only 2 values anymore: we can theoretically have all the values between 0 and 255, which is the whole range of values in the encoding.
How do we deal with this?
Well, the first thing we want to do is to flatten the image (2D signal) and change it into a 1D one.
The image was 50×50, so we get a “raveled” 1D signal of length 50×50 = 2500.
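A sketch of this step (the noise model, means, and standard deviations are my assumptions for illustration, not the author's exact values):

```python
import numpy as np

rng = np.random.default_rng(42)

# Noisy version of the two squares: background around 30,
# inner square around 200, both with Gaussian noise
im = np.clip(rng.normal(30, 10, size=(50, 50)), 0, 255)
im[15:35, 15:35] = np.clip(rng.normal(200, 10, size=(20, 20)), 0, 255)

# Flatten the 2D signal into a 1D one
signal = im.ravel()
print(signal.shape)  # (2500,)
```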
Now if we study the distribution of our 1D signal, we get something like this:
As we can see, we have two normal distributions. This is exactly where the Otsu algorithm performs best. The underlying idea is that the background and the subject of the image have two different natures and two different domains. For example, in this case, the first Gaussian bell is the one related to the background (let’s say from 0 to 50), while the second Gaussian bell is the one of the smaller square (from 150 to 250).
So let’s say we set everything larger than 100 to 1 and everything smaller to 0:
And the result is the following mask between the background and the subject:
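With the hand-picked threshold of 100 from the text, the masking step might look like this (the noisy test image is rebuilt with assumed noise parameters):

```python
import numpy as np

# Rebuild the noisy two-square image (parameters are assumptions)
rng = np.random.default_rng(42)
im = np.clip(rng.normal(30, 10, size=(50, 50)), 0, 255)
im[15:35, 15:35] = np.clip(rng.normal(200, 10, size=(20, 20)), 0, 255)

# Everything above 100 becomes 1 (subject), the rest 0 (background)
mask = (im > 100).astype(int)
```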
This is it. This is the whole idea of the Otsu algorithm:
- Import/Read the image as a 2D signal
- Flatten the image into a 1D vector
- Choose a threshold
- Set everything that is below that threshold as 0 and everything that is above as 1
Very easy right?
But how do we choose the proper threshold? What is the best one? Let’s talk about math.
Let’s formalize this concept a little bit.
We have a domain of pixel intensities in an image. The full domain goes from 0 to 255 (black to white), but it doesn’t have to be that wide (it can be from 20 to 200, for example).
Now, multiple pixels can have the same intensity, of course (we can have two black pixels in the same image). Let’s say that we have 3 pixels with an intensity of 255 in an image that has 100 pixels. Then the probability of having intensity 255 in that image is 3/100.
In general, we can say that the probability of having intensity i in an image is:
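The equation (an image in the original) is presumably the standard empirical frequency, in the notation of Otsu's paper, where n_i is the number of pixels with intensity i and N is the total number of pixels:

```latex
p_i = \frac{n_i}{N}, \qquad \sum_{i=0}^{255} p_i = 1
```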
Now let’s say that the intensity at which we split is k (in our previous example, k was 100). This classifies the data points: all the points with intensity up to k belong to class 0 and all the points above k belong to class 1.
This means that the probability of picking a point from class 0 is the following:
While the probability of picking a point from class 1 is the following:
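The two lost equations presumably read, following the standard Otsu formulation:

```latex
\omega_0(k) = \sum_{i=0}^{k} p_i,
\qquad
\omega_1(k) = \sum_{i=k+1}^{255} p_i = 1 - \omega_0(k)
```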
As we can see, both probabilities are obviously dependent on k.
Now, another thing that we can compute is the variance of each class: sigma is how much the class is spread around its mean value, mu_0 for class 0 and mu_1 for class 1.
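The variance and mean formulas (images in the original) presumably follow Otsu's standard definitions:

```latex
\sigma_0^2(k) = \sum_{i=0}^{k} \bigl(i - \mu_0(k)\bigr)^2 \frac{p_i}{\omega_0(k)},
\qquad
\sigma_1^2(k) = \sum_{i=k+1}^{255} \bigl(i - \mu_1(k)\bigr)^2 \frac{p_i}{\omega_1(k)}
```

where the class means are

```latex
\mu_0(k) = \sum_{i=0}^{k} \frac{i \, p_i}{\omega_0(k)},
\qquad
\mu_1(k) = \sum_{i=k+1}^{255} \frac{i \, p_i}{\omega_1(k)}
```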
Now theoretically the idea is to find the value that creates that little valley that we saw in this picture earlier:
But the approach that we use is slightly different and more rigorous, and it borrows the idea behind Linear Discriminant Analysis (LDA). In (Fisher) LDA we want to find a hyperplane that splits the two distributions so that the variance between the classes is as large as possible (the two means are as far apart as possible) and the variance within the classes is as small as possible (the data points of the two classes don’t overlap too much).
In this case, we don’t have any hyperplane and the threshold that we set (our k) is not even a line, but it is more of a probability value that we use to discriminate data points and classify them.
It can be proven (the full proof is in the original paper) that the best split between the background and the subject (given the assumption that the domain of the background is different from the domain of the subject) is obtained by minimizing this quantity:
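The quantity (lost as an image here) is presumably the within-class variance from Otsu's paper:

```latex
\sigma_w^2(k) = \omega_0(k)\,\sigma_0^2(k) + \omega_1(k)\,\sigma_1^2(k)
```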
This means that we can try all the different ks and just pick the one that yields the lowest value of this quantity.
The theory might look complex and difficult to understand, but the implementation is extremely easy and it is made of three blocks:
2.1 Importing the libraries
The first thing that we want to do is to import 4 basic libraries that we will need.
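The import block isn't shown, but a plausible set of four libraries for this tutorial (my guess, not necessarily the author's exact choices) is:

```python
# NumPy for the math, Matplotlib for plotting,
# PIL (Pillow) for image I/O, and os for file paths
import os
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image
```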
2.2 The threshold function
Once you find the perfect threshold, this is how to apply it to your image:
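A minimal version of such a function might look like this (the name and signature are my guesses, not the author's original code):

```python
import numpy as np

def threshold(im, k):
    """Set everything above k to 1 and everything else to 0."""
    return (np.asarray(im) > k).astype(int)
```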
2.3 The Otsu criterion
The function that computes this quantity is the following:
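A sketch of such a criterion function (the name and the handling of empty classes are my choices):

```python
import numpy as np

def otsu_criterion(im, k):
    """Weighted within-class variance w0*sigma0^2 + w1*sigma1^2
    for a candidate threshold k. Returns np.nan when one class is empty."""
    pixels = np.asarray(im, dtype=float).ravel()
    c0 = pixels[pixels <= k]   # class 0: background candidates
    c1 = pixels[pixels > k]    # class 1: subject candidates
    if c0.size == 0 or c1.size == 0:
        return np.nan
    w0, w1 = c0.size / pixels.size, c1.size / pixels.size
    return w0 * c0.var() + w1 * c1.var()
```

Working directly on the pixel values (rather than on a 256-bin histogram) keeps the code close to the formulas above, at the cost of some speed.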
2.4 Best threshold computation
This other function just runs all over the possible ks and finds the best one according to the criterion above:
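A self-contained sketch of the exhaustive search (a compact criterion helper is inlined so the snippet runs on its own):

```python
import numpy as np

def otsu_criterion(pixels, k):
    """Weighted within-class variance for a candidate threshold k."""
    c0, c1 = pixels[pixels <= k], pixels[pixels > k]
    if c0.size == 0 or c1.size == 0:
        return np.nan  # a split that leaves a class empty is not valid
    return (c0.size * c0.var() + c1.size * c1.var()) / pixels.size

def best_threshold(im):
    """Try every candidate k in the image's range; keep the minimizer."""
    pixels = np.asarray(im, dtype=float).ravel()
    best_k, best_value = None, np.inf
    for k in range(int(pixels.min()), int(pixels.max())):
        value = otsu_criterion(pixels, k)
        if not np.isnan(value) and value < best_value:
            best_k, best_value = k, value
    return best_k
```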
2.5 The whole process
So the image we are using is the following one:
If we save that image in a path and we apply the Otsu algorithm we get:
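For a runnable end-to-end sketch, here is the whole process on a synthetic stand-in image (a bright square on a dark noisy background, since the photo itself isn't bundled here); a real run would load the saved picture instead, e.g. with PIL's `Image.open(path).convert("L")`:

```python
import numpy as np

def otsu_threshold(im):
    """Exhaustive Otsu: the k minimizing the within-class variance."""
    pixels = np.asarray(im, dtype=float).ravel()
    best_k, best_value = None, np.inf
    for k in range(int(pixels.min()), int(pixels.max())):
        c0, c1 = pixels[pixels <= k], pixels[pixels > k]
        if c0.size == 0 or c1.size == 0:
            continue
        value = (c0.size * c0.var() + c1.size * c1.var()) / pixels.size
        if value < best_value:
            best_k, best_value = k, value
    return best_k

# Synthetic stand-in for the photo (noise parameters are assumptions)
rng = np.random.default_rng(0)
im = np.clip(rng.normal(30, 10, (100, 100)), 0, 255)
im[30:70, 30:70] = np.clip(rng.normal(220, 10, (40, 40)), 0, 255)

k = otsu_threshold(im)
im_otsu = (im > k).astype(int)  # the binary mask to compare with im
```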
And if we compare im (the original image) and im_otsu (the one after the algorithm) we get:
As we can see, the dark area in the upper right part of the picture is misinterpreted as the subject because it has the same tone as parts of the subject. People are not perfect, and neither are algorithms 🙃
Thank you for being here with me throughout the whole path of this Otsu algorithm tutorial.
In this brief article, we saw:
- That an image can be treated as a 2D signal and can then be analyzed using signal processing techniques
- The assumption of the Otsu algorithm: that the background and the subject of an image have two continuous, non-overlapping, distinct domains
- How to find the best discrimination between the background and the subject of an image using the Otsu algorithm, and how we can interpret the Otsu algorithm as a Fisher Linear Discriminant
- How to implement the Otsu algorithm using Python
- How to apply this algorithm to a real image
If you liked the article and you want to know more about machine learning, or you just want to ask me something, you can:
A. Follow me on Linkedin, where I publish all my stories
B. Subscribe to my newsletter. It will keep you updated about new stories and give you the chance to text me with any corrections or doubts you may have.
C. Become a referred member, so you won’t have any “maximum number of stories for the month” and you can read whatever I (and thousands of other Machine Learning and Data Science top writers) write about the newest technology available.