Computer vision is one of the areas where machine learning is used the most; it is where image processing and artificial intelligence come together. In today’s post, we will look at how to implement hierarchical clustering, a machine learning method, on images with Python. Machine learning methods can be categorized as:
- Supervised learning
- Unsupervised learning
- Semi-supervised learning
Hierarchical clustering is an unsupervised learning method. Unsupervised learning methods discover hidden patterns or data groupings by detecting similarities and differences in the data, without the need for human intervention.
Here are the Python libraries I used:
- scikit-image, for image processing,
- scipy, for scientific computation,
- matplotlib, for visualization.
First of all, I start by choosing the input images. For an easy validation of the results, I selected eight images; four of them are human faces and four of them are vehicles. So I’ll be expecting the program to separate the faces from the vehicles.
Let’s read and display the images:
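A minimal sketch of this step; the file names below are placeholders, so substitute the paths to your own four face and four vehicle images:

```python
import matplotlib.pyplot as plt
from skimage import io

def load_and_show(paths):
    """Read each image from disk and display all of them in a 2x4 grid."""
    images = [io.imread(p) for p in paths]
    fig, axes = plt.subplots(2, 4, figsize=(12, 6))
    for ax, img, p in zip(axes.ravel(), images, paths):
        ax.imshow(img)
        ax.set_title(p, fontsize=8)
        ax.axis("off")
    plt.show()
    return images

# Placeholder file names; replace with your own face/vehicle images.
# images = load_and_show([f"face_{i}.jpg" for i in range(4)]
#                        + [f"vehicle_{i}.jpg" for i in range(4)])
```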
Now let’s overview the steps of machine learning/image processing:
Our first step will be feature extraction, in order to obtain computable information from the images. Features are parts or patterns of an object in an image that help define it. For example, a square has 4 edges and 4 corners; these can be called the features of the square, and they help us determine that it is a square. Corners, edges, colors, textures, etc. are the features of an image. The features we extract from an image can be expressed as a vector; these are called feature vectors/descriptors. I used HOG (Histogram of Oriented Gradients) as the feature descriptor.
HOG is a widely used feature descriptor in object detection because it is fast and gives precise results. In another article, I will cover HOG and how it is calculated in more detail. Let’s take a look at the code and output for now.
fd, hog_image = hog(images[i], orientations=9, pixels_per_cell=(64, 64), cells_per_block=(2, 2), visualize=True, multichannel=True)
The HOG function takes the original image as input. The image is divided into blocks of equal size, and the blocks are divided into cells of equal size. Then a histogram is generated for each cell. This histogram contains information about how the gradients in the cell are oriented, binned into the number of orientation bins we specified over the range 0°–180°. The histograms from all the cells are then concatenated into one long, one-dimensional vector. This vector is the ‘HOG feature vector’ of our image.
The parameters of the HOG function:
- input: Input image.
- orientations: Number of orientation bins.
- pixels_per_cell: Size (in pixels) of a cell.
- cells_per_block: Number of cells in each block.
- visualize: Also return an image of the HOG. For each cell and orientation bin, the image contains a line segment that is centered at the cell center, is perpendicular to the midpoint of the range of angles spanned by the orientation bin, and has intensity proportional to the corresponding histogram value.
We extracted feature vectors from all the images and saved them in a list. Now we need to compare these vectors with each other and calculate their similarity, in other words their distances. For this, we can get help from distance functions. I used the Jensen–Shannon distance, a measure from statistics and probability that quantifies the similarity between two probability distributions. (The feature vectors we get with HOG are density histograms, so we can treat them as probability distributions.)
Before applying Jensen–Shannon, let’s talk about the distance matrix. The distance matrix is a two-dimensional square matrix containing the distances between the pair of items of a set. Since we have 8 images, our distance matrix will be 8×8.
The distance matrix will be symmetrical (because the distance between x and y is the same as the distance between y and x) and all values on the diagonal will be zero (because each element is zero distance from itself). Now, let’s calculate the distance of all feature vectors from each other with a nested loop and create a distance matrix.
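A sketch of that nested loop, using scipy’s `jensenshannon` function (which normalizes the vectors internally; note it returns the Jensen–Shannon distance, the square root of the divergence):

```python
import numpy as np
from scipy.spatial.distance import jensenshannon

def build_distance_matrix(features):
    """Fill a symmetric n x n matrix of pairwise Jensen-Shannon distances."""
    n = len(features)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            d = jensenshannon(features[i], features[j])
            dist[i, j] = dist[j, i] = d  # symmetric; diagonal stays zero
    return dist
```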
We need to convert our 2D distance matrix into a condensed distance matrix: a flat array containing the upper triangle of the distance matrix.
array([0.21668534, 0.2053017 , 0.29673321, 0.11714878, 0.19228613, 0.18577299, 0.30157715, 0.25553919, 0.18902859, 0.22001129, 0.12968395, 0.24286811, 0.17783984, 0.34236163, 0.16659271, 0.24649133, 0.08186722, 0.33763304, 0.30279613, 0.22517062, 0.31988354, 0.17019618, 0.19191148, 0.14927076, 0.31512683, 0.23657825, 0.19442411, 0.31916923])
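The conversion from the square matrix to this condensed form can be done with scipy’s `squareform`; here is a small sketch with a 3×3 toy matrix instead of our 8×8 one:

```python
import numpy as np
from scipy.spatial.distance import squareform

# Toy symmetric distance matrix with a zero diagonal.
distance_matrix = np.array([[0.0, 0.2, 0.3],
                            [0.2, 0.0, 0.1],
                            [0.3, 0.1, 0.0]])

# squareform flattens the upper triangle row by row.
cond_distance_matrix = squareform(distance_matrix)
print(cond_distance_matrix)  # -> [0.2 0.3 0.1]
```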
Hierarchical clustering is a method of creating a hierarchy of clusters. In general, there are two approaches:
- Agglomerative: Each item starts in its own cluster, and the two nearest clusters are merged. As you move up the hierarchy, pairs of clusters keep merging until all elements form a single cluster. This is a bottom-up approach.
- Divisive: The entire item set starts out as a single cluster. As you move down the hierarchy, clusters are split into smaller, more homogeneous clusters until each cluster contains only one element. This is a top-down approach.
I chose to implement the agglomerative approach. We will use the linkage function from the scipy library.
Z = linkage(cond_distance_matrix, method='ward')
The first parameter is the condensed distance matrix that we obtained in the previous section. The second parameter determines the linkage method. Let’s take a look at the linkage methods:
- Single Linkage
The distance between two clusters is the minimum distance between an observation in one cluster and an observation in the other cluster.
- Complete Linkage
The distance between two clusters is the maximum distance between an observation in one cluster and an observation in the other cluster.
- Average Linkage
The distance between two clusters is the average distance between an observation in one cluster and an observation in the other cluster.
- Centroid Linkage
The distance between two clusters is the distance between the cluster centroids or means.
- Ward’s Linkage
With Ward’s linkage method, the distance between two clusters is the sum of squared deviations from points to centroids. The objective of Ward’s linkage is to minimize the within-cluster sum of squares. The Ward method is less sensitive to noise and outliers.
I got the best results with the Ward method. You can also test different linkage methods with different images.
When the linkage function is called, it returns the hierarchical clustering encoded as a linkage matrix. We can visualize the relationships between all the clusters formed by drawing a dendrogram. Dendrograms make it easy to interpret the results and determine the number of clusters.
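A sketch of the clustering-plus-dendrogram step (the axis labels are my own choice):

```python
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram

def plot_dendrogram(cond_distance_matrix, labels=None):
    """Run Ward's linkage and draw the dendrogram of the merges."""
    Z = linkage(cond_distance_matrix, method='ward')
    plt.figure(figsize=(8, 4))
    dendrogram(Z, labels=labels)
    plt.xlabel('image id')
    plt.ylabel('distance')
    plt.show()
    return Z
```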
We see the image IDs on the X-axis and the distance between clusters on the Y-axis. Looking at the hierarchy from the top, we can see that it did a good job of separating the vehicles from the faces.
One of the advantages of agglomerative hierarchical clustering is that we do not have to predetermine the number of clusters. Since the dendrogram provides the hierarchical relationship between all clusters, we can choose the number of clusters by looking at it. For example, when we cut the dendrogram with the red horizontal line, we get two clusters, and when we cut it with the blue horizontal line, we get four clusters.
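In code, cutting the tree corresponds to scipy’s `fcluster` function. A sketch with a toy 4-item linkage; the condensed distances below are made up so that items 0 and 1 are close and items 2 and 3 are close:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Toy condensed distances for 4 items, ordered (0,1),(0,2),(0,3),(1,2),(1,3),(2,3):
# items 0 and 1 are close, items 2 and 3 are close, cross-pairs are far.
cond = np.array([0.1, 0.9, 0.8, 0.85, 0.95, 0.2])
Z = linkage(cond, method='ward')

# Ask for a flat labeling with exactly two clusters, like cutting the
# dendrogram with a horizontal line.
labels = fcluster(Z, t=2, criterion='maxclust')
print(labels)  # items 0 and 1 share one label, items 2 and 3 the other
```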
In this article, we went over the machine learning steps in computer vision and applied hierarchical clustering, an unsupervised machine learning method, to images with Python. You can access all the code from my GitHub repo and test it with your own images and different parameters. I hope you enjoyed reading and applying it. See you in my next post!