
Hierarchical Clustering of Images with Python | by Dahi Nemutlu | Sep, 2022

by Dahi Nemutlu
September 8, 2022


Computer vision is one of the areas where machine learning methods are used the most; in computer vision, image processing and artificial intelligence come together. In today’s post, we will look at how to implement hierarchical clustering, a machine learning method, on images with Python. Machine learning methods can be categorized as:

  • Supervised learning
  • Unsupervised learning
  • Semi-supervised learning

Hierarchical clustering is an unsupervised learning method. Unsupervised learning methods discover hidden patterns or data groupings by detecting similarities and differences in the data, without the need for human intervention.

Here are the Python libraries I used:

  • scikit-image, for image processing
  • scipy, for scientific computing
  • matplotlib, for visualization

First of all, I start by choosing the input images. For easy validation of the results, I selected eight images: four human faces and four vehicles. So I’ll be expecting the program to separate the faces from the vehicles.

Let’s read and display the images:
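A minimal sketch of this step, assuming the eight images live in a local images/ folder (the file names are placeholders):

import matplotlib.pyplot as plt
from skimage import io

# Read the eight input images (file names assumed for illustration)
paths = [f'images/img{i}.jpg' for i in range(8)]
images = [io.imread(p) for p in paths]

# Display them in a 2x4 grid
fig, axes = plt.subplots(2, 4, figsize=(12, 6))
for ax, img in zip(axes.ravel(), images):
    ax.imshow(img)
    ax.axis('off')
plt.show()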

Input images

Now let’s go over the steps of the machine learning / image processing workflow:

Machine learning / Image processing workflow

Our first step will be feature extraction, in order to obtain computable information from the images. Features are parts or patterns of an object in an image that help define it. For example, a square has 4 edges and 4 corners; these can be called features of the square, and they help us determine that it is a square. Corners, edges, colors, textures, etc. are features of images. The features we extract from an image can be expressed as a vector, called a feature vector/descriptor. I used HOG (Histogram of Oriented Gradients) as the feature vector.

HOG is a widely used feature descriptor in object detection because it is fast and gives precise results. In another article, I will cover HOG and how it is calculated in more detail. Let’s take a look at the code and output for now.

from skimage.feature import hog

# Note: multichannel=True was replaced by channel_axis=-1 in newer scikit-image releases
fd, hog_image = hog(images[i], orientations=9, pixels_per_cell=(64, 64),
                    cells_per_block=(2, 2), visualize=True, multichannel=True)

The hog function takes the original image as input. The image is divided into blocks of equal size, and the blocks are divided into cells of equal size. A histogram is then generated for each cell; it records how the gradients in the cell are oriented, across the number of bins we specified, in the range 0°–180°. The histograms from all the cells are concatenated into one long, one-dimensional vector. This vector is the ‘HOG feature vector’ of our image.

The parameters of the HOG function:

  • input: Input image.
  • orientations: Number of orientation bins.
  • cells_per_block: Number of cells in each block.
  • pixels_per_cell: Size (in pixels) of a cell.
  • visualize: Also return an image of the HOG. For each cell and orientation bin, the image contains a line segment that is centered at the cell center, is perpendicular to the midpoint of the range of angles spanned by the orientation bin, and has intensity proportional to the corresponding histogram value.
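To see what the descriptor captures, we can display the returned hog_image next to the original; a quick sketch, reusing the variables above:

import matplotlib.pyplot as plt

# Show the original image and its HOG visualization side by side
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 5))
ax1.imshow(images[i])
ax1.set_title('Original')
ax2.imshow(hog_image, cmap='gray')
ax2.set_title('HOG')
ax1.axis('off')
ax2.axis('off')
plt.show()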
HOG images

We extracted feature vectors from all the images and saved them in a list. Now we need to compare these vectors with each other and calculate their similarity, in other words, their distances. For this, we can use a distance function. I used the Jensen–Shannon distance, used in statistics and probability theory, which measures the similarity between two probability distributions. (The feature vectors we get with HOG are density histograms, so we can treat them as probability distributions.)

Before applying Jensen–Shannon, let’s talk about the distance matrix. The distance matrix is a two-dimensional square matrix containing the pairwise distances between the items of a set. Since we have 8 images, our distance matrix will be 8×8.

The distance matrix will be symmetric (because the distance between x and y is the same as the distance between y and x), and all values on the diagonal will be zero (because each element is at zero distance from itself). Now let’s calculate the distance of every feature vector from every other with a nested loop and build the distance matrix.
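A minimal sketch, assuming feature_vectors is the list of HOG vectors (fd) collected above:

import numpy as np
from scipy.spatial.distance import jensenshannon

# Pairwise Jensen-Shannon distances between all HOG vectors
# (feature_vectors is the list of fd vectors from the previous step)
n = len(feature_vectors)
distance_matrix = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        # jensenshannon returns the JS distance (square root of the divergence)
        distance_matrix[i, j] = jensenshannon(feature_vectors[i], feature_vectors[j])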

Distance matrix

We need to convert our 2D distance matrix into a condensed distance matrix: a flat array containing the upper triangle of the distance matrix.
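scipy’s squareform does exactly this conversion; a sketch, assuming the distance_matrix built above:

from scipy.spatial.distance import squareform

# Flatten the symmetric 8x8 matrix into its condensed 1-D form (length n*(n-1)/2 = 28)
cond_distance_matrix = squareform(distance_matrix)
print(repr(cond_distance_matrix))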

array([0.21668534, 0.2053017 , 0.29673321, 0.11714878, 0.19228613,
       0.18577299, 0.30157715, 0.25553919, 0.18902859, 0.22001129,
       0.12968395, 0.24286811, 0.17783984, 0.34236163, 0.16659271,
       0.24649133, 0.08186722, 0.33763304, 0.30279613, 0.22517062,
       0.31988354, 0.17019618, 0.19191148, 0.14927076, 0.31512683,
       0.23657825, 0.19442411, 0.31916923])

Hierarchical clustering is a method of creating a hierarchy of clusters. In general, there are two approaches:

  • Agglomerative: Each item starts in its own cluster, and at each step the two nearest clusters are merged. Moving up the hierarchy, pairs of clusters keep merging until all elements form a single cluster. This is a bottom-up approach.
  • Divisive: The entire item set starts as a single cluster. Moving down the hierarchy, clusters are split into more homogeneous clusters until only one element remains in each cluster. This is a top-down approach.

I chose to implement the agglomerative approach. We will use the linkage function from the scipy library.

from scipy.cluster.hierarchy import linkage

# Agglomerative clustering using Ward's linkage on the condensed distance matrix
Z = linkage(cond_distance_matrix, method='ward')

The first parameter is the condensed distance matrix that we obtained in the previous section. The second parameter determines the linkage method. Let’s take a look at the linkage methods:

  • Single Linkage
    The distance between two clusters is the minimum distance between an observation in one cluster and an observation in the other cluster.
  • Complete Linkage
    The distance between two clusters is the maximum distance between an observation in one cluster and an observation in the other cluster.
  • Average Linkage
    The distance between two clusters is the average distance between observations in one cluster and observations in the other cluster.
  • Centroid Linkage
    The distance between two clusters is the distance between the cluster centroids or means.
  • Ward’s Linkage
    With Ward’s linkage method, the distance between two clusters is the sum of squared deviations from points to centroids. The objective of Ward’s linkage is to minimize the within-cluster sum of squares. The Ward method is less sensitive to noise and outliers.

I got the best results with the Ward method. You can also test different linkage methods with different images.

When called, the linkage function returns the hierarchical clustering encoded as a linkage matrix. We can visualize the relationships between all the clusters that were formed by drawing a dendrogram. Dendrograms make it easy to interpret the results and determine the number of clusters.
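A sketch of the plotting step (the axis labels are assumed):

import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram

# Visualize the hierarchy; each leaf is one image
plt.figure(figsize=(8, 5))
dendrogram(Z)
plt.xlabel('Image ID')
plt.ylabel('Distance')
plt.show()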

Dendrogram

We see the image IDs on the X-axis and the distance between clusters on the Y-axis. When we look at the hierarchy from the top, we can see that it did a good job of separating the vehicles from the faces.

One of the advantages of agglomerative hierarchical clustering is that we don’t have to predetermine the number of clusters. Since the dendrogram shows the hierarchical relationships between all clusters, we can choose the number of clusters by looking at it. For example, when we cut the dendrogram with the red horizontal line we get two clusters, and when we cut it with the blue horizontal line we get four clusters (the same cut can be made in code, as sketched after the figure).

Choosing the number of clusters in the dendrogram
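The red-line cut corresponds to asking scipy’s fcluster for two flat clusters; a sketch:

from scipy.cluster.hierarchy import fcluster

# Cut the tree into two flat clusters (the red-line cut)
labels = fcluster(Z, t=2, criterion='maxclust')
print(labels)  # one label per image, e.g. faces vs. vehicles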

In this article, we went over the machine learning steps in computer vision and applied hierarchical clustering, an unsupervised machine learning method, to images with Python. You can access all the code from my GitHub repo and test it with your own images and different parameters. I hope you enjoyed reading and applying it. See you in my next post!


