There are several popular algorithms for clustering data into groups, or clusters. Here are some of the most commonly used:
- K-means: K-means is one of the most widely used clustering algorithms. It partitions the data into k clusters, where each cluster is represented by its centroid. The algorithm iteratively assigns each data point to the nearest centroid and updates the centroids until convergence.
- Hierarchical Clustering: Hierarchical clustering creates a hierarchy of clusters using either a bottom-up (agglomerative) or a top-down (divisive) approach. In the agglomerative method, each data point starts as a separate cluster and the closest clusters are successively merged based on a distance metric, producing a dendrogram. Divisive hierarchical clustering starts with all data points in one cluster and splits clusters recursively until each point is in its own cluster.
- DBSCAN (Density-Based Spatial Clustering of Applications with Noise): DBSCAN groups together data points that are closely packed in high-density regions and marks points that lie alone in low-density regions as noise. It does not require specifying the number of clusters in advance and can identify clusters of arbitrary shapes.
- Mean Shift: Mean Shift is a non-parametric clustering algorithm that iteratively shifts each data point towards the nearest mode (peak) of the estimated density. It does not require specifying the number of clusters in advance and can discover clusters of arbitrary shapes.
- Gaussian Mixture Models (GMM): GMM is a probabilistic model that assumes the data points are generated from a mixture of Gaussian distributions. The algorithm estimates the parameters of the Gaussian distributions to identify clusters. It provides a soft assignment of data points to clusters, indicating the probability of each point belonging to each cluster.
- Spectral Clustering: Spectral clustering builds a similarity graph over the data, embeds the points into a lower-dimensional space using the eigenvectors of the graph Laplacian, and then applies a standard clustering algorithm (typically k-means) in this reduced space. It can effectively cluster data with complex structure or non-convex cluster shapes.
- Agglomerative Clustering: Agglomerative clustering is the bottom-up hierarchical clustering algorithm described above. It starts with each data point as a separate cluster and iteratively merges the closest clusters until a stopping criterion is met. The merging is governed by a linkage criterion (such as single, complete, average, or Ward linkage) computed from a distance metric such as Euclidean distance.
These are just a few examples of clustering algorithms. The choice of algorithm depends on the specific characteristics of the data and the problem at hand. Each algorithm has its strengths and weaknesses, and it’s often necessary to experiment and evaluate different algorithms to find the most suitable one for a given task.
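One common way to compare candidate algorithms is an internal validity measure such as the silhouette score, which rewards tight, well-separated clusters. Below is a minimal sketch of such a comparison using scikit-learn; the synthetic data, parameter values, and choice of metric are illustrative assumptions, not recommendations for any particular dataset.
from sklearn.cluster import KMeans, AgglomerativeClustering, DBSCAN
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Illustrative synthetic data: 300 points drawn from 4 blobs (an assumption for the demo)
X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.6, random_state=42)

# Candidate clusterings to compare (parameter values are arbitrary examples)
candidates = {
    "KMeans(k=4)": KMeans(n_clusters=4, n_init=10, random_state=42),
    "Agglomerative(k=4)": AgglomerativeClustering(n_clusters=4),
    "DBSCAN(eps=0.5)": DBSCAN(eps=0.5, min_samples=5),
}

for name, model in candidates.items():
    labels = model.fit_predict(X)
    if len(set(labels)) >= 2:
        # Silhouette is only defined when there are at least two distinct labels
        print(f"{name}: silhouette = {silhouette_score(X, labels):.3f}")
    else:
        print(f"{name}: only one cluster found, silhouette undefined")
Higher silhouette values (closer to 1) suggest better-separated clusters; when ground-truth labels are available, an external measure such as the adjusted Rand index can be used instead.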
Here’s an example of how to perform K-means clustering using Python and the scikit-learn library:
from sklearn.cluster import KMeans
import numpy as np

# Generate some random data for clustering
X = np.random.rand(100, 2)
# Create a KMeans instance with the desired number of clusters
k = 3
kmeans = KMeans(n_clusters=k)
# Fit the data to the KMeans model
kmeans.fit(X)
# Get the cluster labels for each data point
labels = kmeans.labels_
# Get the coordinates of the cluster centers
centers = kmeans.cluster_centers_
# Print the cluster labels and coordinates
for i in range(k):
    cluster_points = X[labels == i]
    cluster_center = centers[i]
    print(f"Cluster {i+1}:")
    print("Points:", cluster_points)
    print("Center:", cluster_center)
    print()
# Predict the cluster for new data points
new_data = np.array([[0.5, 0.5], [0.8, 0.8]])
predicted_labels = kmeans.predict(new_data)
print("Predicted labels for new data:", predicted_labels)
In this example, we first generate some random data (X) consisting of 100 data points with 2 features. We then create a KMeans instance with n_clusters set to the desired number of clusters (k). We fit the data to the KMeans model using the fit method, which performs the clustering. After clustering, we can access the cluster labels for each data point through the labels_ attribute and the coordinates of the cluster centers through the cluster_centers_ attribute; the example prints the points and center of each cluster. Finally, we demonstrate how to predict the cluster for new data points (new_data) using the predict method. Make sure you have scikit-learn installed (pip install scikit-learn) before running this code. Feel free to adjust the data and parameters according to your needs. A sample run produces output like the following (your values will differ because the data is random):
Cluster 1:
Points: [[0.13273145 0.18499521]
[0.25151555 0.07787907]
[0.07823873 0.3387473 ]
[0.4304308 0.15849885]
[0.30819556 0.28111794]
[0.50339404 0.22978769]
[0.62456197 0.05360781]
[0.13299314 0.0163285 ]
[0.44633344 0.12845978]
[0.04815211 0.12350577]
[0.26522644 0.26925387]
[0.03603012 0.30437473]
[0.15450865 0.30379892]
[0.06178939 0.19976378]
[0.19120416 0.13277907]
[0.01464823 0.24895028]
[0.25603617 0.06943164]
[0.40750603 0.21904688]
[0.19768336 0.04629001]
[0.31535669 0.28532985]
[0.1437834 0.02928692]
[0.14112393 0.21510022]
[0.31846731 0.12947883]
[0.10411299 0.08433029]
[0.60027052 0.11645673]
[0.41222993 0.15557321]
[0.12977273 0.08381945]
[0.04549053 0.25000397]
[0.24677037 0.0226466 ]
[0.52209919 0.07754387]
[0.20908267 0.19787891]]
Center: [0.24934644 0.16238922]
Cluster 2:
Points: [[0.64838528 0.93917235]
[0.72161925 0.807157 ]
[0.92236791 0.77614017]
[0.69303309 0.49814134]
[0.85115936 0.33196127]
[0.61816324 0.84930608]
[0.60415856 0.25401413]
[0.82918861 0.16002955]
[0.99896326 0.1567008 ]
[0.92308659 0.43239048]
[0.63283967 0.24738073]
[0.83965164 0.72679978]
[0.9754857 0.877682 ]
[0.77592903 0.17920678]
[0.51724023 0.44794043]
[0.84826293 0.54129852]
[0.70281191 0.4992381 ]
[0.7534195 0.17720422]
[0.96933357 0.66491496]
[0.48316205 0.42744743]
[0.76412146 0.27369597]
[0.94566971 0.35689016]
[0.51215703 0.57878618]
[0.72353908 0.47355576]
[0.94212271 0.10718535]
[0.99972241 0.4974825 ]
[0.65132605 0.42279316]
[0.85692349 0.55727891]
[0.69953625 0.19331501]
[0.69615992 0.86727653]
[0.70639599 0.32988971]
[0.48516745 0.4195237 ]
[0.79427116 0.55853447]
[0.77357011 0.91146277]
[0.73882631 0.22834844]
[0.76775844 0.77660969]
[0.97303193 0.69844729]
[0.90747301 0.73130874]
[0.54724087 0.68362252]
[0.5762416 0.63890222]
[0.60486479 0.90721805]]
Center: [0.75547271 0.51722569]
Cluster 3:
Points: [[0.00377962 0.77023477]
[0.42512263 0.88758646]
[0.29384245 0.47284459]
[0.08261668 0.93719948]
[0.0418342 0.74522446]
[0.379175 0.74838909]
[0.09017908 0.56191569]
[0.07302454 0.98714923]
[0.46732732 0.73528971]
[0.05706904 0.83993531]
[0.30489174 0.70279006]
[0.09638304 0.69706697]
[0.54308763 0.84851158]
[0.11130683 0.99262739]
[0.04918726 0.65201387]
[0.36383585 0.79013171]
[0.24088464 0.75703108]
[0.07429227 0.59116532]
[0.08539002 0.7362343 ]
[0.4463195 0.89749195]
[0.33319746 0.52495492]
[0.03000479 0.65221903]
[0.06782424 0.64923628]
[0.07080663 0.47259483]
[0.41146967 0.58270735]
[0.32437959 0.72718117]
[0.48961044 0.65654839]
[0.39172798 0.65729438]]
Center: [0.22673465 0.72405605]
Predicted labels for new data: [1 1]
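A practical question the example above leaves open is how to choose k. One commonly used heuristic is the elbow method: fit K-means for several values of k and look for the point where the inertia (within-cluster sum of squared distances, exposed by scikit-learn as the inertia_ attribute) stops dropping sharply. The sketch below assumes the same kind of random X as above and an arbitrary range of candidate k values.
from sklearn.cluster import KMeans
import numpy as np

X = np.random.rand(100, 2)  # same kind of random data as in the example above

# Fit K-means for a range of candidate k values (the range is an illustrative choice)
inertias = {}
for k in range(1, 11):
    km = KMeans(n_clusters=k, n_init=10, random_state=0)
    km.fit(X)
    inertias[k] = km.inertia_  # within-cluster sum of squared distances

for k, inertia in inertias.items():
    print(f"k={k}: inertia={inertia:.3f}")
# Look for the "elbow": the k after which inertia decreases only slowly.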
Here’s an example of how to perform hierarchical clustering using Python and the scipy library:
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram
import matplotlib.pyplot as plt

# Generate some random data for clustering
X = np.random.rand(100, 2)
# Perform hierarchical clustering
Z = linkage(X, method='ward')
# Plot the dendrogram
plt.figure(figsize=(10, 5))
dendrogram(Z)
plt.title('Hierarchical Clustering Dendrogram')
plt.xlabel('Data Point')
plt.ylabel('Distance')
plt.show()
In this example, we first generate some random data (X) consisting of 100 data points with 2 features. We then perform hierarchical clustering with the linkage function from the scipy.cluster.hierarchy module, which takes the data (X) and a method for calculating the distance between clusters (here, 'ward') as arguments. After performing the clustering, we visualize the result with the dendrogram function from scipy.cluster.hierarchy. The dendrogram shows the order in which clusters are merged and the distances at which the merges occur. The code uses matplotlib to create the plot; you can adjust the figure size, labels, and other properties according to your preferences. Make sure you have scipy and matplotlib installed (pip install scipy matplotlib) before running this code. Feel free to modify the data and parameters as needed.
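The dendrogram itself does not assign points to clusters; to get flat cluster labels from the same linkage matrix you can cut the tree, for example with scipy's fcluster function. A minimal sketch, assuming the Z computed above and an arbitrary choice of three clusters and an illustrative distance threshold:
from scipy.cluster.hierarchy import fcluster

# Cut the hierarchy so that at most 3 flat clusters remain (3 is an arbitrary example)
flat_labels = fcluster(Z, t=3, criterion='maxclust')
print("Flat cluster labels:", flat_labels)

# Alternatively, cut at a fixed distance threshold (the value is illustrative)
labels_by_distance = fcluster(Z, t=1.0, criterion='distance')
print("Clusters at distance threshold 1.0:", len(set(labels_by_distance)))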
Here’s an example of how to perform DBSCAN (Density-Based Spatial Clustering of Applications with Noise) using Python and the scikit-learn library:
from sklearn.cluster import DBSCAN
import numpy as np

# Generate some random data for clustering
X = np.random.rand(100, 2)
# Create a DBSCAN instance with the desired parameters
epsilon = 0.3 # neighborhood radius
min_samples = 5 # minimum number of samples in a neighborhood
dbscan = DBSCAN(eps=epsilon, min_samples=min_samples)
# Fit the data to the DBSCAN model
dbscan.fit(X)
# Get the cluster labels for each data point
labels = dbscan.labels_
# Get the number of clusters in the data
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
# Get the indices of the core samples
core_samples_mask = np.zeros_like(labels, dtype=bool)
core_samples_mask[dbscan.core_sample_indices_] = True
# Print the cluster labels and the number of clusters
print("Cluster labels:", labels)
print("Number of clusters:", n_clusters)
# Print the indices of the core samples
print("Core sample indices:", dbscan.core_sample_indices_)
In this example, we first generate some random data (X) consisting of 100 data points with 2 features. We then create a DBSCAN instance with the desired parameters: eps is the neighborhood radius, and min_samples is the minimum number of samples required to form a dense region. We fit the data to the DBSCAN model using the fit method, which performs the clustering. After clustering, we can access the cluster labels for each data point through the labels_ attribute; noise points are labeled -1, so the number of clusters is the number of unique labels minus one whenever -1 is present. The example also shows how to access the indices of the core samples through the core_sample_indices_ attribute; core samples are data points with at least min_samples neighbors within the radius eps. Feel free to adjust the data and parameters according to your needs. Make sure you have scikit-learn installed (pip install scikit-learn) before running this code. A sample run produces output like the following; with uniformly random data and these parameters, every point ends up in a single cluster:
Cluster labels: [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
Number of clusters: 1
Core sample indices: [ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47
48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71
72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95
96 97 98 99]
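Because the data above is uniformly random, DBSCAN merges everything into one cluster; its strength shows on data with real density structure. Below is a hedged sketch on scikit-learn's make_moons toy dataset (the dataset choice and the eps/min_samples values are illustrative assumptions):
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

# Two interleaving half-moons with a little noise (toy data for illustration)
X_moons, _ = make_moons(n_samples=200, noise=0.05, random_state=0)

db = DBSCAN(eps=0.2, min_samples=5)
labels = db.fit_predict(X_moons)

n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
n_noise = list(labels).count(-1)
print("Clusters found:", n_clusters)   # typically 2, one per moon
print("Noise points:", n_noise)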
Here’s an example of how to perform Mean Shift clustering using Python and the scikit-learn library:
from sklearn.cluster import MeanShift
import numpy as np

# Generate some random data for clustering
X = np.random.rand(100, 2)
# Create a MeanShift instance with the desired bandwidth
bandwidth = 0.2
meanshift = MeanShift(bandwidth=bandwidth)
# Fit the data to the MeanShift model
meanshift.fit(X)
# Get the cluster labels for each data point
labels = meanshift.labels_
# Get the cluster centers
centers = meanshift.cluster_centers_
# Get the number of clusters
n_clusters = len(np.unique(labels))
# Print the cluster labels and the number of clusters
print("Cluster labels:", labels)
print("Number of clusters:", n_clusters)
# Print the cluster centers
print("Cluster centers:")
for center in centers:
    print(center)
In this example, we first generate some random data (X) consisting of 100 data points with 2 features. We then create a MeanShift instance with the desired bandwidth, which controls the size of the sliding window used for density estimation. We fit the data to the MeanShift model using the fit method, which performs the clustering. After clustering, we can access the cluster labels for each data point through the labels_ attribute and the cluster centers through the cluster_centers_ attribute. The example also obtains the number of clusters by counting the unique labels. Finally, we print the cluster labels, the number of clusters, and the cluster centers. Feel free to adjust the data and parameters according to your needs. Make sure you have scikit-learn installed (pip install scikit-learn) before running this code. A sample run produces output like the following:
Cluster labels: [6 6 1 4 2 4 1 2 5 0 5 0 1 1 2 3 3 5 3 1 4 0 1 3 0 3 1 2 1 6 5 5 0 1 4 4 0
1 1 4 2 3 3 3 2 5 1 2 1 1 2 2 0 1 5 2 4 4 4 3 3 5 0 1 1 2 4 5 3 2 3 4 1 0
5 0 3 6 1 0 5 2 3 1 0 1 2 0 0 1 3 3 3 4 0 0 5 4 0 4]
Number of clusters: 7
Cluster centers:
[0.82140341 0.74067251]
[0.61247896 0.75566411]
[0.49908656 0.42077338]
[0.66604656 0.12506737]
[0.42789973 0.11371667]
[0.10057624 0.43322565]
[0.05137859 0.93708931]
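Choosing the bandwidth by hand is often the trickiest part of Mean Shift. scikit-learn provides estimate_bandwidth as a data-driven starting point; the sketch below uses it with an arbitrary quantile value, which you would tune for your own data:
from sklearn.cluster import MeanShift, estimate_bandwidth
import numpy as np

X = np.random.rand(100, 2)

# Estimate a bandwidth from the data itself (quantile=0.2 is an illustrative choice)
bandwidth = estimate_bandwidth(X, quantile=0.2)
print("Estimated bandwidth:", bandwidth)

meanshift = MeanShift(bandwidth=bandwidth, bin_seeding=True)
labels = meanshift.fit_predict(X)
print("Number of clusters:", len(np.unique(labels)))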
Here’s an example of how to perform Gaussian Mixture Models (GMM) clustering using Python and the scikit-learn library:
from sklearn.mixture import GaussianMixture
import numpy as np

# Generate some random data for clustering
X = np.random.rand(100, 2)
# Create a GaussianMixture instance with the desired number of components
n_components = 3
gmm = GaussianMixture(n_components=n_components)
# Fit the data to the GMM model
gmm.fit(X)
# Get the cluster labels for each data point
labels = gmm.predict(X)
# Get the probabilities of each data point belonging to each cluster
probs = gmm.predict_proba(X)
# Print the cluster labels and the probabilities
print("Cluster labels:", labels)
print("Probabilities:")
print(probs)
# Print the parameters of the Gaussian distributions
print("Gaussian parameters:")
for i in range(n_components):
    print(f"Component {i+1}:")
    print("Mean:", gmm.means_[i])
    print("Covariance matrix:")
    print(gmm.covariances_[i])
    print()
In this example, we first generate some random data (X) consisting of 100 data points with 2 features. We then create a GaussianMixture instance with the desired number of components (n_components), which is the number of clusters, i.e. the number of Gaussian distributions in the mixture. We fit the data to the GMM using the fit method. After fitting, we obtain hard cluster labels with the predict method and the probability of each data point belonging to each cluster with the predict_proba method. The example also shows how to access the parameters of the fitted Gaussians: the means through the means_ attribute and the covariance matrices through the covariances_ attribute. Feel free to adjust the data and parameters according to your needs. Make sure you have scikit-learn installed (pip install scikit-learn) before running this code. A sample run produces output like the following:
Cluster labels: [1 2 2 1 1 2 1 2 0 0 1 0 2 2 0 1 2 0 0 2 2 0 1 0 2 0 2 1 0 1 2 1 1 1 2 1 1
2 2 2 1 0 0 2 1 1 0 0 2 2 2 1 2 2 1 0 1 1 2 1 2 0 2 2 1 2 2 1 1 1 0 2 2 2
1 2 2 0 2 1 1 0 2 0 1 0 0 2 1 0 1 2 1 0 1 0 2 1 0 2]
Probabilities:
[[1.60713738e-02 9.80353553e-01 3.57507285e-03]
[3.52758054e-01 7.13173512e-04 6.46528772e-01]
[1.39943720e-05 4.25336244e-02 9.57452381e-01]
[7.08546194e-04 9.99291454e-01 8.08960490e-18]
[2.45129288e-03 9.97494284e-01 5.44231143e-05]
[2.00063044e-03 1.37666615e-01 8.60332755e-01]
[1.63889449e-03 6.94086898e-01 3.04274208e-01]
[6.25318373e-06 2.40338656e-02 9.75959881e-01]
[9.99938406e-01 6.15465598e-05 4.73423846e-08]
[9.99998925e-01 1.06906820e-06 6.07279882e-09]
[3.24855365e-01 6.75144635e-01 7.54487878e-14]
[9.99995669e-01 4.32826147e-06 3.09112752e-09]
[2.11785250e-02 1.26722158e-02 9.66149259e-01]
[4.15354503e-01 2.18150701e-03 5.82463990e-01]
[9.99999974e-01 2.56834969e-08 9.61410979e-17]
[5.27965400e-03 9.57223870e-01 3.74964757e-02]
[5.02462555e-04 1.47918884e-01 8.51578653e-01]
[9.99999591e-01 4.08746452e-07 4.17679383e-10]
[9.99996356e-01 3.64423147e-06 3.11840622e-19]
[1.74188036e-02 2.33390390e-02 9.59242157e-01]
[5.44176579e-04 1.51655058e-01 8.47800766e-01]
[9.27600229e-01 7.23953881e-02 4.38327257e-06]
[1.13766175e-02 9.88623382e-01 5.30647379e-16]
[9.99999904e-01 9.62681503e-08 4.57566071e-18]
[3.92758252e-04 3.62275384e-01 6.37331858e-01]
[9.98380506e-01 1.61949383e-03 1.33493801e-16]
[3.87421828e-02 3.90741058e-04 9.60867076e-01]
[6.09594662e-02 9.39037333e-01 3.20056697e-06]
[8.00915039e-01 1.99074648e-01 1.03130858e-05]
[3.50012762e-03 9.96290344e-01 2.09528237e-04]
[1.99728089e-03 2.55639118e-01 7.42363601e-01]
[3.73431282e-01 6.26034540e-01 5.34177542e-04]
[1.49851829e-03 9.98498151e-01 3.33119091e-06]
[2.25761487e-02 9.77207746e-01 2.16105045e-04]
[4.25585074e-02 3.55052916e-05 9.57405987e-01]
[1.81114001e-01 7.81880835e-01 3.70051639e-02]
[5.09291484e-03 9.94905515e-01 1.56994103e-06]
[2.90536065e-03 2.04813837e-01 7.92280803e-01]
[1.98649924e-03 1.46759659e-01 8.51253841e-01]
[8.54223241e-02 6.23394733e-04 9.13954281e-01]
[1.76063050e-01 5.78858817e-01 2.45078134e-01]
[9.99994295e-01 5.70473544e-06 3.53879465e-14]
[9.99999991e-01 8.76652877e-09 2.26160360e-17]
[4.06462861e-02 6.82438543e-04 9.58671275e-01]
[1.41083384e-04 9.99858917e-01 1.91975149e-18]
[5.03477636e-03 9.94965200e-01 2.32906515e-08]
[5.86644570e-01 3.65998610e-01 4.73568195e-02]
[9.99606512e-01 2.86039417e-07 3.93202432e-04]
[1.73588488e-03 1.43722747e-01 8.54541368e-01]
[1.18927624e-02 3.95666956e-02 9.48540542e-01]
[2.83335100e-02 6.87304291e-03 9.64793447e-01]
[7.04700817e-05 9.99929530e-01 5.68911245e-20]
[2.81780578e-01 2.09369854e-05 7.18198485e-01]
[4.97296088e-01 7.23375348e-05 5.02631575e-01]
[1.85634325e-03 9.98143657e-01 6.97216310e-14]
[9.35861551e-01 6.41384139e-02 3.53635022e-08]
[1.50559763e-04 9.99839494e-01 9.94650567e-06]
[2.05241293e-04 9.91855090e-01 7.93966895e-03]
[1.85911102e-01 5.07964099e-05 8.14038102e-01]
[1.62881319e-04 9.99837119e-01 1.54666430e-10]
[5.65908642e-03 3.09738726e-01 6.84602187e-01]
[6.73050492e-01 2.57580602e-01 6.93689062e-02]
[2.05934081e-04 1.53518513e-01 8.46275553e-01]
[5.36309938e-03 2.41617740e-01 7.53019161e-01]
[3.13914609e-04 8.96702725e-01 1.02983361e-01]
[4.81662343e-04 3.04827352e-01 6.94690985e-01]
[3.44937421e-03 2.15705466e-01 7.80845160e-01]
[1.64588538e-03 9.93055231e-01 5.29888375e-03]
[8.28621658e-03 9.87116122e-01 4.59766146e-03]
[6.43141238e-05 9.99935686e-01 8.17658964e-18]
[8.09922454e-01 1.90077493e-01 5.34373756e-08]
[1.96891227e-02 1.21241981e-02 9.68186679e-01]
[2.56665653e-05 7.07921243e-02 9.29182209e-01]
[1.32426807e-03 1.47753182e-01 8.50922550e-01]
[1.14846613e-02 9.88515339e-01 1.44392461e-10]
[2.90746040e-01 5.45387467e-04 7.08708572e-01]
[2.11593156e-04 1.26811166e-01 8.72977241e-01]
[9.61102324e-01 3.77023313e-02 1.19534481e-03]
[5.59823461e-02 9.27770333e-02 8.51240621e-01]
[1.08409891e-03 9.97945294e-01 9.70607025e-04]
[4.12970999e-03 9.95868640e-01 1.64989214e-06]
[9.99999815e-01 4.16809530e-08 1.43115148e-07]
[2.06257034e-02 2.14923873e-02 9.57881909e-01]
[9.97476091e-01 2.52390911e-03 8.37031272e-14]
[1.78778496e-02 9.82115625e-01 6.52574878e-06]
[9.92301459e-01 1.22354067e-06 7.69731711e-03]
[9.99766224e-01 5.22216474e-06 2.28554317e-04]
[5.20984659e-04 1.31730304e-01 8.67748711e-01]
[2.53750873e-02 9.72269390e-01 2.35552267e-03]
[9.99333344e-01 3.18098206e-04 3.48557584e-04]
[9.91302158e-05 9.99900870e-01 1.15483533e-11]
[9.42861397e-02 8.00807738e-03 8.97705783e-01]
[3.28797815e-02 9.67120218e-01 8.10563500e-10]
[9.97983773e-01 3.61543153e-05 1.98007300e-03]
[1.17319893e-03 7.51374297e-01 2.47452504e-01]
[9.99999989e-01 1.08371334e-08 2.15018895e-11]
[2.36699433e-03 1.21987305e-01 8.75645701e-01]
[1.65383242e-02 9.83460706e-01 9.69784327e-07]
[9.99524329e-01 4.75670995e-04 8.77473110e-18]
[2.53780631e-05 6.35795423e-02 9.36395080e-01]]
Gaussian parameters:
Component 1:
Mean: [0.2312213 0.65857719]
Covariance matrix:
[[0.02733027 0.0024876 ]
[0.0024876 0.0445153 ]]
Component 2:
Mean: [0.74017234 0.55330453]
Covariance matrix:
[[0.02108665 0.01651411]
[0.01651411 0.05102887]]
Component 3:
Mean: [0.46601434 0.14959356]
Covariance matrix:
[[ 0.06884548 -0.0014402 ]
[-0.0014402 0.00773686]]
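If the number of components is not known in advance, a common approach is to fit GMMs with different component counts and compare them with an information criterion such as BIC (lower is better). A minimal sketch, assuming the same kind of random X and an arbitrary candidate range:
from sklearn.mixture import GaussianMixture
import numpy as np

X = np.random.rand(100, 2)

# Fit GMMs with 1..6 components and record the BIC of each (the range is illustrative)
best_n, best_bic = None, np.inf
for n in range(1, 7):
    gmm = GaussianMixture(n_components=n, random_state=0)
    gmm.fit(X)
    bic = gmm.bic(X)
    print(f"n_components={n}: BIC={bic:.1f}")
    if bic < best_bic:
        best_n, best_bic = n, bic

print("Best number of components by BIC:", best_n)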
Here’s an example of how to perform Spectral Clustering using Python and the scikit-learn library:
from sklearn.cluster import SpectralClustering
import numpy as np

# Generate some random data for clustering
X = np.random.rand(100, 2)
# Create a SpectralClustering instance with the desired parameters
n_clusters = 3
spectral_clustering = SpectralClustering(n_clusters=n_clusters)
# Fit the data to the SpectralClustering model
spectral_clustering.fit(X)
# Get the cluster labels for each data point
labels = spectral_clustering.labels_
# Print the cluster labels
print("Cluster labels:", labels)
In this example, we first generate some random data (X) consisting of 100 data points with 2 features. We then create a SpectralClustering instance with the desired number of clusters (n_clusters) and fit the data to it using the fit method, which performs the clustering. After clustering, we can access the cluster labels for each data point through the labels_ attribute, which the example prints. Feel free to adjust the data and parameters according to your needs. Make sure you have scikit-learn installed (pip install scikit-learn) before running this code. A sample run produces output like the following:
Cluster labels: [0 0 2 0 1 2 2 2 0 0 2 1 2 2 0 1 0 1 0 0 0 0 1 0 1 0 1 2 0 1 0 2 2 0 2 2 2
1 0 2 2 0 2 2 1 1 0 2 1 2 0 2 1 0 0 2 1 0 0 1 0 0 0 0 0 0 0 1 0 1 0 0 2 2
2 1 2 0 2 2 1 1 2 0 2 1 1 1 0 1 0 2 1 1 2 2 0 1 1 0]
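On uniform random data the spectral embedding has little structure to reveal; the method is more instructive on data with non-convex clusters. Below is a hedged sketch on scikit-learn's make_circles toy dataset using a nearest-neighbors affinity graph (the dataset, affinity choice, and parameter values are illustrative assumptions):
from sklearn.cluster import SpectralClustering
from sklearn.datasets import make_circles

# Two concentric circles: non-convex clusters that plain k-means would split badly
X_circles, _ = make_circles(n_samples=200, factor=0.5, noise=0.05, random_state=0)

spectral = SpectralClustering(
    n_clusters=2,
    affinity='nearest_neighbors',  # build the similarity graph from k nearest neighbors
    n_neighbors=10,
    assign_labels='kmeans',
    random_state=0,
)
labels = spectral.fit_predict(X_circles)
print("Cluster sizes:", [list(labels).count(c) for c in set(labels)])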
Here’s an example of how to perform Agglomerative Clustering using Python and the scikit-learn library:
from sklearn.cluster import AgglomerativeClustering
import numpy as np

# Generate some random data for clustering
X = np.random.rand(100, 2)
# Create an AgglomerativeClustering instance with the desired parameters
n_clusters = 3
agglomerative_clustering = AgglomerativeClustering(n_clusters=n_clusters)
# Fit the data to the AgglomerativeClustering model
agglomerative_clustering.fit(X)
# Get the cluster labels for each data point
labels = agglomerative_clustering.labels_
# Print the cluster labels
print("Cluster labels:", labels)
In this example, we first generate some random data (X) consisting of 100 data points with 2 features. We then create an AgglomerativeClustering instance with the desired number of clusters (n_clusters) and fit the data to it using the fit method, which performs the clustering. After clustering, we can access the cluster labels for each data point through the labels_ attribute, which the example prints. Feel free to adjust the data and parameters according to your needs. Make sure you have scikit-learn installed (pip install scikit-learn) before running this code. A sample run produces output like the following:
Cluster labels: [0 1 1 0 1 0 0 0 0 0 0 1 2 0 1 0 0 2 0 2 0 0 2 1 0 2 1 1 2 0 2 0 1 1 1 1 1
0 0 1 0 1 2 2 0 1 1 2 1 0 1 0 1 0 1 2 0 1 0 2 1 1 1 0 1 1 1 0 0 0 2 1 1 2
0 0 1 2 1 1 2 0 1 0 0 0 0 0 1 2 1 0 2 0 2 0 2 0 0 2]
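The linkage criterion mentioned earlier is exposed through the linkage parameter of AgglomerativeClustering, and the tree can also be cut at a distance threshold instead of a fixed number of clusters. A minimal sketch of both options, with arbitrary example values:
from sklearn.cluster import AgglomerativeClustering
import numpy as np

X = np.random.rand(100, 2)

# Compare linkage criteria (Euclidean distance is the default metric)
for linkage in ('ward', 'complete', 'average', 'single'):
    model = AgglomerativeClustering(n_clusters=3, linkage=linkage)
    labels = model.fit_predict(X)
    print(f"{linkage} linkage -> cluster sizes:",
          [list(labels).count(c) for c in range(3)])

# Cut by distance instead of cluster count (the threshold value is illustrative)
model = AgglomerativeClustering(n_clusters=None, distance_threshold=0.5)
model.fit(X)
print("Clusters at distance threshold 0.5:", model.n_clusters_)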