
Finding The Optimal Number Of Clusters Using The Elbow Method And K-Means Clustering

I am writing a program for which I need to apply K-means clustering over a data set of some >200, 300-element arrays. Could someone provide me with a link to code with explanat

Solution 1:

As an addition to Roohollah's answer: please note that the elbow method for finding the optimal number of clusters for K-Means relies on visual inspection of a plot, so the result can be ambiguous. You may therefore want to combine it with silhouette analysis, as described, for example, in the following articles: Choosing the appropriate number of clusters (RealPython), Silhouette method - including an implementation example in Python (TowardsDataScience), Silhouette analysis example (Scikit-learn), Silhouette (Wikipedia).
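A minimal sketch of what that combination could look like, reusing the 12-sample array from Solution 2 below. The helper name `silhouette_by_k`, the fixed `random_state`, and `n_init=10` are my own choices for illustration and not part of either answer; `silhouette_score` comes from `sklearn.metrics` and needs at least 2 clusters, so the range starts at 2.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# The 12 two-feature samples from Solution 2.
data = np.array([[1,1],[1,2],[2,1.5],[4,5],[5,6],[4,5.5],[5,5],[8,8],[8,8.5],[9,8],[8.5,9],[9,9]])

def silhouette_by_k(X, k_values, seed=0):
    """Return {k: mean silhouette score} for each candidate k (hypothetical helper)."""
    scores = {}
    for k in k_values:
        # n_init and random_state are fixed here only to make runs repeatable.
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X)
        scores[k] = silhouette_score(X, labels)
    return scores

scores = silhouette_by_k(data, range(2, 8))
best_k = max(scores, key=scores.get)  # a higher mean silhouette means better-separated clusters

for k, s in scores.items():
    print(f"k={k}: silhouette={s:.3f}")
print("best k by silhouette:", best_k)
```

If the k with the highest silhouette score agrees with the elbow you see in the plot, that is a reasonable sign the choice is not just an artifact of how the curve was read.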

Solution 2:

Suppose there are 12 samples, each with two features, as below:

data=np.array([[1,1],[1,2],[2,1.5],[4,5],[5,6],[4,5.5],[5,5],[8,8],[8,8.5],[9,8],[8.5,9],[9,9]])

You can find the optimal number of clusters using the elbow method, and then compute the cluster centers, as in the following example:

import numpy as np
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt

data=np.array([[1,1],[1,2],[2,1.5],[4,5],[5,6],[4,5.5],[5,5],[8,8],[8,8.5],[9,8],[8.5,9],[9,9]])

def find_best_k():
    sum_of_squared_distances = []
    K = range(1, 8)  # change 8 to suit your data
    for k in K:
        km = KMeans(n_clusters=k)
        km = km.fit(data)
        # inertia_ is the sum of squared distances of samples to their closest center
        sum_of_squared_distances.append(km.inertia_)
    plt.plot(K, sum_of_squared_distances, 'bx-')
    plt.xlabel('k')
    plt.ylabel('sum_of_squared_distances')
    plt.title('Elbow method for optimal k')
    plt.show()
    # The plot looks like an arm, and the elbow on the arm is the optimal k.

# step 1: find optimal k (number of clusters)
find_best_k()

def run_kmeans(k, data):  # k is the optimal number of clusters
    km = KMeans(n_clusters=k)
    km = km.fit(data)
    centroids = km.cluster_centers_  # get the centers of the clusters
    # print(centroids)
    return centroids

def plotresults():
    centroids = run_kmeans(3, data)
    # the three groups are plotted by row position: rows 0-2, 3-6 and 7-11
    plt.plot(data[0:3,0],data[0:3,1],'ro',data[3:7,0],data[3:7,1],'bo',data[7:12,0],data[7:12,1],'go')
    for i in range(3):
        plt.plot(centroids[i,0],centroids[i,1],'k*')
        plt.text(centroids[i,0],centroids[i,1], "c"+str(i), fontsize=12)
    plt.show()  # display the figure when run as a script

plotresults()

The elbow plot:

[Figure: sum of squared distances plotted against k]

The results:

[Figure: the 12 samples colored by group, with the three centroids labeled c0, c1, c2]
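The plotting code above colors the points by hard-coded row slices, which only works because the sample array happens to be ordered by group. For real data you would probably want to use the labels K-means assigns instead. A rough sketch of that, assuming the same 12 samples and k=3; the helper name `cluster_with_labels`, `n_init=10`, and the fixed `random_state` are my own additions for illustration:

```python
import numpy as np
from sklearn.cluster import KMeans

# The same 12 two-feature samples as above.
data = np.array([[1,1],[1,2],[2,1.5],[4,5],[5,6],[4,5.5],[5,5],[8,8],[8,8.5],[9,8],[8.5,9],[9,9]])

def cluster_with_labels(X, k, seed=0):
    """Fit K-means and return (labels, centroids); hypothetical helper."""
    km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X)
    # labels_[i] is the index of the cluster that sample i was assigned to
    return km.labels_, km.cluster_centers_

labels, centroids = cluster_with_labels(data, 3)

# Group the samples by assigned label rather than by row position.
for c in range(3):
    members = data[labels == c]
    print(f"cluster {c}: {len(members)} samples, centroid {centroids[c].round(2)}")
```

Note that the cluster numbering is arbitrary and can differ between runs or seeds; only the grouping is meaningful. To color a plot by these labels, something like `plt.scatter(data[:, 0], data[:, 1], c=labels)` should work in place of the sliced `plt.plot` call.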

Hope this helps.
