Finding The Optimal Number Of Clusters Using The Elbow Method And K-Means Clustering
I am writing a program for which I need to apply K-means clustering over a data set of some >200, 300-element arrays. Could someone provide me with a link to code with explanat
Solution 1:
As an addition to Roohollah's answer: note that the elbow method for finding the optimal number of clusters for K-Means is purely visual, so its results can be ambiguous. You may therefore want to combine it with silhouette analysis, as described, for example, in the following articles: Choosing the appropriate number of clusters (RealPython), Silhouette method - including an implementation example in Python (TowardsDataScience), Silhouette analysis example (Scikit-learn), Silhouette (Wikipedia).
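As a rough illustration of the silhouette idea, here is a minimal sketch that scores several values of k with scikit-learn's silhouette_score and keeps the highest-scoring one. The helper name silhouette_by_k, the small two-feature sample, and the n_init=10 / random_state=0 settings are assumptions made for this example, not something taken from the linked articles.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Small two-feature sample used only for illustration.
data = np.array([[1, 1], [1, 2], [2, 1.5],
                 [4, 5], [5, 6], [4, 5.5], [5, 5],
                 [8, 8], [8, 8.5], [9, 8], [8.5, 9], [9, 9]])

def silhouette_by_k(X, k_values):
    """Return {k: mean silhouette score} for each k (hypothetical helper)."""
    scores = {}
    for k in k_values:
        # n_init and random_state are fixed here only to make runs repeatable.
        labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
        scores[k] = silhouette_score(X, labels)
    return scores

# The silhouette score needs at least 2 clusters, so k starts at 2.
scores = silhouette_by_k(data, range(2, 8))
best_k = max(scores, key=scores.get)
print(scores)
print("k with the highest silhouette score:", best_k)
```

A higher mean silhouette score means tighter, better-separated clusters, so the k with the highest score is a reasonable candidate to compare against the elbow you read off the plot.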
Solution 2:
Suppose there are 12 samples each with two features as below:
data=np.array([[1,1],[1,2],[2,1.5],[4,5],[5,6],[4,5.5],[5,5],[8,8],[8,8.5],[9,8],[8.5,9],[9,9]])
You can find the optimal number of clusters with the elbow method, and then the cluster centers, as in the following example:
import numpy as np
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt

data = np.array([[1, 1], [1, 2], [2, 1.5], [4, 5], [5, 6], [4, 5.5], [5, 5], [8, 8], [8, 8.5], [9, 8], [8.5, 9], [9, 9]])

def find_best_k():
    sum_of_squared_distances = []
    K = range(1, 8)  # change 8 to suit your data
    for k in K:
        km = KMeans(n_clusters=k)
        km = km.fit(data)
        sum_of_squared_distances.append(km.inertia_)  # within-cluster sum of squares
    plt.plot(K, sum_of_squared_distances, 'bx-')
    plt.xlabel('k')
    plt.ylabel('sum_of_squared_distances')
    plt.title('Elbow method for optimal k')
    plt.show()
    # The plot looks like an arm, and the elbow on the arm is the optimal k.

# step 1: find optimal k (number of clusters)
find_best_k()

def run_kmeans(k, data):  # k is the optimal number of clusters
    km = KMeans(n_clusters=k)
    km = km.fit(data)
    centroids = km.cluster_centers_  # get the centers of the clusters
    # print(centroids)
    return centroids

def plotresults():
    centroids = run_kmeans(3, data)
    # The slices colour the three groups by their position in `data`, not by the fitted labels.
    plt.plot(data[0:3, 0], data[0:3, 1], 'ro', data[3:7, 0], data[3:7, 1], 'bo', data[7:12, 0], data[7:12, 1], 'go')
    for i in range(3):
        plt.plot(centroids[i, 0], centroids[i, 1], 'k*')
        plt.text(centroids[i, 0], centroids[i, 1], "c" + str(i), fontsize=12)
    plt.show()

# step 2: fit with the chosen k and plot the cluster centers
plotresults()
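If you would rather not read the elbow off the plot by eye, one common heuristic is to pick the point on the inertia curve that lies farthest from the straight line joining its first and last points. The sketch below is only one way to do that, not part of the original answer: the helper names inertia_curve and elbow_by_chord_distance are hypothetical, and the n_init=10 / random_state=0 settings are assumptions made for repeatability.

```python
import numpy as np
from sklearn.cluster import KMeans

data = np.array([[1, 1], [1, 2], [2, 1.5],
                 [4, 5], [5, 6], [4, 5.5], [5, 5],
                 [8, 8], [8, 8.5], [9, 8], [8.5, 9], [9, 9]])

def inertia_curve(X, k_values):
    """Fit K-Means for each k and collect the inertia (hypothetical helper)."""
    return [KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
            for k in k_values]

def elbow_by_chord_distance(k_values, inertias):
    """Return the k whose point is farthest from the line joining the
    first and last points of the (k, inertia) curve (hypothetical helper)."""
    ks = np.asarray(list(k_values), dtype=float)
    ys = np.asarray(inertias, dtype=float)
    # Rescale both axes to [0, 1] so neither dominates the distance.
    x = (ks - ks[0]) / (ks[-1] - ks[0])
    y = (ys - ys.min()) / (ys.max() - ys.min())
    dx, dy = x[-1] - x[0], y[-1] - y[0]
    # Perpendicular distance from each point to the chord.
    dist = np.abs(dy * (x - x[0]) - dx * (y - y[0])) / np.hypot(dx, dy)
    return int(ks[np.argmax(dist)])

K = range(1, 8)
inertias = inertia_curve(data, K)
elbow_k = elbow_by_chord_distance(K, inertias)
print("estimated elbow at k =", elbow_k)
```

This is a heuristic, so treat the result as a suggestion to check against the plot rather than a definitive answer; on curves without a clear bend it can pick a misleading k.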
The elbow plot:
The results:
Hope this helps.