
Is sklearn.cluster.KMeans Sensitive To Data Point Order?

As noted in the answer to this post about feature scaling, some (all?) implementations of KMeans are sensitive to the order of the feature data points. Based on the sklearn.cluster.KMe…

Solution 1:

K-means is not sensitive to feature order.

The post you refer to talks about scale, not order.

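If you look at the k-means equations, it should be obvious that the order does not matter: the objective is the sum of squared distances from each point to its nearest center, and a sum does not depend on the order of its terms. The same holds for the mean computed in each update step.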
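To make the argument concrete, here is a minimal pure-Python sketch of Lloyd's algorithm. It is an illustration, not sklearn's implementation, and `lloyd`, `ssq`, `swap` and the toy data are hypothetical names chosen for this example. It assumes the initial centers are fixed in advance, so the only thing that changes between runs is the order of the rows (data points) or of the columns (features).

```python
import random


def sq_dist(a, b):
    # Squared Euclidean distance between two points given as tuples.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def ssq(points, centers):
    # k-means objective: sum over all points of the squared distance to the nearest center.
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


def lloyd(points, centers, max_iter=100):
    # Plain Lloyd iterations starting from the given centers (hypothetical helper).
    centers = [tuple(c) for c in centers]
    for _ in range(max_iter):
        # Assignment step: nearest center, ties broken by center index (never by point order).
        clusters = [[] for _ in centers]
        for p in points:
            j = min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
            clusters[j].append(p)
        # Update step: move each center to the mean of its cluster (keep it if the cluster is empty).
        new_centers = [
            tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers


def swap(p):
    # Swap the two feature columns of a 2-D point.
    return (p[1], p[0])


points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
init = [(0, 0), (10, 10)]  # fixed initial centers (assumption)

# Same points, shuffled row order.
shuffled = points[:]
random.Random(0).shuffle(shuffled)

# Same points with the feature columns swapped (initial centers swapped the same way).
swapped = [swap(p) for p in points]

centers_a = lloyd(points, init)
centers_b = lloyd(shuffled, init)
centers_c = lloyd(swapped, [swap(c) for c in init])

print(centers_a == centers_b)  # True
print(centers_c == [swap(c) for c in centers_a])  # True
print(ssq(points, centers_a), ssq(shuffled, centers_b))  # both approximately 8/3
```

The fixed `init` is the key assumption: if the starting centers are instead drawn from the data by index, permuting the rows while reusing the same random seed can pick different starting points, which is the initialization issue the answer turns to next.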

There has been research (von Luxburg, if I recall correctly) that essentially says that if there is a good k-means result, then it must be easy to find. If you get very different results when running k-means multiple times, then none of the results is good.
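One rough way to act on this is to run k-means from several random starts and measure how much the resulting partitions agree. The sketch below uses the Rand index (the fraction of point pairs on which two labelings agree) as the agreement measure; that choice, the helper names (`lloyd_labels`, `rand_index`, `stability`) and the toy data are assumptions for illustration, not something the answer or the cited research prescribes.

```python
import itertools
import random


def sq_dist(a, b):
    # Squared Euclidean distance between two points given as tuples.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def lloyd_labels(points, centers, max_iter=100):
    # Lloyd's algorithm from the given initial centers; returns one cluster label per point.
    centers = [tuple(c) for c in centers]
    labels = None
    for _ in range(max_iter):
        # Assignment step: index of the nearest center for every point.
        new_labels = [
            min(range(len(centers)), key=lambda i: sq_dist(p, centers[i])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the centers are fixed too
        labels = new_labels
        # Update step: move each center to the mean of its members (keep it if empty).
        for j in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def rand_index(a, b):
    # Fraction of point pairs that both labelings put together, or both keep apart.
    # Invariant to renaming the cluster labels.
    pairs = list(itertools.combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


def stability(points, k, seeds):
    # One run per seed, each starting from k distinct random data points;
    # returns the lowest pairwise agreement between the resulting partitions.
    runs = [lloyd_labels(points, random.Random(s).sample(points, k)) for s in seeds]
    return min(rand_index(x, y) for x, y in itertools.combinations(runs, 2))


points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
print(stability(points, k=2, seeds=range(10)))
```

A minimum close to 1 means the runs essentially agree; a low minimum is the warning sign described above.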

There are "n choose k" possible initializations (ways to pick k of the n data points as starting centers). While they can't all be bad, n_init will only try very few of them, so there is no guarantee of finding the "best" one. The function will return the run with the lowest SSQ, but that does not mean it is the most useful result in the end, unless you only care about SSQ.
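In sklearn the number of restarts is the `n_init` parameter of `KMeans`, and the SSQ of the returned solution is exposed as `inertia_`. The following pure-Python sketch is not sklearn's code: it seeds each run with k random data points rather than sklearn's default k-means++ scheme, and `best_of_n_init` is a hypothetical name. It only shows the keep-the-lowest-SSQ logic and how few of the n choose k starting sets a handful of restarts covers.

```python
import math
import random


def sq_dist(a, b):
    # Squared Euclidean distance between two points given as tuples.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def ssq(points, centers):
    # k-means objective: sum of squared distances from each point to its nearest center.
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


def lloyd(points, centers, max_iter=100):
    # Plain Lloyd iterations starting from the given centers.
    centers = [tuple(c) for c in centers]
    for _ in range(max_iter):
        clusters = [[] for _ in centers]
        for p in points:
            j = min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
            clusters[j].append(p)
        new_centers = [
            tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers


def best_of_n_init(points, k, n_init, seed=0):
    # Run Lloyd's algorithm n_init times, each from k distinct random data points,
    # and keep the run with the lowest SSQ (hypothetical helper).
    rng = random.Random(seed)
    best_centers, best_cost = None, math.inf
    for _ in range(n_init):
        centers = lloyd(points, rng.sample(points, k))
        cost = ssq(points, centers)
        if cost < best_cost:
            best_centers, best_cost = centers, cost
    return best_centers, best_cost


points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
k = 2
print(math.comb(len(points), k))  # 15 possible sets of starting points for n=6, k=2
centers, cost = best_of_n_init(points, k, n_init=3)
print(centers, cost)
```

With the same seed, a larger `n_init` only appends runs, so in this sketch the reported SSQ can only stay the same or drop.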
