k-Means & Silhouette Score

k-Means is one of the most popular unsupervised learning algorithms for finding groups (clusters) of similar instances in our data. It can be useful in customer segmentation, finding gene families, determining document types, improving human resource management and so on.

But… have you ever wondered how k-means works? In the following three videos we explain how to construct a data analysis workflow using k-means, how k-means works, how to find a good value of k, and how the silhouette score can help us find the inliers and the outliers.

 

#1 Constructing workflow with k-means

#2 How k-means works [interactive visualization]
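
For readers who prefer code to video, here is a minimal sketch of the usual k-means loop (Lloyd's algorithm) in plain Python. It is not the implementation used in the videos. The `kmeans` function, its parameters and the toy points are all made up for illustration, and the initial centroids are simply the first k points so that the run is reproducible. Real implementations normally use random or k-means++ initialization.

```python
import math


def kmeans(points, k, iterations=100):
    """Lloyd's algorithm: alternate an assignment step and an update step."""
    # Assumption: start from the first k points (simple and deterministic).
    centroids = list(points[:k])
    labels = None
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so we have converged
        labels = new_labels
        # Update step: move each centroid to the mean of its points.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old centroid if a cluster went empty
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels, centroids


# Toy data: two well-separated groups of three points each.
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1),
          (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
labels, centroids = kmeans(points, k=2)
print(labels)
print(centroids)
```

Even though both starting centroids sit in the same group here, the second centroid drifts to the other group within a couple of iterations.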

#3 How silhouette score works and why it is useful
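
As a rough companion to the video, the sketch below computes the silhouette of every point in plain Python using the standard definition: a is the mean distance to the other points in the same cluster, b is the mean distance to the nearest other cluster, and the score is (b - a) / max(a, b). The function name, the toy points and the hand-assigned labels are assumptions for illustration only. The sketch needs at least two clusters. It also compares every pair of points, so its cost grows quadratically with the number of instances.

```python
import math


def silhouette_scores(points, labels):
    """Return one silhouette value per point, each between -1 and 1."""
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)

    scores = []
    for p, l in zip(points, labels):
        own = clusters[l]
        if len(own) == 1:
            scores.append(0.0)  # common convention for single-point clusters
            continue
        # a: mean distance to the other points in the same cluster
        # (the distance from p to itself is 0, so divide by len - 1).
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the points of the nearest other cluster.
        b = min(
            sum(math.dist(p, q) for q in members) / len(members)
            for other, members in clusters.items() if other != l
        )
        scores.append((b - a) / max(a, b))
    return scores


# Toy data: two tight groups plus one point, (3.0, 3.0), halfway between them.
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (3.0, 3.0),
          (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
labels = [0, 0, 0, 0, 1, 1, 1]  # hand-assigned for the example
scores = silhouette_scores(points, labels)
for p, s in zip(points, scores):
    print(p, round(s, 2))
```

Points deep inside a group score close to 1, while the in-between point scores close to 0, which is how the silhouette separates inliers from outliers. Averaging the scores over all points gives a single number that can be compared across different values of k.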

8 thoughts on “k-Means & Silhouette Score”

  1. Hi, I’m trying to run k-means on a big matrix (32768 instances), and it crashes every time. It says that silhouette scores are not computed for >5000 samples, although I am giving a fixed number of clusters. Does anybody know a way of overcoming this problem?

        1. Daniel, you are not setting a fixed number of clusters. As the warning says, silhouette scores won’t be computed for more than 5000 instances. This is expected and normal.
