k-Means & Silhouette Score

k-Means is one of the most popular unsupervised learning algorithms for finding groups of similar instances in data. It is useful for customer segmentation, finding gene families, determining document types, improving human resource management, and so on.

But… have you ever wondered how k-means works? In the following three videos we explain how to construct a data analysis workflow using k-means, how k-means works, how to find a good value of k, and how the silhouette score can help us tell the inliers from the outliers.

 

#1 Constructing workflow with k-means

#2 How k-means works [interactive visualization]
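For readers who prefer code to video, here is a minimal sketch of the standard k-means procedure (Lloyd's algorithm): assign each point to its nearest centroid, move each centroid to the mean of its points, and repeat until the assignments stop changing. It is plain Python and not how Orange implements it; the function names (`squared_distance`, `mean_point`, `k_means`) and the toy data are made up for illustration.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def k_means(points, k, max_iter=100, seed=0):
    """Hypothetical helper: returns (labels, centroids) for the given points."""
    rng = random.Random(seed)
    # Start from k distinct data points chosen at random.
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments did not change, so we have converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # leave the centroid in place if its cluster is empty
                centroids[c] = mean_point(members)
    return labels, centroids


# Toy data (invented for this example): two well-separated groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
          (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
labels, centroids = k_means(points, k=2)
print(labels)
print(centroids)
```

The result depends on the random starting centroids, which is why k-means is usually run several times with different initializations and the best run is kept.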

#3 How silhouette score works and why it is useful
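As a rough illustration of the idea, the silhouette of a point compares a, its mean distance to the other points in its own cluster, with b, its mean distance to the points of the nearest other cluster: s = (b - a) / max(a, b). Scores close to 1 mark well-placed points (inliers), while scores near 0 or below mark points that sit between clusters (outlier candidates). The sketch below is plain Python, needs at least two clusters, and uses a made-up function name (`silhouette_scores`) and made-up toy data; it is not Orange's implementation.

```python
import math


def silhouette_scores(points, labels):
    """Hypothetical helper: per-point silhouette, Euclidean distance."""
    # Group the points by their cluster label.
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)

    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        # (the point's zero distance to itself adds nothing to the sum).
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores


# Toy data (invented): two tight groups plus one point in between,
# which we assign to cluster 0 on purpose.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
          (8.0, 8.0), (8.5, 9.0), (9.0, 8.0),
          (5.0, 5.0)]
labels = [0, 0, 0, 1, 1, 1, 0]
scores = silhouette_scores(points, labels)
for p, s in zip(points, scores):
    print(p, round(s, 3))
```

Averaging the per-point scores gives one number per clustering, so comparing that average across several values of k is one common way to pick k. Note that this computation looks at every pair of points, so its cost grows quadratically with the number of samples.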

  • Diana E Bedolla

    Hi, I’m trying to run k-means on a big matrix (32,768 instances), and it crashes every time. It says that silhouette scores are not computed for more than 5,000 samples, even though I am giving it a fixed number of clusters. Does anybody know a way of overcoming this problem?

  • michael lange

    Couldn’t agree more. Well done.

  • Ivan Jarpa Manríquez

    Congratulations, best data mining software ever. It is really useful for business data mining. I’m an Orange fan.