Paint Your Data

One of the widgets I very much enjoy when teaching an introductory course in data mining is the Paint Data widget. In the data I would paint in this widget I would intentionally include some clusters, or intentionally obscure them, or draw them in some strange shape. Then I would discuss with students whether these clusters would be identified by k-means clustering, or by hierarchical clustering. We would also discuss automatic scoring of the quality of clusters and come up with the idea of a silhouette (ok, already invented, but it helps if you get this idea on your own as well). And then we would play with various data sets, clustering techniques, and their parameters in Orange.

Like in the following workflow, where I drew three clusters that were indeed recognized by k-means clustering. Notice that even the number of clusters was guessed correctly by silhouette scoring. I have also drawn the clustered data in the Scatterplot to check whether the clusters are indeed where they should be.
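For readers who want to see the idea outside of Orange, here is a minimal sketch in plain Python. It is only an illustration under my own assumptions, not what the widgets do internally: the three "painted" blobs are generated with a fixed random seed, `kmeans` is a bare-bones Lloyd's algorithm with a deterministic farthest-point start, and `silhouette` is the usual mean of (b - a) / max(a, b). All function and variable names are made up for this example.

```python
import math
import random

def kmeans(points, k, iterations=50):
    """Bare-bones Lloyd's algorithm with a deterministic farthest-point start."""
    centers = [points[0]]
    while len(centers) < k:
        # next center: the point farthest from all centers chosen so far
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = []
    for _ in range(iterations):
        new_labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster goes empty
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels

def silhouette(points, labels):
    """Mean silhouette over all points: (b - a) / max(a, b)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # a singleton cluster contributes 0 by convention
        # a: mean distance to the other members of the point's own cluster
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for m, other in clusters.items() if m != lab)
        total += (b - a) / max(a, b)
    return total / len(points)

# Stand-in for painted data: three well-separated blobs (assumed, not from the post).
rng = random.Random(0)
blob_centers = [(0, 0), (10, 0), (5, 9)]
points = [(rng.gauss(cx, 0.5), rng.gauss(cy, 0.5))
          for cx, cy in blob_centers for _ in range(30)]

# Score k = 2..5 and keep the k with the highest mean silhouette.
scores = {k: silhouette(points, kmeans(points, k)) for k in range(2, 6)}
best_k = max(scores, key=scores.get)
print(best_k, scores)
```

With blobs this far apart, merging two of them or splitting one lowers the mean silhouette, so the score peaks at the number of clusters that were painted.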


Or like in the workflow below, where k-means fails miserably (but some other clustering technique would not).
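To make this kind of failure concrete, here is a small plain-Python sketch. The shape and the alternative technique are my assumptions for the sake of illustration: the data are two concentric rings, and the method that copes with them is single-linkage hierarchical clustering. The helper names (`ring`, `kmeans2`, `single_linkage`, `same_partition`) are hypothetical.

```python
import math

def ring(radius, n):
    """n evenly spaced points on a circle around the origin."""
    return [(radius * math.cos(2 * math.pi * i / n),
             radius * math.sin(2 * math.pi * i / n)) for i in range(n)]

# Stand-in for painted data: a small ring inside a large one.
points = ring(1.0, 20) + ring(5.0, 40)
truth = [0] * 20 + [1] * 40

def kmeans2(points, iterations=50):
    """Plain Lloyd's algorithm for k=2, started from two far-apart points."""
    centers = [points[0], max(points, key=lambda p: math.dist(p, points[0]))]
    labels = None
    for _ in range(iterations):
        new_labels = [0 if math.dist(p, centers[0]) <= math.dist(p, centers[1]) else 1
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels

def single_linkage(points, k):
    """Single linkage: keep merging the two closest clusters until k remain."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Processing point pairs by increasing distance is equivalent to
    # repeatedly merging the clusters with the smallest minimum distance.
    pairs = sorted((math.dist(points[i], points[j]), i, j)
                   for i in range(len(points)) for j in range(i + 1, len(points)))
    clusters = len(points)
    for _, i, j in pairs:
        if clusters == k:
            break
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            clusters -= 1
    return [find(i) for i in range(len(points))]

def same_partition(a, b):
    """True if two labelings group the points identically (label names ignored)."""
    return len(set(zip(a, b))) == len(set(a)) == len(set(b))

kmeans_ok = same_partition(kmeans2(points), truth)
linkage_ok = same_partition(single_linkage(points, 2), truth)
print("k-means recovers the rings:", kmeans_ok)
print("single linkage recovers the rings:", linkage_ok)
```

K-means with two centers can only cut the plane with a straight line, so it cannot separate one ring from the ring around it. Single linkage follows chains of nearby points and keeps each ring together.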


Paint Data can also be used in a supervised setting, for classification tasks. We can set the intended number of classes, and then choose any of these to paint its data. Below I have used it to create the data sets to check the behavior of several classifiers.
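The same kind of experiment can be sketched in plain Python. Everything here is assumed for illustration: the two-class data are laid out by hand in an XOR pattern (as one might paint them), and the two classifiers, a nearest-centroid rule and a 1-nearest-neighbour rule, are simple stand-ins rather than the ones used in the workflow.

```python
import math

# Stand-in for painted data: four tight blobs in an XOR layout,
# class "A" on one diagonal and class "B" on the other.
offsets = (-0.5, 0.0, 0.5)
blobs = {(2, 2): "A", (-2, -2): "A", (-2, 2): "B", (2, -2): "B"}
data = [((cx + dx, cy + dy), label)
        for (cx, cy), label in blobs.items()
        for dx in offsets for dy in offsets]

def nearest_centroid(train, point):
    """Predict the class whose mean is closest to the point."""
    centroids = {}
    for label in sorted({lab for _, lab in train}):
        members = [p for p, lab in train if lab == label]
        centroids[label] = tuple(sum(c) / len(members) for c in zip(*members))
    return min(centroids, key=lambda lab: math.dist(point, centroids[lab]))

def nearest_neighbour(train, point):
    """Predict the class of the single closest training point (1-NN)."""
    return min(train, key=lambda item: math.dist(point, item[0]))[1]

def leave_one_out(classifier, data):
    """Fraction of points predicted correctly when each is held out in turn."""
    hits = 0
    for i, (point, label) in enumerate(data):
        train = data[:i] + data[i + 1:]
        hits += classifier(train, point) == label
    return hits / len(data)

centroid_acc = leave_one_out(nearest_centroid, data)
knn_acc = leave_one_out(nearest_neighbour, data)
print("nearest centroid:", centroid_acc)
print("1-nearest neighbour:", knn_acc)
```

On this layout the nearest-centroid rule does no better than guessing, because both class means sit near the centre of the picture, while the nearest-neighbour rule only looks at the local blob and gets every point right.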


There are tons of other workflows where Paint Data can be useful. Give it a try!