Scatter Plot Projection Rank

One of the nicest and surely most useful visualization widgets in Orange is Scatter Plot. The widget displays a 2-D plot, where x and y-axes are two attributes from the data.

2-dimensional scatter plot visualization
2-dimensional scatter plot visualization

 

Orange 2.7 has a wonderful functionality called VizRank, that is now implemented also in Orange 3. Rank Projections functionality enables you to find interesting attribute pairs by scoring their average classification accuracy. Click ‘Start Evaluation’ to begin ranking.

Rank Projections before ranking is performed.
Rank Projections before ranking is performed.

 

The functionality will also instantly adapt the visualization to the best scored pair. Select other pairs from the list to compare visualizations.

Rank Projections once the attribute pairs are scored.
Rank Projections once the attribute pairs are scored.

 

Rank suggested petal length and petal width as the best pair and indeed, the visualization below is much clearer (better separated).

Scatter Plot once the visualization is optimized.
Scatter Plot once the visualization is optimized.

 

Have fun trying out this and other visualization widgets!

Visualizing Misclassifications

In data mining classification is one of the key methods for making predictions and gaining important information from our data. We would, for example, use classification for predicting which patients are likely to have the disease based on a given set of symptoms.

In Orange an easy way to classify your data is to select several classification widgets (e.g. Naive Bayes, Classification Tree and Linear Regression), compare the prediction quality of each learner with Test Learners and Confusion Matrix and then use the best performing classifier on a new data set for classification. Below we use Iris data set for simplicity, but the same procedure works just as well on all kinds of data sets.

Here we have three confusion matrices for Naive Bayes (top), Classification Tree (middle) and Logistic Regression (bottom).

 

Three misclassification matrices (Naive Bayes, Classification Tree and Logistic Regression)
Three misclassification matrices (Naive Bayes, Classification Tree and Logistic Regression)

 

We see that Classification Tree did the best with only 9 misclassified instances. To see which instances were assigned a false class, we select ‘Misclassified’ option in the widget, which highlights misclassifications and feeds them to the Scatter Plot widget. In the graph we thus see the entire data set presented with empty dots and the selected misclassifications with full dots.

Visualization of misclassified instances in scatter plot.
Visualization of misclassified instances in scatter plot.

 

Feel free to switch between learners in Confusion Matrix to see how the visualization changes for each of them.