Visualizing multiple variables: FreeViz

Scatter plots are great! But sometimes, we need to plot more than two variables to truly understand the data. How can we achieve this, knowing humans can only grasp up to three dimensions? With an optimized linear projection, of course!

Orange recently re-introduced FreeViz, an interactive visualization for plotting multiple variables on a 2-D plane.

Let’s load the zoo.tab data with the File widget and connect FreeViz to it. The zoo data has 16 features describing animals of different types: mammals, amphibians, insects and so on. We would like FreeViz to show us the informative features and create a visualization that separates the animal types well.

FreeViz with initial, un-optimized plot.

We start with an un-optimized projection, where data points are scattered around the feature axes. Once we click Optimize, we can observe the optimization process in real time and, at the end, see the optimized projection.
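To make the idea of a linear projection with feature anchors more concrete, here is a minimal pure-Python sketch. It is not Orange's implementation: the helper names `circle_anchors` and `project`, the four 0/1 features and the two example rows are assumptions made for illustration, and placing the anchors evenly on a unit circle is just one plausible starting layout. The sketch only shows the projection step, not the optimization.

```python
import math

def circle_anchors(n_features):
    """Place one 2-D anchor per feature, evenly spaced on the unit circle."""
    return [
        (math.cos(2 * math.pi * j / n_features),
         math.sin(2 * math.pi * j / n_features))
        for j in range(n_features)
    ]

def project(row, anchors):
    """Project one data row: sum the anchors, each scaled by the row's feature value."""
    x = sum(value * ax for value, (ax, _) in zip(row, anchors))
    y = sum(value * ay for value, (_, ay) in zip(row, anchors))
    return (x, y)

# Hypothetical 0/1 features in the spirit of zoo.tab: hair, milk, feathers, eggs
rows = {
    "mammal-like": [1, 1, 0, 0],
    "bird-like": [0, 0, 1, 1],
}
anchors = circle_anchors(4)
points = {name: project(row, anchors) for name, row in rows.items()}
for name, (x, y) in points.items():
    print(f"{name}: ({x:.2f}, {y:.2f})")
```

Each point is pulled towards the anchors of the features it has, so moving an anchor moves every point that uses that feature. An optimizer would then search for anchor positions that keep the classes apart.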

FreeViz with optimized projection.

This projection is much more informative. Mammals are nicely grouped together within a pink cluster that is characterized by the hair, milk, and toothed features. Conversely, birds are characterized by eggs, feathers and airborne, while fish are aquatic. The results are as expected, which means the optimization indeed found informative features for each class value.

FreeViz with Show class density option.

Since we are working with categorical class values, we can tick Show class density to color the plot by majority class values. We can also move anchors around to see how data points change in relation to a selected anchor.

Finally, as in most Orange visualizations, we can select a subset of data points and explore them further. For example, let us select the amphibians that are characterized by being aquatic and observe them in a Data Table: a newt, a toad and two types of frogs, one venomous and one not.

Data exploration is always much easier with clever visualizations!

Stack Everything!

We all know that sometimes many is better than few. Therefore we are happy to introduce the Stack widget. It is available in the Prototypes add-on for now.

Stacking enables you to combine several trained models into one meta model and use it in Test&Score just like any other model. This comes in handy with complex problems, where one classifier might fail, but many could come up with something that works. Let’s see an example.

We start with something as complex as this. We used Paint Data to create a complex data set, where classes somewhat overlap. This is naturally an artificial example, but you can try the same on your own, real-life data.

We used 4 classes and painted a complex, 2-dimensional data set.

Then we add several kNN models with different parameters, say 5, 10 and 15 neighbors. We connect them to Test&Score and use cross-validation to evaluate their performance. Not bad, but can we do even better?
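For readers who prefer to see the evaluation step as code, here is a rough pure-Python sketch of the same idea: three kNN classifiers scored with 5-fold cross-validation. It does not use Orange. The synthetic Gaussian blobs stand in for the painted data, and the class labels, the helper names `knn_predict` and `cross_val_accuracy`, and the fold scheme are all assumptions made for illustration.

```python
import random
from collections import Counter

def squared_distance(a, b):
    """Squared Euclidean distance between two 2-D points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

def knn_predict(train, point, k):
    """Return the majority label among the k training points nearest to `point`."""
    nearest = sorted(train, key=lambda item: squared_distance(item[0], point))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def cross_val_accuracy(data, k, n_folds=5):
    """Estimate the accuracy of k-nearest neighbours with n-fold cross-validation."""
    correct = 0
    for fold in range(n_folds):
        test = data[fold::n_folds]  # every n-th item forms the test fold
        train = [item for i, item in enumerate(data) if i % n_folds != fold]
        correct += sum(knn_predict(train, x, k) == y for x, y in test)
    return correct / len(data)

# Synthetic stand-in for the painted data: four overlapping 2-D classes
rng = random.Random(0)
centers = {"C1": (0, 0), "C2": (2, 0), "C3": (0, 2), "C4": (2, 2)}
data = [((rng.gauss(cx, 0.8), rng.gauss(cy, 0.8)), label)
        for label, (cx, cy) in centers.items() for _ in range(50)]
rng.shuffle(data)

scores = {k: cross_val_accuracy(data, k) for k in (5, 10, 15)}
for k, accuracy in scores.items():
    print(f"kNN (k={k}): accuracy {accuracy:.3f}")
```

The exact numbers depend on the random data, so they will not match the scores in the screenshot.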

Scores without stacking, using only 3 different kNN classifiers.

Let us try stacking. We will connect all three classifiers to the Stacking widget and use Logistic Regression as the aggregate, the method that combines the three models into a single meta model. Then we connect the stacked model to Test&Score and see whether our scores improved.
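The stacking idea itself can be sketched in plain Python as well. The example below is a simplified illustration, not the widget's actual code: the three kNN models output class probabilities, those outputs become the inputs of a small multinomial logistic regression trained with gradient descent, and the stacked model is scored on a held-out part of the data. The synthetic data, the helper names, the out-of-fold scheme and the training settings (`epochs`, `lr`) are all assumptions.

```python
import math
import random
from collections import Counter

LABELS = ["C1", "C2", "C3", "C4"]

def knn_proba(train, point, k):
    """Class probabilities: the share of each label among the k nearest neighbours."""
    def dist2(item):
        return (item[0][0] - point[0]) ** 2 + (item[0][1] - point[1]) ** 2
    counts = Counter(label for _, label in sorted(train, key=dist2)[:k])
    return [counts[label] / k for label in LABELS]

def meta_features(train, point, ks=(5, 10, 15)):
    """Concatenate the probability outputs of all base kNN models."""
    return [p for k in ks for p in knn_proba(train, point, k)]

def softmax(scores):
    """Turn raw class scores into probabilities in a numerically stable way."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def fit_logistic(X, y, epochs=200, lr=0.5):
    """Multinomial logistic regression trained with plain batch gradient descent."""
    n_classes, n_inputs = len(LABELS), len(X[0])
    W = [[0.0] * (n_inputs + 1) for _ in range(n_classes)]  # last weight is the bias
    for _ in range(epochs):
        grad = [[0.0] * (n_inputs + 1) for _ in range(n_classes)]
        for features, label in zip(X, y):
            row = features + [1.0]
            probs = softmax([sum(w * v for w, v in zip(W[c], row))
                             for c in range(n_classes)])
            for c in range(n_classes):
                error = probs[c] - (1.0 if LABELS[c] == label else 0.0)
                for j, v in enumerate(row):
                    grad[c][j] += error * v
        for c in range(n_classes):
            for j in range(n_inputs + 1):
                W[c][j] -= lr * grad[c][j] / len(X)
    return W

def predict_logistic(W, features):
    """Pick the label with the highest linear score."""
    row = features + [1.0]
    scores = [sum(w * v for w, v in zip(weights, row)) for weights in W]
    return LABELS[scores.index(max(scores))]

# Synthetic stand-in for the painted data: four overlapping 2-D classes
rng = random.Random(0)
centers = {"C1": (0, 0), "C2": (2, 0), "C3": (0, 2), "C4": (2, 2)}
data = [((rng.gauss(cx, 0.8), rng.gauss(cy, 0.8)), label)
        for label, (cx, cy) in centers.items() for _ in range(50)]
rng.shuffle(data)
train, test = data[:150], data[150:]

# Out-of-fold meta-features: each training point is scored by base models
# that were not given that point, so the aggregate sees honest predictions.
n_folds = 5
meta_X, meta_y = [], []
for fold in range(n_folds):
    held_out = train[fold::n_folds]
    rest = [item for i, item in enumerate(train) if i % n_folds != fold]
    for point, label in held_out:
        meta_X.append(meta_features(rest, point))
        meta_y.append(label)

W = fit_logistic(meta_X, meta_y)
stacked_accuracy = sum(
    predict_logistic(W, meta_features(train, point)) == label
    for point, label in test
) / len(test)
print(f"Stacked model accuracy: {stacked_accuracy:.3f}")
```

Whether the stacked model beats the individual kNN models depends on the data. This sketch only shows how the pieces fit together.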

Scores with stacking. Stack reports improved performance.

And indeed they have. It might not be anything dramatic, but in real life, say in a medical context, even small improvements count. Now go and try the procedure on your own data. In Orange, this requires only a couple of minutes.

Final workflow with channel names. Notice that Logistic Regression is used as the Aggregate, not as a Learner.