Scatter Plot Projection Rank

One of the nicest and surely most useful visualization widgets in Orange is Scatter Plot. The widget displays a 2-D plot in which the x and y axes are two attributes from the data.

2-dimensional scatter plot visualization

Orange 2.7 has a wonderful feature called VizRank, which is now also available in Orange 3. Its Rank Projections function helps you find interesting attribute pairs by scoring each pair on its average classification accuracy. Click ‘Start Evaluation’ to begin ranking.
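
To get a feel for what scoring attribute pairs means, here is a minimal plain-Python sketch. It is not Orange's implementation: it assumes a leave-one-out k-nearest-neighbour vote as the accuracy measure, and the function names (loo_knn_accuracy, rank_pairs) and the toy data are made up for illustration.

```python
from itertools import combinations


def loo_knn_accuracy(points, labels, k=3):
    """Leave-one-out accuracy of a k-nearest-neighbour vote on 2-D points."""
    correct = 0
    for i, (x, y) in enumerate(points):
        # Squared distances from point i to every other point
        others = sorted(
            ((x - px) ** 2 + (y - py) ** 2, labels[j])
            for j, (px, py) in enumerate(points)
            if j != i
        )
        votes = [label for _, label in others[:k]]
        # Majority vote; sorting the candidates keeps ties deterministic
        predicted = max(sorted(set(votes)), key=votes.count)
        correct += predicted == labels[i]
    return correct / len(points)


def rank_pairs(rows, labels, names, k=3):
    """Score every attribute pair and return them best first."""
    scored = []
    for a, b in combinations(range(len(names)), 2):
        points = [(row[a], row[b]) for row in rows]
        scored.append((loo_knn_accuracy(points, labels, k), names[a], names[b]))
    return sorted(scored, reverse=True)


# Hypothetical toy data: x1 and x2 separate the classes, "noise" does not
names = ["x1", "x2", "noise"]
rows = [
    (1.0, 1.0, 0), (1.2, 0.9, 50), (0.9, 1.1, 100), (1.1, 1.2, 150),
    (5.0, 5.0, 10), (5.2, 4.9, 60), (4.9, 5.1, 110), (5.1, 5.2, 160),
]
labels = ["A"] * 4 + ["B"] * 4

ranked = rank_pairs(rows, labels, names)
for score, first, second in ranked:
    print(f"{first} vs {second}: {score:.2f}")
```

In this toy example the x1/x2 pair comes out on top, because any pair that includes the noise attribute pulls points of different classes next to each other.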

Rank Projections before ranking is performed.

Once the pairs are scored, the scatter plot instantly switches to the best-scored pair. Select other pairs from the list to compare visualizations.

Rank Projections once the attribute pairs are scored.

Rank Projections suggested petal length and petal width as the best pair, and indeed the visualization below is much clearer: the classes are better separated.

Scatter Plot once the visualization is optimized.

Have fun trying out this and other visualization widgets!

Classifying instances with Orange in Python

Last week we showed you how to create your own data table in the Python shell. Now we’re going to take you a step further and show you how to easily classify data with Orange.

First we’re going to create a new data table with 10 fruits as our instances.

import Orange
from Orange.data import ContinuousVariable, DiscreteVariable, Domain, Table

# Attributes: one discrete and two continuous
color = DiscreteVariable("color", values=["orange", "green", "yellow"])
calories = ContinuousVariable("calories")
fiber = ContinuousVariable("fiber")
# Class variable: the fruit we want to predict
fruit = DiscreteVariable("fruit", values=["orange", "apple", "peach"])

domain = Domain([color, calories, fiber], class_vars=fruit)

# Each row lists color, calories, fiber and the fruit (class)
data = Table(domain, [
    ["green", 4, 1.2, "apple"],
    ["orange", 5, 1.1, "orange"],
    ["yellow", 4, 1.0, "peach"],
    ["orange", 4, 1.1, "orange"],
    ["yellow", 4, 1.1, "peach"],
    ["green", 5, 1.3, "apple"],
    ["green", 4, 1.3, "apple"],
    ["orange", 5, 1.0, "orange"],
    ["yellow", 4.5, 1.3, "peach"],
    ["green", 5, 1.0, "orange"]])

print(data)

Now we have to select a model for classification. Among the many learners in the Orange library, we decided to use the Tree Learner for this example. Since we’re dealing with fruits, we thought it was only appropriate. 🙂

Let’s create a learning algorithm and use it to induce the classifier from the data.

tree_learner = Orange.classification.TreeLearner()
tree = tree_learner(data)

Now we can use our model to predict which fruit is green and has 3.5 calories and 2 g of fiber. To do this, simply call the model with a list of the new instance's attribute values as its argument.

print(tree(["green", 3.5, 2]))

The model returns the index of the predicted class:

1

To see which fruit this index stands for, look it up in the values of the class variable:

domain.class_var.values[1]

Final result:

"apple"
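
The learner-then-model pattern used above (call a learner on data to get a model, call the model on an instance to get a class index) can be mimicked in a few lines of plain Python. This is only an illustrative sketch, not how Orange's TreeLearner works: the hypothetical ColorStumpLearner simply predicts the most common fruit for each color, using the ten rows from the table above.

```python
from collections import Counter

# Class values in the same order as in the fruit variable above
FRUITS = ["orange", "apple", "peach"]


class ColorStumpLearner:
    """Toy learner (hypothetical): most common class for each color."""

    def __call__(self, rows):
        by_color = {}
        for color, _calories, _fiber, fruit in rows:
            by_color.setdefault(color, Counter())[fruit] += 1
        majority = {
            color: counts.most_common(1)[0][0]
            for color, counts in by_color.items()
        }

        def model(instance):
            # Return the index of the predicted class, not its name
            return FRUITS.index(majority[instance[0]])

        return model


rows = [
    ["green", 4, 1.2, "apple"],
    ["orange", 5, 1.1, "orange"],
    ["yellow", 4, 1.0, "peach"],
    ["orange", 4, 1.1, "orange"],
    ["yellow", 4, 1.1, "peach"],
    ["green", 5, 1.3, "apple"],
    ["green", 4, 1.3, "apple"],
    ["orange", 5, 1.0, "orange"],
    ["yellow", 4.5, 1.3, "peach"],
    ["green", 5, 1.0, "orange"],
]

stump = ColorStumpLearner()(rows)   # induce the model from the data
index = stump(["green", 3.5, 2])    # classify a new instance
print(index, FRUITS[index])         # prints: 1 apple
```

Three of the four green fruits in the table are apples, so this toy model also answers with index 1 for a green fruit.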

You can use your own data set to see how this model works for different data types. Let us know how it goes! 🙂

Creating a new data table in Orange through Python

IMPORT DATA

 

One of the first tasks in Orange data analysis is of course loading your data. If you are using Orange through Python, this is as easy as riding a bike:

import Orange
data = Orange.data.Table("iris")
print(data)

This will print a neat data table of the famous Iris data set to the console.

CREATE YOUR OWN DATA TABLE

 

What if you want to create your own data table from scratch? Even this is surprisingly simple. First, import the Orange data library.

from Orange.data import *

 

Set all the attributes you wish to see in your data table. For discrete attributes call DiscreteVariable and set the name and the possible values, while for a continuous variable call ContinuousVariable and set only the attribute name.

color = DiscreteVariable("color", values=["orange", "green", "yellow"])
calories = ContinuousVariable("calories")
fiber = ContinuousVariable("fiber")
fruit = DiscreteVariable("fruit", values=["orange", "apple", "peach"])

Then set the domain for your data table. See how we set the class variable with class_vars?

domain = Domain([color, calories, fiber], class_vars=fruit)

 

Time to input your data!

data = Table(domain, [
    ["green", 4, 1.2, "apple"],
    ["orange", 5, 1.1, "orange"],
    ["yellow", 4, 1.0, "peach"]])

And now print what you have created!

print(data)

 

One final step: save the table to a file.

data.save("fruit.tab")

Your data is now safely stored on your computer (in the current working directory, since the file name is a relative path)! Good job!
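
If you are curious what a tab-delimited data file can look like inside, here is a small sketch that writes one by hand with Python's csv module. It assumes the three-row header that Orange's documentation describes for its native tab format (attribute names, then types, then optional flags such as class); the exact file that save produces may differ in its details. The file name fruit_sketch.tab is made up, and the file goes to the temporary directory so that nothing of yours gets overwritten.

```python
import csv
import os
import tempfile

# Three header rows: names, types, and optional flags (assumed layout)
names = ["color", "calories", "fiber", "fruit"]
types = ["discrete", "continuous", "continuous", "discrete"]
flags = ["", "", "", "class"]

rows = [
    ["green", 4, 1.2, "apple"],
    ["orange", 5, 1.1, "orange"],
    ["yellow", 4, 1.0, "peach"],
]

# Hypothetical file name, written to the system's temporary directory
path = os.path.join(tempfile.gettempdir(), "fruit_sketch.tab")

with open(path, "w", newline="") as f:
    writer = csv.writer(f, delimiter="\t")
    writer.writerows([names, types, flags])
    writer.writerows(rows)

# Read the file back to check what was written
with open(path, newline="") as f:
    lines = list(csv.reader(f, delimiter="\t"))

print(lines[0])
print(len(lines) - 3, "data rows")
```

Note that everything comes back as text when the file is read this way; turning "4" and "1.2" into numbers again is the job of whatever loads the file.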