Multi-label classification (and Multi-target prediction) in Orange

Last summer, student Wencan Luo participated in Google Summer of Code to implement Multi-label Classification in Orange. He provided a framework, implemented a few algorithms, and built some prototype widgets. His work has been “hidden” in our repositories for too long; we have finally merged part of his code into Orange (the widgets are not there yet …) and added more general support for multi-target prediction.

You can load multi-label tab-delimited data (e.g. the emotions data set) just like any other tab-delimited data:

>>> import Orange
>>> zoo = Orange.data.Table('zoo')            # single-target
>>> emotions = Orange.data.Table('emotions')  # multi-label

The difference is that zoo's domain has a non-empty class_var field, while the list of emotions' labels can be obtained through its domain's class_vars:

>>> zoo.domain.class_var
EnumVariable 'type'
>>> emotions.domain.class_vars
<EnumVariable 'amazed-suprised',
 EnumVariable 'happy-pleased',
 EnumVariable 'relaxing-calm',
 EnumVariable 'quiet-still',
 EnumVariable 'sad-lonely',
 EnumVariable 'angry-aggresive'>

A simple example of a multi-label classification learner is a “binary relevance” learner. Let’s try it out.

>>> learner = Orange.multilabel.BinaryRelevanceLearner()
>>> classifier = learner(emotions)
>>> classifier(emotions[0])
[<orange.Value 'amazed-suprised'='0'>,
 <orange.Value 'happy-pleased'='0'>,
 <orange.Value 'relaxing-calm'='1'>,
 <orange.Value 'quiet-still'='1'>,
 <orange.Value 'sad-lonely'='1'>,
 <orange.Value 'angry-aggresive'='0'>]
>>> classifier(emotions[0], Orange.classification.Classifier.GetProbabilities)
[<1.000, 0.000>, <0.881, 0.119>, <0.000, 1.000>,
 <0.046, 0.954>, <0.000, 1.000>, <1.000, 0.000>]

The real values of the label variables of the emotions[0] instance can be obtained by calling emotions[0].get_classes(), which is analogous to the get_class method in the single-target case.
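For example, the following should print the true labels of the first instance, which can then be compared with the predictions shown above (output omitted here, since it depends on the data):

>>> emotions[0].get_classes()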

For multi-label classification, we can also perform testing as usual; however, specialised evaluation measures have to be used:

>>> test = Orange.evaluation.testing.cross_validation([learner], emotions)
>>> Orange.evaluation.scoring.mlc_hamming_loss(test)
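Other multi-label measures from Wencan's work, such as mlc_accuracy, should be available in the same module, assuming your Orange build includes them:

>>> Orange.evaluation.scoring.mlc_accuracy(test)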

One of the upcoming blog posts will describe PLS, a multi-target regression method that is currently being implemented.

New Orange icons

As more and more widgets with new features are added to Orange, icons have to be drawn for them. Most of the time these are just quick sketches, or they are missing altogether. We are now starting to redraw and unify them; a few have already been made.

New icons: Feature Constructor, Image Viewer, Paint Data, Preprocess, Python Script, SVM.

Parallel Orange?

We attended a NIPS 2011 workshop on processing and learning from large-scale data. Various presenters showed tools and frameworks that can be used when developing algorithms for large-scale data, but none of them was written in Python, and as such they are not directly useful for Orange. We have been looking for a framework that would help us run code in parallel for some time, but so far with no luck.

We would like a framework that is easy to use, can be used from C as well as from Python, and supports multi-level map-reduce (cross-validation can be viewed as a map-reduce, and testing a random forest within it is another, nested map-reduce; a rough sketch of this view follows below). The prototypes we have created so far solve this by inspecting the learners used in cross-validation and creating all the “subtasks” at once, which results in really ugly code we don’t want to commit ;). If you know a framework that would suit our needs, want to implement support for parallel computation yourself (we will apply to GSoC), or have an idea how to solve this problem, feel free to contact us ;).
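To make the map-reduce view of cross-validation concrete, here is a minimal sketch using only the Python standard library; the toy data and the trivial majority-class “learner” are made up for illustration, and this is not how Orange or our prototypes are implemented:

# Cross-validation as map-reduce: the "map" step trains and tests one
# fold, the "reduce" step averages the per-fold scores.
from multiprocessing import Pool
from collections import Counter

def run_fold(args):
    data, fold_indices, k = args
    test = [data[i] for i in range(len(data)) if fold_indices[i] == k]
    train = [data[i] for i in range(len(data)) if fold_indices[i] != k]
    # toy "learner": always predict the majority class of the training folds
    majority = Counter(label for _, label in train).most_common(1)[0][0]
    correct = sum(1 for _, label in test if label == majority)
    return correct / float(len(test))          # per-fold accuracy ("map")

if __name__ == "__main__":
    data = [(x, x % 2) for x in range(100)]    # toy (feature, label) pairs
    folds = 10
    fold_indices = [i % folds for i in range(len(data))]
    pool = Pool()
    scores = pool.map(run_fold, [(data, fold_indices, k) for k in range(folds)])
    print(sum(scores) / len(scores))           # "reduce": average accuracy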