Unfreezing Orange

Have you ever tried Orange with data big enough that some widgets ran for more than a second? Then you have seen it: Orange froze. While the widget was processing, the interface would not respond to any inputs, and there was no way to stop that widget.

Not all the widgets freeze, though! Some widgets, like Test & Score, k-Means, or Image Embedding, do not block. While they are working, we are free to build other parts of the workflow, and these widgets also show their progress. Some, like Image Embedding, which processes large numbers of images, can even be interrupted.

Why does Orange freeze? Most widgets process users' actions directly: after an event (a click, a key press, new input data) some code starts running, and until it finishes, the interface cannot respond to any new events. This is a reasonable approach for short tasks, such as making a selection in a Scatter Plot. But with longer tasks, such as building a Support Vector Machine model on big data, Orange becomes unresponsive.
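
To see the problem in isolation, here is a minimal PyQt sketch (independent of Orange's actual widget code, purely for illustration): a button whose click handler does the work directly freezes its window for the whole duration of the task.

import sys
import time
from PyQt5.QtWidgets import QApplication, QPushButton

app = QApplication(sys.argv)

def long_task():
    # While this sleeps, no clicks or repaints are processed:
    # the whole window appears frozen.
    time.sleep(5)

button = QPushButton("Run long task")
button.clicked.connect(long_task)
button.show()
app.exec_()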

To make Orange responsive while it is processing, we need to start the task in a new thread. As programmers we have to consider the following:
1. Starting the task. We have to make sure that other (older) tasks are not running.
2. Showing results when the task has finished.
3. Periodic communication between the task and the interface for status reports (progress bars) and task stopping.

Starting the task and showing the results are straightforward and well documented in the tutorial on writing widgets. Periodic communication and stopping are harder: they are completely task-dependent and can be trivial, hard, or even impossible. Periodic communication is, in principle, not essential for responsiveness, but without it we cannot stop the running task and progress bars do not work either.
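
Outside of Orange's widget framework, the three concerns can be sketched with plain Python threading (the names below are illustrative, not Orange's API): a worker thread runs the job, reports progress through a callback, and checks a flag so it can be stopped.

import threading
import time

def report(percent):
    print("progress: %d%%" % percent)

def long_task(report_progress, should_stop):
    for i in range(100):
        if should_stop.is_set():     # 3. let the interface stop us
            return
        time.sleep(0.05)             # stand-in for a chunk of real work
        report_progress(i + 1)       # 3. periodic status report

stop = threading.Event()
# 1. start the task in a worker thread so the (hypothetical) GUI thread stays free
worker = threading.Thread(target=long_task, args=(report, stop))
worker.start()
# ... the interface would keep handling events here ...
worker.join()                        # 2. once the task finishes, show the results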

Taking care of periodic communication was the hardest part of making the Neural Network widget responsive. It would have been easy had we implemented neural networks ourselves, but we use the scikit-learn implementation, which offers no way to call additional code while the network is being fitted (and we need to run code that communicates with the interface). We had to resort to a trick: we modified the fitting so that every change to an attribute called n_iters_ calls a function (see pull request). Not the cleanest solution, but it seems to work.
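
In a hedged sketch, the idea looks roughly like this (the names are illustrative stand-ins, not the actual Orange or scikit-learn code; see the pull request for the real change): if the fitting loop writes an attribute after every iteration, turning that attribute into a property lets its setter run arbitrary code, such as reporting progress or raising an exception to interrupt fitting.

class FittingWithCallback(object):
    # Illustrative stand-in for an estimator whose fitting loop
    # updates n_iters_ after each iteration.

    def __init__(self, callback):
        self._callback = callback
        self._n_iters = 0

    @property
    def n_iters_(self):
        return self._n_iters

    @n_iters_.setter
    def n_iters_(self, value):
        self._n_iters = value
        self._callback(value)    # talk to the interface; may raise to stop fitting

    def fit(self):
        for i in range(10):      # stand-in for the real optimization loop
            self.n_iters_ = i + 1

def show_iteration(i):
    print("iteration %d" % i)

model = FittingWithCallback(show_iteration)
model.fit()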

For now, only a few widgets keep the interface responsive while they are working. We are still searching for the best way to adapt the existing widgets, but responsiveness is now one of our priorities.

Orange 2.5: code conversion

Orange 2.5 unifies Orange's C++ core and Python modules into a single module hierarchy. To use the new module hierarchy, import Orange instead of orange and the accompanying orng* modules. While we will maintain backward compatibility in 2.* releases, we nevertheless encourage programmers to use the new interface. The provided conversion tool can help refactor your code accordingly.

The conversion script, orange2to25.py, resides in Orange's main directory. To refactor accuracy8.py from the “Orange for beginners” tutorial, run:

python orange2to25.py -w -o accuracy8_25.py doc/ofb-rst/code/accuracy8.py

The old code

import orange
import orngTest, orngStat, orngTree

# set up the learners
bayes = orange.BayesLearner()
tree = orngTree.TreeLearner(mForPruning=2)
bayes.name = "bayes"
tree.name = "tree"
learners = [bayes, tree]

# compute accuracies on data
data = orange.ExampleTable("voting")
res = orngTest.crossValidation(learners, data, folds=10)
cm = orngStat.computeConfusionMatrices(res,
        classIndex=data.domain.classVar.values.index('democrat'))

is refactored to

import Orange

# set up the learners
bayes = Orange.classification.bayes.NaiveLearner()
tree = Orange.classification.tree.TreeLearner(mForPruning=2)
bayes.name = "bayes"
tree.name = "tree"
learners = [bayes, tree]

# compute accuracies on data
data = Orange.data.Table("voting")
res = Orange.evaluation.testing.cross_validation(learners, data, folds=10)
cm = Orange.evaluation.scoring.compute_confusion_matrices(res,
        classIndex=data.domain.classVar.values.index('democrat'))

Read more about the refactoring tool on the wiki and on the help page (python orange2to25.py --help).

Orange at ISMB/ECCB 2011

We presented the Orange Bioinformatics add-on at the ISMB/ECCB conference in Vienna, a joint event covering both the 19th Annual International Conference on Intelligent Systems for Molecular Biology and the 10th European Conference on Computational Biology.

We handed out Orange stickers (with the URL) to visitors of our poster. There was some interest: in the end we gave away about ten of them, mostly to biologists who were excited to perform some of the analysis themselves. Among the visitors was also a developer of a similar tool, who seemed slightly surprised that something like this already exists, while another was disappointed that Orange only runs locally.

See the poster in action on the photo taken by Gregor Rot.

Orange poster (PNG) · Poster in action

Orange 2.5 progress

We decided to reorganize Orange to provide a more intuitive scripting interface. The next release, Orange 2.5, is getting better every day. But fear not, dear reader: we are working hard to ensure that your scripts will still work.

On the last morning of the camp in Bohinj we decided to use underscore_separated names instead of CamelCase. We have been steadily converting them, and the deprecation utilities written by Aleš help a lot: we just list the name changes for class attributes or arguments, and the renaming is handled gracefully, with the attributes remaining accessible under their old names as well. The code therefore does not need to be duplicated to ensure backward compatibility.
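
A minimal illustration of the mechanism (a sketch only, not Aleš's actual utility): the old CamelCase name can be kept as a thin property alias that warns and forwards to the new underscore_separated attribute.

import warnings

def deprecated_alias(new_name):
    # Property that forwards an old attribute name to its new name.
    def getter(self):
        warnings.warn("use %s instead" % new_name, DeprecationWarning)
        return getattr(self, new_name)
    def setter(self, value):
        warnings.warn("use %s instead" % new_name, DeprecationWarning)
        setattr(self, new_name, value)
    return property(getter, setter)

class TreeLearner(object):          # illustrative stand-in, not the real class
    mForPruning = deprecated_alias("m_pruning")
    def __init__(self, m_pruning=0):
        self.m_pruning = m_pruning

learner = TreeLearner()
learner.mForPruning = 2             # the old name still works, but warns
print(learner.m_pruning)            # -> 2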

A simple example from the documentation of bagging and boosting. The old version first:

import orange, orngEnsemble, orngTree
import orngTest, orngStat

tree = orngTree.TreeLearner(mForPruning=2, name="tree")
bs = orngEnsemble.BoostedLearner(tree, name="boosted tree")
bg = orngEnsemble.BaggedLearner(tree, name="bagged tree")

data = orange.ExampleTable("lymphography.tab")

learners = [tree, bs, bg]
results = orngTest.crossValidation(learners, data, folds=3)
print "Classification Accuracy:"
for i in range(len(learners)):
    print ("%15s: %5.3f") % (learners[i].name, orngStat.CA(results)[i])

Orange 2.5 version:

import Orange

tree = Orange.classification.tree.TreeLearner(m_pruning=2, name="tree")
bs = Orange.ensemble.boosting.BoostedLearner(tree, name="boosted tree")
bg = Orange.ensemble.bagging.BaggedLearner(tree, name="bagged tree")

table = Orange.data.Table("lymphography.tab")

learners = [tree, bs, bg]
results = Orange.evaluation.testing.cross_validation(learners, table, folds=3)
print "Classification Accuracy:"
for i in range(len(learners)):
    print ("%15s: %5.3f") % (learners[i].name, Orange.evaluation.scoring.CA(results)[i])

In the new Orange we only need to import a single module, Orange, the root of the new hierarchy.

Data loading speedups

Orange has been loading data faster since the end of February, especially if there are many attributes in the file.

Quick comparisons between the old and new versions, measured on my computer:

  • adult.tab (32561 examples, 15 attributes): old version = 1.41s, new version = 0.86s.
  • DLBCL.tab (77 examples, 7071 attributes): old version = 2.72s, new version = 0.93s.
  • GDS1962.tab (104 examples, 31837 attributes): old version = 33.5s, new version = 6.6s.
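
For reference, a rough way to reproduce such a measurement (assuming the .tab files are available in the working directory; absolute times will of course differ between machines):

import time
import orange   # the pre-2.5 module; Orange.data.Table can be timed the same way

for fname in ["adult.tab", "DLBCL.tab", "GDS1962.tab"]:
    start = time.time()
    data = orange.ExampleTable(fname)
    print("%s: %d examples loaded in %.2fs" % (fname, len(data), time.time() - start))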

The speedups were obtained by:

  • reusing a buffer for parsing,
  • skipping type detection for attributes with known types, and
  • keeping attributes in a different data structure internally.