Orange 2.5: code conversion

Orange 2.5 unifies Orange’s C++ core and Python modules into a single module hierarchy. To use the new module hierarchy, import Orange instead of orange and the accompanying orng* modules. While we will maintain backward compatibility in 2.* releases, we nevertheless encourage programmers to use the new interface. The provided conversion tool can help refactor your code to use it.

The conversion script, orange2to25.py, resides in Orange’s main directory. To refactor accuracy8.py from the “Orange for beginners” tutorial, run python orange2to25.py -w -o accuracy8_25.py doc/ofb-rst/code/accuracy8.py.

The old code

import orange
import orngTest, orngStat, orngTree

# set up the learners
bayes = orange.BayesLearner()
tree = orngTree.TreeLearner(mForPruning=2)
bayes.name = "bayes"
tree.name = "tree"
learners = [bayes, tree]

# compute accuracies on data
data = orange.ExampleTable("voting")
res = orngTest.crossValidation(learners, data, folds=10)
cm = orngStat.computeConfusionMatrices(res,
        classIndex=data.domain.classVar.values.index('democrat'))

is refactored to

import Orange

# set up the learners
bayes = Orange.classification.bayes.NaiveLearner()
tree = Orange.classification.tree.TreeLearner(mForPruning=2)
bayes.name = "bayes"
tree.name = "tree"
learners = [bayes, tree]

# compute accuracies on data
data = Orange.data.Table("voting")
res = Orange.evaluation.testing.cross_validation(learners, data, folds=10)
cm = Orange.evaluation.scoring.compute_confusion_matrices(res,
        classIndex=data.domain.classVar.values.index('democrat'))

Read more about the refactoring tool on the wiki and on the help page (python orange2to25.py --help).
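To give a rough idea of what such a conversion involves, here is a deliberately naive, regex-based sketch. It is not how orange2to25.py is implemented; the RENAMES table is a hypothetical, partial mapping copied from the before/after example above, and convert() is a made-up helper. It collapses the old orange/orng* imports into a single import Orange and rewrites only the names listed in its table.

```python
import re

# Hypothetical, partial mapping taken from the before/after example above.
# The real conversion tool knows far more names than this.
RENAMES = {
    "orange.BayesLearner": "Orange.classification.bayes.NaiveLearner",
    "orngTree.TreeLearner": "Orange.classification.tree.TreeLearner",
    "orange.ExampleTable": "Orange.data.Table",
    "orngTest.crossValidation": "Orange.evaluation.testing.cross_validation",
    "orngStat.computeConfusionMatrices":
        "Orange.evaluation.scoring.compute_confusion_matrices",
}

# Matches lines such as "import orange" or "import orngTest, orngStat".
OLD_IMPORT = re.compile(r"^import\s+(orange|orng\w+)(\s*,\s*(orange|orng\w+))*\s*$")


def convert(source):
    """Naively rewrite old-style Orange code to the Orange 2.5 names."""
    # Longest names first, so the regex alternation prefers the longest match.
    pattern = re.compile(r"\b(%s)\b" % "|".join(
        re.escape(old) for old in sorted(RENAMES, key=len, reverse=True)))
    lines = []
    import_done = False
    for line in source.splitlines():
        if OLD_IMPORT.match(line):
            # Replace all orange/orng* imports with a single "import Orange".
            if not import_done:
                lines.append("import Orange")
                import_done = True
            continue
        lines.append(pattern.sub(lambda m: RENAMES[m.group(1)], line))
    return "\n".join(lines)
```

Running convert() on the first lines of the old accuracy8.py yields import Orange followed by the Orange.data.Table and Orange.evaluation.testing.cross_validation calls. Anything outside the table, such as keyword arguments, is left untouched, which is one reason a dedicated tool is preferable to text substitution.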

Orange at ISMB/ECCB 2011

We presented the Orange Bioinformatics add-on at the ISMB/ECCB conference in Vienna, a joint event combining the 19th Annual International Conference on Intelligent Systems for Molecular Biology and the 10th European Conference on Computational Biology.

We gave out Orange stickers (with the URL) to visitors of the poster. There was some interest: in the end we gave out about 10 of them, mostly to biologists, who were excited to perform some of the analyses themselves. Among the visitors was also a developer of a similar tool, who seemed slightly surprised that something like this already existed, while another visitor was disappointed that Orange only runs locally.

See the poster in action in the photo taken by Gregor Rot.

Orange poster (PNG)
Poster in action

Orange 2.5 progress

We decided to reorganize Orange to provide a more intuitive scripting interface. The next release, Orange 2.5, is getting better every day. But fear not, dear reader: we are working hard to ensure that your scripts will still work.

On the last morning of the camp in Bohinj we decided to use underscore_separated names instead of CamelCase. We have been steadily converting them, and the deprecation utilities by Aleš help a lot: we just list the name changes for class attributes or arguments, and the renaming is handled gracefully, with the old names remaining accessible. Therefore, the code does not need to be duplicated to ensure backward compatibility.
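The idea of listing renames once and keeping old names working can be illustrated in plain Python. The sketch below is hypothetical: renamed_kwargs, renamed_attributes and the stand-in TreeLearner are made up for this example and are not Orange's actual deprecation utilities. A function decorator maps old keyword arguments to new ones with a DeprecationWarning, and a class decorator turns old attribute names into properties that read and write the new attribute.

```python
import functools
import warnings


def renamed_kwargs(mapping):
    """Hypothetical decorator: accept old keyword names, pass them on renamed."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for old, new in mapping.items():
                if old in kwargs:
                    warnings.warn("'%s' is deprecated, use '%s'" % (old, new),
                                  DeprecationWarning, stacklevel=2)
                    kwargs[new] = kwargs.pop(old)
            return func(*args, **kwargs)
        return wrapper
    return decorator


def renamed_attributes(mapping):
    """Hypothetical class decorator: make old attribute names aliases."""
    def decorator(cls):
        for old, new in mapping.items():
            # Bind `new` as a default argument to avoid late-binding surprises.
            def getter(self, _new=new):
                return getattr(self, _new)

            def setter(self, value, _new=new):
                setattr(self, _new, value)

            setattr(cls, old, property(getter, setter))
        return cls
    return decorator


@renamed_attributes({"mForPruning": "m_pruning"})
class TreeLearner(object):
    """Stand-in learner whose argument was renamed (not Orange's class)."""

    @renamed_kwargs({"mForPruning": "m_pruning"})
    def __init__(self, m_pruning=0, name="tree"):
        self.m_pruning = m_pruning
        self.name = name
```

With this in place, TreeLearner(mForPruning=2) and TreeLearner(m_pruning=2) build the same object, and learner.mForPruning stays readable and writable; only the rename table has to be maintained, not a second copy of the class.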

Here is a simple example from the documentation on bagging and boosting, the old version first:

import orange, orngEnsemble, orngTree
import orngTest, orngStat

tree = orngTree.TreeLearner(mForPruning=2, name="tree")
bs = orngEnsemble.BoostedLearner(tree, name="boosted tree")
bg = orngEnsemble.BaggedLearner(tree, name="bagged tree")

data = orange.ExampleTable("lymphography.tab")

learners = [tree, bs, bg]
results = orngTest.crossValidation(learners, data, folds=3)
print "Classification Accuracy:"
accuracies = orngStat.CA(results)  # one accuracy per learner
for learner, ca in zip(learners, accuracies):
    print "%15s: %5.3f" % (learner.name, ca)

Orange 2.5 version:

import Orange

tree = Orange.classification.tree.TreeLearner(m_pruning=2, name="tree")
bs = Orange.ensemble.boosting.BoostedLearner(tree, name="boosted tree")
bg = Orange.ensemble.bagging.BaggedLearner(tree, name="bagged tree")

table = Orange.data.Table("lymphography.tab")

learners = [tree, bs, bg]
results = Orange.evaluation.testing.cross_validation(learners, table, folds=3)
print "Classification Accuracy:"
accuracies = Orange.evaluation.scoring.CA(results)  # one accuracy per learner
for learner, ca in zip(learners, accuracies):
    print "%15s: %5.3f" % (learner.name, ca)

In the new Orange, we only need to import a single module, Orange, the root of the new hierarchy.

Data loading speedups

Since the end of February, Orange has been loading data faster, especially from files with many attributes.

Quick comparisons between the old and new versions, measured on my computer:

  • adult.tab (32561 examples, 15 attributes): old version = 1.41s, new version = 0.86s.
  • DLBCL.tab (77 examples, 7071 attributes): old version = 2.72s, new version = 0.93s.
  • GDS1962.tab (104 examples, 31837 attributes): old version = 33.5s, new version = 6.6s.
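Dividing the old time by the new one gives the speedup factor for each file. The small snippet below just does that arithmetic on the timings listed above: roughly 1.6, 2.9 and 5.1 times faster, so the gain grows with the number of attributes, as noted at the start of this post.

```python
# Load times in seconds from the comparison above: (old version, new version).
timings = {
    "adult.tab": (1.41, 0.86),
    "DLBCL.tab": (2.72, 0.93),
    "GDS1962.tab": (33.5, 6.6),
}


def speedup(old, new):
    """How many times faster the new version is."""
    return old / new


for name in sorted(timings):
    old, new = timings[name]
    print("%-12s %.2fx" % (name, speedup(old, new)))
```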

The speedups were obtained by:

  • reusing a buffer for parsing,
  • skipping type detection for attributes with known types, and
  • keeping attributes in a different data structure internally.
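The second point, skipping type detection when the type is already known, can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: Orange's actual reader is part of its C++ core and its tab format carries more header information. Here the file is assumed to have a names row and a types row, where "c" means continuous, "d" discrete, and an empty cell means unknown; read_tab and detect_type are hypothetical helpers. Only columns with an empty type are scanned.

```python
def detect_type(values):
    """Guess a column type: continuous if every known value parses as a float."""
    for value in values:
        if value in ("", "?"):  # missing values say nothing about the type
            continue
        try:
            float(value)
        except ValueError:
            return "d"
    return "c"


def read_tab(text):
    """Parse a simplified tab-delimited file (names row, types row, data rows).

    Columns with a declared type ("c" or "d") are used as declared; only
    columns with an empty type cell are scanned to guess their type.
    """
    rows = [line.split("\t") for line in text.rstrip("\n").split("\n")]
    names, types, data = rows[0], rows[1], rows[2:]
    columns = [[row[i] for row in data] for i in range(len(names))]
    kinds = []
    for declared, column in zip(types, columns):
        # Skip the scan entirely when the header already states the type.
        kinds.append(declared if declared in ("c", "d") else detect_type(column))
    converted = []
    for kind, column in zip(kinds, columns):
        if kind == "c":
            converted.append([None if v in ("", "?") else float(v) for v in column])
        else:
            converted.append(column)
    return names, kinds, converted
```

For wide data sets such as GDS1962.tab, where most of the time goes into per-attribute work, avoiding an extra pass over every column is the kind of saving that adds up.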