Faster classification and regression trees

SimpleTreeLearner is an implementation of classification and regression trees that sacrifices flexibility for speed. A benchmark on 42 different datasets shows that SimpleTreeLearner is, on average, 11 times faster than the original TreeLearner.

We developed a new tree induction algorithm from scratch to speed up the construction of random forests, but you can also use it as a standalone learner. SimpleTreeLearner uses gain ratio as the split criterion for classification and mean squared error (MSE) for regression, and it can handle unknown values.
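To make the two split criteria concrete, here is a minimal textbook sketch of gain ratio (information gain divided by split information) and of MSE reduction for a split on a discrete attribute. It is not SimpleTreeLearner's actual implementation. The function names are hypothetical, continuous attributes and unknown values are not handled, and the code is plain Python 3 that does not use Orange.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def group_by(values, targets):
    """Group targets by the value of a discrete attribute."""
    groups = {}
    for v, t in zip(values, targets):
        groups.setdefault(v, []).append(t)
    return groups.values()


def gain_ratio(values, labels):
    """Gain ratio of splitting `labels` on the discrete attribute `values`."""
    n = len(labels)
    groups = group_by(values, labels)
    # Information gain: entropy before the split minus weighted entropy after it.
    remainder = sum(len(g) / n * entropy(g) for g in groups)
    gain = entropy(labels) - remainder
    # Split information penalizes attributes with many distinct values.
    split_info = -sum(len(g) / n * math.log2(len(g) / n) for g in groups)
    return gain / split_info if split_info > 0 else 0.0


def mse(targets):
    """Mean squared error of predicting the mean of `targets`."""
    mean = sum(targets) / len(targets)
    return sum((t - mean) ** 2 for t in targets) / len(targets)


def mse_reduction(values, targets):
    """Decrease in MSE when `targets` are split on the discrete attribute `values`."""
    n = len(targets)
    groups = group_by(values, targets)
    return mse(targets) - sum(len(g) / n * mse(g) for g in groups)


# A perfectly informative two-valued attribute has gain ratio 1.
print(gain_ratio(["x", "x", "y", "y"], ["a", "a", "b", "b"]))  # 1.0
```

Gain ratio, rather than plain information gain, keeps the learner from favoring attributes that merely have many values.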

Comparison with TreeLearner

The graph below shows SimpleTreeLearner's construction times on the datasets bundled with Orange, normalized to TreeLearner's construction times. Smaller is better.

[Figure: SimpleTreeLearner speed]

The harmonic mean of the speedups (the average speedup) over all benchmarks is 11.4.
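The harmonic mean of per-dataset speedups equals one over the arithmetic mean of the normalized times plotted in the graph, so either quantity gives the same average. A small sketch of both computations follows; the function names are hypothetical and the timings are made-up illustration values, not the benchmark data.

```python
from statistics import harmonic_mean, mean


def average_speedup(simple_times, tree_times):
    # Speedup on one dataset: how many times faster SimpleTreeLearner was.
    speedups = [tree / simple for simple, tree in zip(simple_times, tree_times)]
    return harmonic_mean(speedups)


def average_speedup_from_normalized(simple_times, tree_times):
    # Normalized time (as in the graph): SimpleTreeLearner time / TreeLearner time.
    # The harmonic mean of speedups equals 1 / (arithmetic mean of these values).
    normalized = [simple / tree for simple, tree in zip(simple_times, tree_times)]
    return 1 / mean(normalized)


# Hypothetical timings in seconds for three datasets.
simple = [0.5, 1.0, 0.2]
tree = [5.0, 8.0, 3.0]
print(average_speedup(simple, tree))
print(average_speedup_from_normalized(simple, tree))
```

The harmonic mean is the natural average here: it reports the speedup you would get on the total work if every dataset took the same time with TreeLearner, so a single very large speedup cannot dominate the result.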


The user can set four parameters:

Maximal proportion of the majority class (maxMajority).
Minimal number of examples in a leaf (minExamples).
Maximal depth of the tree.
Attribute skip probability (skipProb): at every split, an attribute is skipped with probability skipProb. This parameter is especially useful for building random forests.
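The sketch below shows one plausible way the four parameters could act during tree induction. The function names and the snake_case parameter names (max_majority, min_examples, max_depth, skip_prob) are hypothetical stand-ins for maxMajority, minExamples, the depth limit and skipProb; the exact rules inside SimpleTreeLearner may differ.

```python
import random
from collections import Counter


def stop_splitting(labels, depth, max_majority, max_depth):
    # Hypothetical rule: make a leaf when the majority class is dominant
    # enough or the tree has reached its maximal depth.
    majority_share = Counter(labels).most_common(1)[0][1] / len(labels)
    return majority_share >= max_majority or depth >= max_depth


def split_allowed(child_sizes, min_examples):
    # Hypothetical rule: reject a split that would leave a leaf
    # with fewer than min_examples examples.
    return all(size >= min_examples for size in child_sizes)


def candidate_attributes(attributes, skip_prob, rng):
    # At a split, each attribute is skipped with probability skip_prob;
    # only the remaining ones are scored (e.g. by gain ratio).
    return [a for a in attributes if rng.random() >= skip_prob]


rng = random.Random(42)
print(candidate_attributes(["sepal length", "sepal width",
                            "petal length", "petal width"], 0.5, rng))
```

Skipping attributes at random makes trees grown from the same data differ from one another, which is the diversity a random forest relies on.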

The code snippet below demonstrates the basic usage of SimpleTreeLearner. It behaves much like any other Orange learner would.

import Orange

data = Orange.data.Table("iris")

# build a classifier and classify the training data
classifier = Orange.classification.tree.SimpleTreeLearner(data, maxMajority=0.8)
for ex in data:
    print classifier(ex)

# estimate classification accuracy with cross-validation
learner = Orange.classification.tree.SimpleTreeLearner(minExamples=2)
result = Orange.evaluation.testing.cross_validation([learner], data)
print 'CA:', Orange.evaluation.scoring.CA(result)[0]

Orange at ISMB/ECCB 2011

We presented the Orange Bioinformatics add-on at the ISMB/ECCB conference in Vienna, a joint event combining the 19th Annual International Conference on Intelligent Systems for Molecular Biology and the 10th European Conference on Computational Biology.

We gave out Orange stickers (with the URL) to visitors of our poster. There was some interest: in the end we handed out about 10 of them, mostly to biologists who were excited to perform some of the analysis themselves. Among the visitors was also the developer of a similar tool, who seemed slightly surprised that something like this already existed, while another visitor was disappointed that Orange only runs locally.

See the poster in action in the photo taken by Gregor Rot.

Orange poster (PNG)
[Photo: Poster in action]