Model-Based Feature Scoring

Feature scoring and ranking can help in understanding the data in supervised settings. Orange includes a number of standard feature scoring procedures, accessible in the Rank widget. Moreover, a number of modeling techniques, like linear or logistic regression, rank features explicitly through the assignment of weights, and trained models like random forests have their own methods for feature scoring. Models inferred by these techniques depend on their parameters, such as the type and level of regularization for logistic regression. The same holds for feature weights: any change to the parameters of a modeling technique changes the resulting feature scores.

It would thus be great if we could observe these changes and compare the feature rankings provided by various machine learning methods. For this purpose, the Rank widget recently got a new input channel called scorer. We can attach any learner that provides feature scores to the input of Rank and then observe the rankings in the Rank table.


Say, for the famous voting data set (File widget, Browse documentation data sets), the last two feature score columns were obtained by a random forest and by logistic regression with L1 regularization (C=0.1). Try changing the regularization type and parameter to see how the feature scores change.


Feature weights for logistic and linear regression correspond to the absolute values of the coefficients of their linear models. To observe the untransformed values, these widgets now also output a data table with feature weights. (At the time of writing, this feature has been implemented for linear regression; other classifiers and regressors that can estimate feature weights will be updated soon.)
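Outside the widgets, the same comparison can be sketched in a few lines of Python. The sketch below uses scikit-learn rather than Orange's widget machinery, and a stand-in data set (scikit-learn's breast cancer data instead of voting): absolute coefficients of an L1-regularized logistic regression on one side, a random forest's built-in importances on the other.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

X, y = load_breast_cancer(return_X_y=True)
names = load_breast_cancer().feature_names

# L1-regularized logistic regression: feature score = |coefficient|
logreg = LogisticRegression(penalty="l1", C=0.1, solver="liblinear").fit(X, y)
lr_scores = np.abs(logreg.coef_.ravel())

# Random forest: impurity-based feature importances (sum to 1)
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
rf_scores = rf.feature_importances_

# Rank by the logistic regression score and show the top five
for name, lr_s, rf_s in sorted(zip(names, lr_scores, rf_scores),
                               key=lambda t: -t[1])[:5]:
    print("%-25s  lr=%.3f  rf=%.3f" % (name, lr_s, rf_s))
```

With stronger regularization (smaller C) more of the logistic regression scores drop to exactly zero, which is the kind of change the Rank widget lets you watch interactively.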


Data Mining Course in Houston

We have just completed Introduction to Data Mining, a graduate course at Baylor College of Medicine in Houston, Texas. The course ran in September and consisted of seven two-hour lectures, each followed by a homework assignment. It was attended by about 40 students and some faculty and research staff.


This was a challenging course. The audience was new to data mining, and we decided to teach them with the newest, third version of Orange. We also experimented with two course instructors (Blaz and Janez) who, instead of splitting the course into two parts, taught simultaneously: one at the board, the other helping the students with hands-on exercises. To check whether this worked, we ran a student survey at the end of the course, collected the responses in Google Sheets, and then examined the results with the students in class. Using Orange, of course.


And the outcome? It looks like the students really enjoyed the course


and the teaching style.


The course took advantage of several new widgets in Orange 3, including those for data preprocessing and polynomial regression. The core development team put in a lot of effort over the summer to debug and polish this newest version of Orange. Thanks also to the financial support of the AXLE EU FP7 and CARE-MI EU FP7 grants and grants from the Slovenian Research Agency, we were able to finish everything in time.

Orange in Pavia, Italy

These days, we (Blaz Zupan and Marinka Zitnik, with full background support from the entire Bioinformatics Lab) are running a three-day course on Data Mining in Python. Riccardo Bellazzi, a professor at the University of Pavia, a world-renowned researcher in biomedical informatics, and most of all a great friend, invited us to run the elective course for Pavia's grad students. The enrollment was, he says, overwhelming: with over 50 students, this is by far the best-attended grad course at Pavia's faculty of engineering in recent years.

We opted for a hands-on course, running it as a workshop. The lectures use a new, development version of Orange 3 and mix it with numpy, scikit-learn, matplotlib, networkx, and a bunch of other libraries. Course themes are classification, clustering, data projection, and network analysis.




Towards Orange 3

We are rushing, full speed ahead, towards Orange 3. A complete revamp of Orange in Python 3 changes its data model to that of numpy, making Orange compatible with an array of Python-based data analytics tools. We are also rewriting all the widgets for visual programming. We have two open fronts: the scripting part and the widget part. So much to do, but it is going well: the closed widget tasks are those to the left of Anze (at the board full of sticky notes), and the open ones, in the minority, are to Anze's right. Oh, and by the way, it's Anze who is managing the work, and he looks quite happy.


Loading your data

By popular demand, we have just published a tutorial on how to load a data table into Orange. Besides its own .tab format, Orange can load any tab- or comma-delimited data set. The tricky part is writing the header rows that tell Orange about the type and domain of each attribute. The tutorial is a step-by-step description of how to do this and how to transfer the data from popular spreadsheet programs like Excel.
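To give a flavour of what those header rows look like, here is a minimal sketch of a .tab file, assuming the three-row header convention: the first row names the attributes, the second row gives their types (continuous, discrete, string), and the third row marks roles (blank for an ordinary feature, "class" for the class variable, "meta" for meta attributes). The file name and toy values are made up for illustration.

```python
# Build a tiny tab-delimited .tab file with the three header rows.
tab_text = "\n".join([
    "sepal length\tsepal width\tiris",    # row 1: attribute names
    "continuous\tcontinuous\tdiscrete",   # row 2: attribute types
    "\t\tclass",                          # row 3: roles; last column is the class
    "5.1\t3.5\tIris-setosa",
    "4.9\t3.0\tIris-setosa",
    "6.3\t2.9\tIris-virginica",
])
with open("iris_mini.tab", "w") as f:
    f.write(tab_text)
```

A file written this way can then be opened with the File widget, or loaded in a script with Orange's data table constructor.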

Hands-on Orange at Functional Genomics Workshop

Last week we co-organized a Functional Genomics Workshop. At the University of Ljubljana we hosted an inspiring pack of scientists from the Donnelly Centre for Cellular and Biomolecular Research in Toronto. Part of the event was a hands-on workshop, Data Mining Without Programming, where we used Orange to analyze data from systems biology. The data included a subset of Charlie Boone's famous yeast interaction data and data from chemical genomics. For the program, info about the speakers, and pancakes and šmorn, check out the workshop's newspaper.

It is always a pleasure to see a packed lecture room with all laptops running Orange. Attendees were assisted by members of the Biolab in Ljubljana. The hands-on program followed a set of short lectures we crafted for the intended audience: biologists. Everything ran smoothly. At the end, we got excited enough to promise a data import wizard for all those who have problems annotating their data with feature type tags. The deadline: two weeks from the end of the workshop.


Brief History of Orange, Praise to Donald Michie

Informatica has recently published our paper on the history of Orange. The paper is a post-publication from the Conference on 100 Years of Alan Turing and 20 Years of the Slovene AI Society, where Janez Demšar gave a talk on the topic.

The history of Orange goes all the way back to 1997, when the late Donald Michie had the idea that machine learning needed an open toolbox. To spark the development, we co-organized WebLab97 in beautiful Bled, Slovenia. The workshop's name reflected Michie's idea that the tool should be a web application where people could submit data mining code, procedures, testing scripts, and data, and share them in a joint web workspace.

Donald Michie, a pioneer of artificial intelligence, was always ahead of his time. (Check out a great talk by Ivan Bratko on their friendship and adventures in chess and machine learning.) At WebLab97, Michie was actually very, very far ahead of his time. Despite the presence of IBM's Java team, which could have guided us in developing the toolbox, the technology was not ripe, and the WebLab initiative faded as the conference ended. But, at least for us, the idea sparked the interest of Janez and myself, and the development of what is now Orange began shortly after.

Our paper gives a brief account of Orange's history and its development since WebLab97. For brevity, it does not mention that prior to Qt we experimented with other GUI platforms. Before Qt, we pinned our hopes on Pmw (Python megawidgets), a library that helped us construct the first Orange graphical user interface. The GUI part of Orange was called Orange*First. Its screenshot shows a tab for interactive discretisation, thanks to Noriaki Aoki, who proposed that this kind of visualisation would be useful in medical data analysis:

orange first

PS Somehow, I have lost the LaTeX file with the WebLab97 program. It should be on some backup tape, somewhere. The following scan of the first page (and a weblab97.pdf), left in some PPT presentation, is all that I could retrieve. The program of the second day is missing, with keynotes from Tom Mitchell and much talk about R, which was already a success story at the time.


JMLR Publishes Article on Orange

The Journal of Machine Learning Research has just published our paper on Orange. In the paper we focus on its Python scripting part. We last reported on Orange scripting at ECML/PKDD 2004. That manuscript was well received (over 270 citations on Google Scholar), but it is now entirely outdated, and it was also our only formal publication on Orange scripting. With the publication in JMLR, this is now a current description of Orange and will be, for a while :-), Orange's primary reference.

Here’s a reference:

Demšar, J., Curk, T., Erjavec, A., et al. Orange: Data Mining Toolbox in Python. Journal of Machine Learning Research 14(Aug):2349−2353, 2013.

and bibtex entry:

@article{demsar2013orange,
  author  = {Janez Dem\v{s}ar and Toma\v{z} Curk and Ale\v{s} Erjavec and
             \v{C}rt Gorup and Toma\v{z} Ho\v{c}evar and Mitar Milutinovi\v{c} and
             Martin Mo\v{z}ina and Matija Polajnar and Marko Toplak and
             An\v{z}e Stari\v{c} and Miha \v{S}tajdohar and Lan Umek and
             Lan \v{Z}agar and Jure \v{Z}bontar and Marinka \v{Z}itnik and
             Bla\v{z} Zupan},
  title   = {Orange: Data Mining Toolbox in Python},
  journal = {Journal of Machine Learning Research},
  year    = {2013},
  volume  = {14},
  pages   = {2349-2353},
  url     = {}
}

Orange 2.7

Orange 2.7 is out with a major update to the visual programming environment: a redesigned interface, new widgets, and a welcome screen with a workflow browser. There are text annotations and arrow lines in the workspace, preloaded workflows with annotations, a widget menu and search that can now be activated through a key press (open the Settings to make this option available), extended or minimised widget tabs, and improved widget browsing. Enjoy!




New scripting tutorial

Orange just got a new, completely rewritten scripting tutorial. The tutorial uses the Orange class hierarchy as introduced in version 2.5 and is meant as a gentle introduction to Orange scripting. It includes many examples, from really simple ones to more complex ones. To give you a hint about the latter, here is the code for a learner with feature subset selection:

class SmallLearner(Orange.classification.PyLearner):
    def __init__(self, base_learner=Orange.classification.bayes.NaiveLearner,
                 name='small', m=5):
        self.name = name
        self.m = m
        self.base_learner = base_learner

    def __call__(self, data, weight=None):
        # score features by information gain and keep the m best
        gain = Orange.feature.scoring.InfoGain()
        m = min(self.m, len(data.domain.features))
        best = [f for _, f in sorted((gain(x, data), x)
                                     for x in data.domain.features)[-m:]]
        domain = Orange.data.Domain(best + [data.domain.class_var])
        # train the base learner on data projected to the reduced domain
        model = self.base_learner(Orange.data.Table(domain, data), weight)
        return Orange.classification.PyClassifier(classifier=model,
                                                  name=self.name)

The tutorial was first written for Python 2.3. Since then, Python and Orange have changed a lot. And so have I. Most of the for loops have become one-liners, list and dictionary comprehensions have become a must, and many new and great libraries have emerged. The (boring) tutorial code that used to read

c = [0] * len(data.domain.classVar.values)
for e in data:
    c[int(e.getclass())] += 1
print "Instances: ", len(data), "total",
r = [0.] * len(c)
for i in range(len(c)):
    r[i] = c[i] * 100. / len(data)
for i in range(len(data.domain.classVar.values)):
    print ", %d(%4.1f%s) with class %s" % \
        (c[i], r[i], '%', data.domain.classVar.values[i]),

is now replaced with

from collections import Counter
print Counter(str(d.get_class()) for d in data)

OK, the pretty printing is missing, but that could be done in another line or two.
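For instance, the Counter already carries everything the old loop computed, so the pretty print can be recovered in a couple of lines. A sketch, with made-up class counts standing in for an actual data table:

```python
from collections import Counter

# Hypothetical class labels standing in for str(d.get_class()) over a table
classes = ["democrat"] * 267 + ["republican"] * 168
c = Counter(classes)
n = sum(c.values())
# format each class as "count (percent%) with class label"
line = ", ".join("%d (%4.1f%%) with class %s" % (cnt, 100.0 * cnt / n, cls)
                 for cls, cnt in c.most_common())
print("Instances: %d total, %s" % (n, line))
# -> Instances: 435 total, 267 (61.4%) with class democrat, 168 (38.6%) with class republican
```

Still essentially a one-liner per step, and much easier on the eyes than the original index juggling.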

For now, the tutorial covers data input and output, classification, and regression. We plan to add other sections, but you can also give us a hint if there are any topics you would like to see included.