NetworkX in Orange

NetworkX, a popular open-source Python library for network analysis, has finally found its way into Orange. It now serves as the base class for network representation in all Orange modules and widgets. This gives the wider network analysis community a productive and fun way to visualize and explore networks while reusing their existing NetworkX scripts. It has never been easier to combine network analysis and visualization with existing machine learning and data discovery methods.

Complete documentation is available in the Orange network headquarters. For a brief overview, take a look at the following example. Suppose we want to analyse data on patients who have one of two types of leukemia. The data set contains 72 patients, 4600+ gene expressions, and a class variable. We also have a vast network of human genes, in which two genes are connected if they share a biological function. We would like to examine the sub-network that contains only the few hundred most significantly expressed genes from the data set. To show off a bit, we will also use the Orange Bioinformatics add-on. Here is how we do it:

import Orange
import obiExpression

# load leukemia data set
table = Orange.data.Table("/media/Ox/Projects_Archive/res/BIO/leukemia/leukemiaGSEA.tab")

useAttributeLabels = False
ttest = obiExpression.ExpressionSignificance_TTest(table, useAttributeLabels)

target = [table.domain.classVar(0), table.domain.classVar(1)]

# test for significantly expressed genes
score = ttest(target = target)

# each gene is scored with a (t statistic, p-value) pair
print score[0]
# prints: (FloatVariable 'HIST1H4C', (1.8377179790830149, 0.07034778767062116))

# sort by p-value (the second element of each score pair)
score.sort(key=lambda s: s[1][1])

# select 200 genes with the lowest p-value
important_genes = [gene_var.name for gene_var, s in score[:200]]

# read the gene network (5000+ genes, dense network)
G = Orange.network.readwrite.read('genes_biofunct.gpickle')

items = G.items().filter_bool({'gene': important_genes})
indices = [i for i, present in enumerate(items) if present]

# build a subgraph of the 200 selected genes
G_sub = G.subgraph(indices)
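
The two key steps of the script, ranking genes by p-value and cutting out the sub-network they induce, do not depend on Orange at all. Here is a minimal sketch in plain Python, assuming scores shaped like the ones above and a network stored as an adjacency dictionary. The gene names, the numbers, and the helper functions top_genes and induced_subgraph are made up for illustration and are not part of Orange or NetworkX.

```python
# Hypothetical scores in the same shape as above: (gene name, (t statistic, p-value)).
scores = [
    ("GENE_A", (2.9, 0.004)),
    ("GENE_B", (0.4, 0.690)),
    ("GENE_C", (-3.5, 0.001)),
    ("GENE_D", (1.8, 0.070)),
]

# Hypothetical undirected gene network stored as an adjacency dictionary.
network = {
    "GENE_A": {"GENE_B", "GENE_C"},
    "GENE_B": {"GENE_A", "GENE_D"},
    "GENE_C": {"GENE_A", "GENE_D"},
    "GENE_D": {"GENE_B", "GENE_C"},
}


def top_genes(scores, k):
    """Return the names of the k genes with the lowest p-value."""
    ranked = sorted(scores, key=lambda s: s[1][1])
    return [name for name, _ in ranked[:k]]


def induced_subgraph(network, nodes):
    """Keep only the given nodes and the edges that run between them."""
    keep = set(nodes)
    return {n: network[n] & keep for n in network if n in keep}


important = top_genes(scores, 2)
sub = induced_subgraph(network, important)
```

With real data, G.subgraph does the same job as induced_subgraph here: it keeps the selected nodes and only those edges whose both endpoints were selected.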

In addition to the power of the scripting environment, we also get the benefits of visual data exploration with Orange widgets. However, the network widgets are currently under heavy development, so expect some bugs if you dare to try them. Coding should be finished in a month or two; check the blog for progress updates. Here is how to open the network in the Nx Explorer widget:

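import sys
# import the QtGui submodule explicitly; "import PyQt4" alone does not load it
from PyQt4 import QtGui

# OWNxExplorer must be on the Python path!
import OWNxExplorer

# a QApplication must exist before any widget is created
app = QtGui.QApplication(sys.argv)
ow = OWNxExplorer.OWNxExplorer()
ow.show()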

# set the network
ow.set_graph(G_sub)
app.exec_()

Orange GSoC: Multi-label Classification Implementation

Multi-label classification is one of the three Google Summer of Code 2011 projects for Orange. The main goal is to extend Orange to support multi-label classification, including data set support, two basic families of multi-label classification methods (problem transformation methods and algorithm adaptation methods), evaluation measures, GUI support, documentation, testing, and so on.

My name is Wencan Luo, and I am from China. I’m very happy to work with my mentor Matija. So far, we have finished a framework for multi-label support in Orange.

To support the multi-label data structure, a special value is added to the ‘attributes’ dictionary of each attribute that acts as a label. This way, we can tell whether an attribute is a class without altering the old Example Table class.
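
To make the marking scheme described above more concrete, here is a rough sketch. The Variable class is only a stand-in for an Orange feature descriptor, and the key name "label" is an assumption made for this example; the actual key used in the implementation may differ.

```python
class Variable(object):
    """Hypothetical stand-in for an Orange feature descriptor."""

    def __init__(self, name, attributes=None):
        self.name = name
        # free-form metadata dictionary attached to the variable
        self.attributes = dict(attributes or {})


LABEL_KEY = "label"  # hypothetical key name, chosen for this example only


def is_label(var):
    """A variable counts as a label if the special value is set."""
    return bool(var.attributes.get(LABEL_KEY))


def split_domain(variables):
    """Separate ordinary features from label variables."""
    features = [v for v in variables if not is_label(v)]
    labels = [v for v in variables if is_label(v)]
    return features, labels


domain = [
    Variable("age"),
    Variable("sports", {LABEL_KEY: 1}),
    Variable("income"),
    Variable("politics", {LABEL_KEY: 1}),
]
features, labels = split_domain(domain)
feature_names = [v.name for v in features]
label_names = [v.name for v in labels]
```

Because the flag lives in per-variable metadata, the table itself does not need to change, which is the point of this design.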

Moreover, a problem transformation method for multi-label classification, named Binary Relevance, has been implemented. All the code is written in Python and extends the existing Orange code, so that it stays compatible with the single-label classification methods.
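
For readers who have not met Binary Relevance before: it trains one independent binary classifier per label and predicts the set of labels whose classifier says "yes". The sketch below is plain Python and is not the Orange implementation; the classes NearestNeighbour and BinaryRelevance and the toy data are made up for illustration, with a tiny 1-nearest-neighbour learner standing in for any single-label method.

```python
class NearestNeighbour(object):
    """Tiny single-label base learner: predicts the class of the closest training row."""

    def fit(self, X, y):
        self.X, self.y = list(X), list(y)
        return self

    def predict(self, row):
        def dist(a):
            # squared Euclidean distance between a training row and the query
            return sum((p - q) ** 2 for p, q in zip(a, row))

        best = min(range(len(self.X)), key=lambda i: dist(self.X[i]))
        return self.y[best]


class BinaryRelevance(object):
    """Wraps a single-label learner: one binary model per label."""

    def __init__(self, make_learner):
        self.make_learner = make_learner
        self.models = {}

    def fit(self, X, Y, labels):
        # Y holds one set of labels per training row
        for label in labels:
            y = [int(label in row_labels) for row_labels in Y]
            self.models[label] = self.make_learner().fit(X, y)
        return self

    def predict(self, row):
        # the prediction is the set of labels whose model answers 1
        return set(l for l, m in self.models.items() if m.predict(row) == 1)


X = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
Y = [{"a"}, {"a", "b"}, set(), {"b"}]

br = BinaryRelevance(NearestNeighbour).fit(X, Y, ["a", "b"])
prediction = br.predict((0.1, 0.9))
```

The wrapper only ever asks the base learner for single-label predictions, which is why existing single-label classifiers can be reused unchanged.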

In addition, an evaluator for multi-label classification has been implemented, based on the old single-label evaluator in the Orange.evaluator.testing and Orange.evaluator.scoring modules.
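
As an illustration of what multi-label scoring looks like, here are two commonly used example-based measures, Hamming loss and subset accuracy, in plain Python. This is only a sketch: this post does not list which measures the evaluator covers, and the function names and toy data are made up for the example.

```python
def hamming_loss(true_sets, predicted_sets, labels):
    """Fraction of (example, label) pairs on which prediction and truth disagree.

    Assumes every label that occurs in the sets is listed in `labels`.
    """
    # the symmetric difference holds the labels that are wrong for one example
    errors = sum(len(t ^ p) for t, p in zip(true_sets, predicted_sets))
    return errors / float(len(true_sets) * len(labels))


def subset_accuracy(true_sets, predicted_sets):
    """Fraction of examples whose predicted label set matches the truth exactly."""
    exact = sum(1 for t, p in zip(true_sets, predicted_sets) if t == p)
    return exact / float(len(true_sets))


labels = ["a", "b", "c"]
truth = [{"a", "b"}, {"c"}, set(), {"a"}]
predicted = [{"a"}, {"c"}, {"b"}, {"a", "c"}]

loss = hamming_loss(truth, predicted, labels)
accuracy = subset_accuracy(truth, predicted)
```

Hamming loss gives partial credit for label sets that are almost right, while subset accuracy only counts exact matches, so the two are usually reported together.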

Finally, widgets for the Binary Relevance method and the evaluator have been implemented.

Much work remains to be done:

  • one more transformation method
  • two adaptive methods
  • ranking-based evaluator
  • widgets to support the above methods
  • testing

Fink packages now also 64-bit

Fink packages (which we use for system-wide Orange installations on Mac OS X) have been updated to 64-bit. So if you are using a 64-bit Fink installation, you will now also be able to use Orange (and our binary Fink repository of precompiled packages). Just use this installation script to configure your local Fink installation to use our binary Fink repository and to add information about the Orange packages (they are not available among the official Fink packages).