Python Script: Managing Data on the Fly

Python Script is that mysterious widget most people don’t know how to use, even those well versed in Python. It supplements Orange’s functionality with (almost) everything Python can offer. It’s time we unveiled some of what it can do with a simple example.

Example: Batch Transform the Data

There might come a time when you need to apply a function to all your attributes. Say you wish to log-transform their values, as is common with gene expression data. In theory, you could do this with Feature Constructor, where you would log-transform every attribute individually. Sounds laborious? That’s because it is. Why else do we have computers if not to reduce manual labor? Let’s do it the fast way: with Python Script.

First, open the File widget and load geo-gds360.tab from Browse documentation data sets. This data set has 9485 features, so imagine having to transform each one individually.

Instead, we will connect Python Script to File and use a simple script to apply the same transformation to all attributes.

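import numpy as np
from Orange.data import Table

# in_data is the table arriving on the widget's input; X holds the feature values.
new_X = np.log(in_data.X)
# Build the output table: same domain, class values, and metas, but new feature values.
out_data = Table(in_data.domain, new_X, in_data.Y, in_data.metas)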

This is really simple. in_data.X holds the values of all features in the data set, so np.log (or any other numpy function) transforms them all at once. Then build a new Table that reuses the original domain, class values, and metas but takes new_X as its feature values, assign it to out_data and, voilà, the transformed data is on the output. In a few lines we have handled all 9485 features.
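
One caveat is worth sketching: np.log returns -inf for zeros and NaN for negative values, so data containing such entries may need a different transform. The snippet below is a minimal sketch on a toy array that stands in for in_data.X. The log_transform helper and its shift option (which uses np.log1p, that is log(1 + x)) are illustrative assumptions, not part of the original script.

```python
import numpy as np

# Toy stand-in for in_data.X: rows are data instances, columns are features.
X = np.array([[1.0, 10.0, 100.0],
              [0.0, 2.0, 8.0]])


def log_transform(X, shift=False):
    """Apply the natural log to every value; with shift=True use log(1 + x)."""
    X = np.asarray(X, dtype=float)
    if shift:
        # log1p maps 0 to 0, so zeros stay finite.
        return np.log1p(X)
    # Silence the warnings numpy raises for zeros and negative values.
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.log(X)


plain = log_transform(X)                # the zero becomes -inf
shifted = log_transform(X, shift=True)  # every value stays finite
```

Inside the Python Script widget you could pass in_data.X instead of the toy array and build out_data from the result in the same way as in the script above.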

You can inspect the data before and after transformation in a Data Table widget.

Original data.
Log-transformed data.
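
If you prefer a quick numeric check to scrolling through the table, a few summary values can be printed from the script itself. This is only a sketch: summarize is a hypothetical helper, and the toy array stands in for in_data.X (in the widget you would pass in_data.X and out_data.X).

```python
import numpy as np


def summarize(X):
    """Return min and max over the finite values, plus a count of non-finite ones."""
    X = np.asarray(X, dtype=float)
    finite = X[np.isfinite(X)]
    return {
        "min": float(finite.min()),
        "max": float(finite.max()),
        "non_finite": int(X.size - finite.size),
    }


# Toy stand-in for the data before and after the transformation.
before = np.array([[1.0, 10.0], [100.0, 1000.0]])
after = np.log(before)

print("before:", summarize(before))
print("after: ", summarize(after))
```

A non-zero non_finite count after the transformation is a hint that the data contained zeros or negative values.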

That’s it. Now we can do our standard analysis on the transformed data. Even better, we can save our script and reuse it in the Python Script widget any time we want.

For your convenience I have already added the Log Attributes Script, so you can download and use it instantly!

Have a more interesting example with Python Script? We’d love to hear about it!

  • Ray Schumacher

    This seems like a possible avenue to import esoteric data files. (?)
    I wrote a method in io.py to import “raw” binaries of N cols by arbitrary length, but I did it by using
    file = SpooledTemporaryFile(mode='w+', buffering=None,
    encoding='us-ascii', newline=None, suffix=None,
    prefix=None, dir=None)
    and passing the memory file to the usual CSV reader – a tad slow.
    I gave up on trying to create a FileFormat() method.

    • Ajda Pretnar

      You probably have to transform any file into Orange.data.Table format for Orange to be able to read it. I think there should be some methods already implemented that work with csv.

  • ferdo

    Thanks for the example!

    And I know it’s just four lines of code, but would you mind explaining them a bit? Or point us to the Table reference.

    • Ajda Pretnar

      The first two lines are imports.

      new_X = np.log(in_data.X) computes new feature values from the old ones (in_data.X) with the np.log function.
      out_data = Table(in_data.domain, new_X, in_data.Y, in_data.metas) builds a new data table that keeps the old domain, class variable, and metas, but uses the new feature values.

  • Yaseen Afzal

    How can I get the data from the Sentiment Analysis widget into Python Script in Orange?