For When You Want to Transpose a Data Table…

Sometimes, you need something more. Something different. Something, that helps you look at the world from a different perspective. Sometimes, you simply need to transpose your data.

Since version 3.3.9, Orange has a Transpose widget that flips your data table around. Columns become rows and rows become columns. This is often useful, if you have, say, biological data.

Related: Datasets in Orange Bioinformatics

Today we will play around with brown-selected.tab, a data set on gene expression levels for 79 experiments. Our data table has genes in rows and experiments in columns, with gene expression levels recorded as values.

This representation focuses on exploring genes and allows them to be plotted or to construct a model to predict their functions. But what if I want to explore the experimental conditions and see how different external conditions influence the yeast cells? For this, it would be better to have experiments in rows and genes in columns. We can do this with Transpose.

Transpose widget took our gene meta attribute and used it for the new column names (YGR270W, YIL075C, etc.). It also appended class values to columns (Proteas). Former columns names (alpha 0, alpha 7, etc.) became our new meta attribute called Feature name.

Ok, we have a transposed data table. Now we ask ourselves: “Do similar experiment types (e.g. heat, cold, alpha, …) behave similarly?”

Let’s use PCA to transform these many-dimensional experiment vectors into a 2-D representation. We are going to use Scatter Plot to observe experiments (not genes) in a 2-D space. We expect the same experiment types to lie closer together than other experiments. A scatter plot after a 2-D transformation looks like this:

Spo5 11 lies quite far from the rest, so it could be an experiment to look out for. If we zoom in on the big cluster, we see that similar experiments indeed lie closer together.

Now, if you are reproducing the result, you probably won’t see these nice colors for class.

This is because we used the Create Class widget to help us create new class values. Create Class already available in Orange3-Prototypes add-on and will be included in a future Orange release. You can learn more about it soon… 🙂

 

Creating a new data table in Orange through Python

IMPORT DATA

 

One of the first tasks in Orange data analysis is of course loading your data. If you are using Orange through Python, this is as easy as riding a bike:

import Orange
data = Orange.data.Table(“iris”)
print (data)

This will return a neat data table of the famous Iris data set in the console.

 

CREATE YOUR OWN DATA TABLE

 

What if you want to create your own data table from scratch? Even this is surprisingly simple. First, import the Orange data library.

from Orange.data import *

 

Set all the attributes you wish to see in your data table. For discrete attributes call DiscreteVariable and set the name and the possible values, while for a continuous variable call ContinuousVariable and set only the attribute name.

color = DiscreteVariable(“color”, values=[“orange”, “green”, “yellow”])

calories = ContinuousVariable(“calories”)

fiber = ContinuousVariable(“fiber”)]

fruit = DiscreteVariable("fruit”, values=[”orange", “apple”, “peach”])

 

Then set the domain for your data table. See how we set class variable with class_vars?

domain = Domain([color, calories, fiber], class_vars=fruit)

 

Time to input your data!

data = Table(domain, [

[“green”, 4, 1.2, “apple”],

["orange", 5, 1.1, "orange"],

["yellow", 4, 1.0, "peach"]])

 

And now print what you have created!

print(data)

 

One final step:

Table.save(table, "fruit.tab")

 

Your data is safely stored to your computer (in the Python folder)! Good job!