Orange Workshops: Luxembourg, Pavia, Ljubljana

February was a month of Orange workshops.

Ljubljana: Biologists

We (Tomaž, Martin and I) have started in Ljubljana with a hands-on course for the COST Action FA1405 Systems Biology Training School. This was a four hour workshop with an introduction to classification and clustering, and then with application of machine learning to analysis of gene expression data on a plant called Arabidopsis. The organization of this course has even inspired us for a creation of a new widget GOMapMan Ontology that was added to Bioinformatics add-on. We have also experimented with workflows that combine gene expressions and images of mutant. The idea was to find genes with similar expression profile, and then show images of the plants for which these genes have stood out.

Luxembourg: Statisticians

This workshop took place at STATEC, Luxembourgh’s National Institute of Statistics and Economic Studies. We (Anže and I) got invited by Nico Weydert, STATEC’s deputy director, and gave a two day lecture on machine learning and data mining to a room full of experienced statisticians. While the purpose was to showcase Orange as a tool for machine learning, we have learned a lot from participants of the course: the focus of machine learning is still different from that of classical statistics.

Statisticians at STATEC, like all other statisticians, I guess, value, above all, understanding of the data, where accuracy of the models does not count if it cannot be explained. Machine learning often sacrifices understanding for accuracy. With focus on data and model visualization, Orange positions itself somewhere in between, but after our Luxembourg visit we are already planning on new widgets for explanation of predictions.

Pavia: Engineers

About fifty engineers of all kinds at University of Pavia. Few undergrads, then mostly graduate students, some postdocs and even quite a few of the faculty staff have joined this two day course. It was a bit lighter that the one in Luxembourg, but also covered essentials of machine learning: data management, visualization and classification with quite some emphasis on overfitting on the first day, and then clustering and data projection on the second day. We finished with a showcase on image embedding and analysis. I have in particular enjoyed this last part of the workshop, where attendees were asked to grab a set of images and use Orange to find if they can cluster or classify them correctly. They were all kinds of images that they have gathered, like flowers, racing cars, guitars, photos from nature, you name it, and it was great to find that deep learning networks can be such good embedders, as most students found that machine learning on their image sets works surprisingly well.

We thank Riccardo Bellazzi, an organizer of Pavia course, for inviting us. Oh, yeah, the pizza at Rossopommodoro was great as always, though Michella’s pasta al pesto e piselli back at Riccardo’s home was even better.

Intro to Data Mining for Life Scientists

RNA Club Munich has organized Molecular Life of Stem Cells Conference in Ljubljana this past Thursday, Friday and Saturday. They asked us to organize a four-hour workshop on data mining. And here we were: four of us, Ajda, Anze, Marko and myself (Blaz) run a workshop for 25 students with molecular biology and biochemistry background.

img_20160929_133840

We have covered some basic data visualization, modeling (classification) and model scoring, hierarchical clustering and data projection, and finished with a touch of deep-learning by diving into image analysis by deep learning-based embedding.

Related: Data Mining Course at Baylor College of Medicine in Houston

It’s not easy to pack so many new things on data analytics within four hours, but working with Orange helps. This was a hands-on workshop. Students brought their own laptops with Orange and several of its add-ons for bioinformatics and image analytics. We also showed how to prepare one’s own data using Google Forms and designed a questionary, augment it in a class, run it with students and then analyze the questionary with Orange.

pano_20160929_113352

img_0355

img_0353

The hard part of any short course that includes machine learning is how to explain overfitting. The concept is not trivial for data science newcomers, but it is so important it simply cannot be left out. Luckily, Orange has some cool widgets to help us understanding the overfitting. Below is a workflow we have used. We read some data (this time it was a yeast gene expression data set called brown-selected that comes with Orange), “destroyed the data” by randomly permuting the column with class values, trained a classification tree, and observed near perfect results when the model was checked on the training data.

yeast-overfitting-distributions

Sure this works, you are probably saying. The models should have been scored on a separate test set! Exactly, and this is what we have done next with Data Sampler, which lead us to cross-validation and Test & Score widget.

This was a great and interesting short course and we were happy to contribute to the success of the student-run MLSC-2016 conference.

Datasets in Orange Bioinformatics Add-On

As you might know, Orange comes with several basic widget sets pre-installed. These allow you to upload and explore the data, visualize them, learn from them and make predictions. However, there are also some exciting add-ons available for installation. One of these is a bioinformatics add-on, which is our specialty.

bioinformatics-blog

Bioinformatics widget set allows you to pursue complex analysis of gene expression by providing access to several external libraries. There are four widgets intended specifically for this – dictyExpress, GEO Data Sets, PIPAx and GenExpress. GEO Data Sets are sourced from NCBI, PIPAx and dictyExpress from two Biolab projects, and finally GenExpress from Genialis. A lot of the data is freely accessible, while you will need a user account for the rest.

Once you open the widget, select the experiments you wish to use for your analysis and view it in the Data Table widget. You can compare these experiments in Data Profiles, visualize them in Volcano Plot, select the most relevant genes in Differential Expression widget and much more.

 

BioinfoDatasets
Three widgets with experiment data libraries.

 

These databases enable you to start your research just by installing the bioinformatics add-on (Orange → Options → Add-ons…). The great thing is you can easily combine bioinformatics widgets with the basic pre-installed ones. What an easy way to immerse yourself in the exciting world of bioinformatics!

Data Fusion Tutorial at the [BC]^2

We are excited to host a three-hour tutorial on data fusion at the Basel Computational Biology Conference. To this end we have prepared a series of short lectures notes that accompany the recently developed Data Fusion Add-on for Orange.

scheme

We design the tutorial for data mining researchers and molecular biologists with interest in large-scale data integration. In the tutorial we focus on collective latent factor models, a popular class of approaches for data fusion. We demonstrate the effectiveness of these approaches on several hands-on case studies from recommendation systems and molecular biology.

This is a high-risk event. I mean, for us, lecturers. Ok, no bricks will probably fall down. But, in the part of the tutorial, this is the first time we are showing Orange’s data fusion add-on. And not just showing: part of the tutorial is a hands-on session.

We would like to acknowledge Biolab members for pushing the widgets through the development pipeline under extreme time constraints. Special thanks to Anze, Ales, Jernej, Andrej, Marko, Aleksandar and all other members of the lab.

This post was contributed by Marinka and Blaz.

Hands-on Orange at Functional Genomics Workshop

Last week we have co-organized a Functional Genomics Workshop. At University of Ljubljana we have hosted an inspiring pack of scientists from the Donnelly Centre for Cellular and Biomolecular Research from Toronto. Part of the event was a hands-on workshop Data mining without programing, where we have used Orange to analyze data from systems biology. Data included a subset of Charlie Boone’s famous yeast interaction data and data from chemical genomics. For the program, info about the speakers, and panckages and šmorn check out workshop’s newspaper.

It is always a pleasure seeing a packed lecture room with all laptops running Orange. Attendees were assisted by members of the Biolab in Ljubljana. Hands-on program followed a set of short lectures we have crafted for intended audience – biologists. Everything ran smoothly. At the end, we got excited enough to promise a data import wizard for all those that have problems annotating the data with feature type tags. The deadline: two weeks from the end of the workshop.

fg-orange

Workshops at Baylor College of Medicine

On May 22nd and May 23rd, we (Blaz Zupan and Janez Demsar, assisted by Marinka Zitnik and Balaji Santhanam) have given two hands-on workshops called Data Mining without Programming at Baylor College of Medicine in Houston, Texas.

Actually, there was a lot of programming, but no Python or alike. The workshop was designed for biomedical students and Baylor’s faculty members. We have presented a visual programming approach for development of data mining workflows for interactive data exploration. A three-hour workshop consisted of 15 data mining lessons on visual data exploration, classification, clustering, network analysis, and gene expression analytics. Each lesson focused on a particular data analysis task that the attendees solved with Orange.

The two workshops were organized by Baylor’s Computational and Integrative Biomedical Research Center. Over two days, the event was attended by a large audience of 120 attendees.

workshop-a.jpg

workshop-b.jpg

Orange at ISMB/ECCB 2011

We presented the Orange Bioinformatics add-on at the ISMB/ECCB conference in Vienna, a joined event covering both 19th Annual International Conference on Intelligent Systems for Molecular Biology and 10th European Conference on Computational Biology.

We were giving out Orange stickers (with the URL) to the poster’s visitors. There was some interest; in the end we gave out about 10 of them, mostly to biologists, who were excited to perform some of the analysis themselves. Among the visitors was also a developer of a similar tool who seemed slightly surprised that something like this already exists, while another was disappointed because Orange only runs locally.

See the poster in action on the photo taken by Gregor Rot.

Orange poster (PNG)Poster in action