We covered some basic data visualization, modeling (classification) and model scoring, hierarchical clustering and data projection, and finished with a touch of deep learning by diving into image analysis with deep learning-based embedding.
It’s not easy to pack so many new data analytics topics into four hours, but working with Orange helps. This was a hands-on workshop. Students brought their own laptops with Orange and several of its add-ons for bioinformatics and image analytics. We also showed how to prepare one’s own data using Google Forms: we designed a questionnaire, augmented it in class, ran it with the students and then analyzed the answers with Orange.
The hard part of any short course that includes machine learning is how to explain overfitting. The concept is not trivial for data science newcomers, but it is so important that it simply cannot be left out. Luckily, Orange has some cool widgets to help us understand overfitting. Below is the workflow we used. We read some data (this time it was a yeast gene expression data set called brown-selected that comes with Orange), “destroyed the data” by randomly permuting the column with class values, trained a classification tree, and observed near-perfect results when the model was checked on the training data.
Sure this works, you are probably saying. The models should have been scored on a separate test set! Exactly, and this is what we did next with Data Sampler, which led us to cross-validation and the Test & Score widget.
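For readers who like to see the same idea in a script, here is a minimal sketch of the experiment. It is not the Orange workflow from the workshop: it uses made-up synthetic data instead of brown-selected, and a hand-written 1-nearest-neighbor classifier stands in for the classification tree, since both can memorize their training data. All function names are hypothetical and chosen just for this illustration.

```python
import random

def nearest_neighbor_predict(train_X, train_y, x):
    """Return the label of the training point closest to x (1-NN)."""
    best = min(range(len(train_X)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)))
    return train_y[best]

def accuracy(train_X, train_y, test_X, test_y):
    """Fraction of test points whose predicted label matches the given one."""
    hits = sum(nearest_neighbor_predict(train_X, train_y, x) == label
               for x, label in zip(test_X, test_y))
    return hits / len(test_y)

def cross_validated_accuracy(X, y, k=10):
    """k-fold cross-validation: every point is scored by a model that never saw it."""
    n = len(X)
    correct = 0
    for fold in range(k):
        test_idx = set(range(fold, n, k))
        train_X = [X[i] for i in range(n) if i not in test_idx]
        train_y = [y[i] for i in range(n) if i not in test_idx]
        for i in test_idx:
            correct += nearest_neighbor_predict(train_X, train_y, X[i]) == y[i]
    return correct / n

rng = random.Random(0)

# Synthetic stand-in data (an assumption, not brown-selected):
# three well-separated classes, 60 samples each, 5 features.
X, y = [], []
for label, center in enumerate([0.0, 5.0, 10.0]):
    for _ in range(60):
        X.append([rng.gauss(center, 1.0) for _ in range(5)])
        y.append(label)

# "Destroy the data": randomly permute the class column.
permuted_y = y[:]
rng.shuffle(permuted_y)

# Scoring on the training data looks perfect, because the model memorizes it.
train_acc = accuracy(X, permuted_y, X, permuted_y)

# Cross-validation reveals that nothing can be learned from permuted labels.
permuted_cv_acc = cross_validated_accuracy(X, permuted_y)

# For contrast: cross-validation on the original, intact labels.
true_cv_acc = cross_validated_accuracy(X, y)

print("permuted labels, scored on training data:", train_acc)
print("permuted labels, 10-fold cross-validation:", round(permuted_cv_acc, 3))
print("original labels, 10-fold cross-validation:", round(true_cv_acc, 3))
```

On the permuted labels the training-set score is perfect, since every point is its own nearest neighbor, while the cross-validated score should hover around chance (about one in three for three balanced classes). That gap is the overfitting the workflow was meant to expose.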
This was a great and interesting short course and we were happy to contribute to the success of the student-run MLSC-2016 conference.
As you might know, Orange comes with several basic widget sets pre-installed. These allow you to upload and explore the data, visualize them, learn from them and make predictions. However, there are also some exciting add-ons available for installation. One of these is a bioinformatics add-on, which is our specialty.
The bioinformatics widget set lets you pursue complex analysis of gene expression by providing access to several external libraries. There are four widgets intended specifically for this: dictyExpress, GEO Data Sets, PIPAx and GenExpress. GEO Data Sets are sourced from NCBI, PIPAx and dictyExpress from two Biolab projects, and GenExpress from Genialis. A lot of the data is freely accessible, while you will need a user account for the rest.
Once you open the widget, select the experiments you wish to use for your analysis and view them in the Data Table widget. You can compare these experiments in Data Profiles, visualize them in Volcano Plot, select the most relevant genes in the Differential Expression widget and much more.
These databases enable you to start your research just by installing the bioinformatics add-on (Orange → Options → Add-ons…). The great thing is you can easily combine bioinformatics widgets with the basic pre-installed ones. What an easy way to immerse yourself in the exciting world of bioinformatics!
We designed the tutorial for data mining researchers and molecular biologists with an interest in large-scale data integration. In the tutorial we focus on collective latent factor models, a popular class of approaches for data fusion. We demonstrate the effectiveness of these approaches on several hands-on case studies from recommendation systems and molecular biology.
This is a high-risk event. I mean, for us, the lecturers. OK, no bricks will probably fall down. But in one part of the tutorial we are showing Orange’s data fusion add-on for the first time. And not just showing: part of the tutorial is a hands-on session.
We would like to acknowledge Biolab members for pushing the widgets through the development pipeline under extreme time constraints. Special thanks to Anze, Ales, Jernej, Andrej, Marko, Aleksandar and all other members of the lab.
Last week we co-organized a Functional Genomics Workshop. At the University of Ljubljana we hosted an inspiring pack of scientists from the Donnelly Centre for Cellular and Biomolecular Research in Toronto. Part of the event was a hands-on workshop, Data mining without programming, where we used Orange to analyze data from systems biology. The data included a subset of Charlie Boone’s famous yeast interaction data and data from chemical genomics. For the program, info about the speakers, and pancakes and šmorn, check out the workshop’s newspaper.
It is always a pleasure to see a packed lecture room with all laptops running Orange. Attendees were assisted by members of the Biolab in Ljubljana. The hands-on program followed a set of short lectures we crafted for the intended audience: biologists. Everything ran smoothly. At the end, we got excited enough to promise a data import wizard for all those who have problems annotating the data with feature type tags. The deadline: two weeks from the end of the workshop.
Actually, there was a lot of programming, but no Python or the like. The workshop was designed for biomedical students and Baylor’s faculty members. We presented a visual programming approach to developing data mining workflows for interactive data exploration. The three-hour workshop consisted of 15 data mining lessons on visual data exploration, classification, clustering, network analysis, and gene expression analytics. Each lesson focused on a particular data analysis task that the attendees solved with Orange.
We presented the Orange Bioinformatics add-on at the ISMB/ECCB conference in Vienna, a joint event covering both the 19th Annual International Conference on Intelligent Systems for Molecular Biology and the 10th European Conference on Computational Biology.
We were giving out Orange stickers (with the URL) to the poster’s visitors. There was some interest; in the end we gave out about 10 of them, mostly to biologists, who were excited to perform some of the analysis themselves. Among the visitors was also a developer of a similar tool who seemed slightly surprised that something like this already exists, while another was disappointed because Orange only runs locally.
See the poster in action in the photo taken by Gregor Rot.