Even though summer is nigh, we are hardly going to catch a summer break this year. The Orange team is busy holding workshops around the world to present the latest widgets and data mining tools to the public. Last week we had a very successful tutorial at [BC]2 in Basel, Switzerland, where Marinka and Blaž presented data fusion. Part of the tutorial was a hands-on workshop with Orange’s new add-on for data fusion. Marinka also received an award for her poster, in which data fusion was used to hunt for Dictyostelium bacterial-response genes. This week we are in Pavia, Italy, for the Matrix Computations in Biomedical Informatics Workshop at AIME 2015, the Conference on Artificial Intelligence in Medicine. During the workshop we are giving an invited talk on learning latent factor models by data fusion, and we’ll also show Orange’s data fusion add-on. Thanks to the workshop organizers, Riccardo Bellazzi, Jimeng Sun and Ping Zhang, the workshop program looks great.
The tutorial is designed for data mining researchers and molecular biologists with an interest in large-scale data integration. In the tutorial we focus on collective latent factor models, a popular class of approaches for data fusion. We demonstrate the effectiveness of these approaches on several hands-on case studies from recommendation systems and molecular biology.
This is a high-risk event. I mean, for us, the lecturers. OK, bricks will probably not fall down. But in one part of the tutorial we are showing Orange’s data fusion add-on for the first time. And not just showing: part of the tutorial is a hands-on session.
We would like to acknowledge the Biolab members for pushing the widgets through the development pipeline under extreme time constraints. Special thanks to Anze, Ales, Jernej, Andrej, Marko, Aleksandar and all other members of the lab.
Orange is about to get even more exciting! We have created a prototype add-on for data fusion, which will certainly be of interest to many users. Data fusion brings large heterogeneous data sets together to create sensible clusters of related data instances and provides a platform for predictive modelling and recommendation systems.
This widget set can be used either to recommend the next movie for you to watch, based on your demographic characteristics, the movies you rated highly, your preferred genre, etc., or to suggest a set of genes that might be relevant for a particular biological function or process. We envision the add-on being useful for predictive modeling with large heterogeneous data compendia, such as those in the life sciences.
The prototype set will be available for download next week, but we are happy to give you a sneak peek below.
The Movie Ratings widget is pre-set to offer movie ratings data for 706 users and 855 movies (a 10% subset of the data).
We add the IMDb Actors widget to filter the data by matching movie ratings with actors.
Then we add the Fusion Graph widget to fuse the data together. Here we have two object types, i.e. users and movies, and one relation between them, i.e. movie ratings.
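To make the idea of object types and relations a bit more concrete, here is a minimal sketch of how such a graph could be represented in plain Python. The class names `ObjectType`, `Relation` and `FusionGraph`, and the toy rating values, are our own illustrative assumptions; they are not the add-on's actual API or data.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ObjectType:
    """A kind of object in the graph, e.g. users or movies (hypothetical class)."""
    name: str
    size: int  # number of objects of this type


@dataclass
class Relation:
    """A data matrix relating two object types (hypothetical class)."""
    name: str
    row_type: ObjectType
    col_type: ObjectType
    matrix: list  # list of rows; None marks a missing value

    def __post_init__(self):
        # The matrix shape must agree with the sizes of the two object types.
        if len(self.matrix) != self.row_type.size:
            raise ValueError("row count does not match row object type")
        if any(len(row) != self.col_type.size for row in self.matrix):
            raise ValueError("column count does not match column object type")


@dataclass
class FusionGraph:
    """A collection of relations over shared object types (hypothetical class)."""
    relations: list = field(default_factory=list)

    def add(self, relation):
        self.relations.append(relation)

    def object_types(self):
        """Return the distinct object types, in order of first appearance."""
        seen = []
        for rel in self.relations:
            for obj in (rel.row_type, rel.col_type):
                if obj not in seen:
                    seen.append(obj)
        return seen


# Toy data: 3 users rating 4 movies on a 1-5 scale (made-up values).
users = ObjectType("Users", 3)
movies = ObjectType("Movies", 4)
ratings = Relation("Movie ratings", users, movies, [
    [5, None, 3, 1],
    [4, 2, None, 1],
    [None, 5, 4, 2],
])

graph = FusionGraph()
graph.add(ratings)
print([t.name for t in graph.object_types()], len(graph.relations))
```

Adding another data source would, in this sketch, simply mean adding another `Relation` that shares one of the existing object types.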
In Latent Factors we see the latent data representation, shown as red squares at the side. Let’s select the latent matrix associated with Users as our input for the Data Table.
In Data Table we see the latent data matrix of Users. The algorithm infers low-dimensional user profiles by collectively considering the entire data collection, i.e. movie ratings and actor information. In our scenario the algorithm has transformed 855 movie titles into 70 movie groupings, i.e. latent components.
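For readers curious what "low-dimensional user profiles" look like in practice, the sketch below factorizes a tiny ratings matrix into user and movie latent matrices. Note that this is ordinary single-matrix factorization trained with stochastic gradient descent, used here only as a simplified stand-in; it is not the collective algorithm the add-on runs, and the function names, toy ratings and parameter values are all our own assumptions.

```python
import random


def factorize(ratings, rank, steps=2000, lr=0.01, reg=0.02, seed=0):
    """Approximate ratings[i][j] by the dot product of U[i] and V[j].

    ratings is a list of rows; None marks a missing rating and is skipped.
    """
    rng = random.Random(seed)
    n_users, n_movies = len(ratings), len(ratings[0])
    U = [[rng.uniform(0.1, 1.0) for _ in range(rank)] for _ in range(n_users)]
    V = [[rng.uniform(0.1, 1.0) for _ in range(rank)] for _ in range(n_movies)]
    observed = [(i, j, r) for i, row in enumerate(ratings)
                for j, r in enumerate(row) if r is not None]
    for _ in range(steps):
        for i, j, r in observed:
            pred = sum(U[i][f] * V[j][f] for f in range(rank))
            err = r - pred
            for f in range(rank):
                u, v = U[i][f], V[j][f]
                # Gradient step on squared error with L2 regularization.
                U[i][f] += lr * (err * v - reg * u)
                V[j][f] += lr * (err * u - reg * v)
    return U, V


def squared_error(ratings, U, V):
    """Sum of squared differences over the observed ratings only."""
    total = 0.0
    for i, row in enumerate(ratings):
        for j, r in enumerate(row):
            if r is not None:
                pred = sum(u * v for u, v in zip(U[i], V[j]))
                total += (r - pred) ** 2
    return total


# Toy data: 4 users x 5 movies, ratings on a 1-5 scale (made-up values).
ratings = [
    [5, 4, None, 1, 1],
    [4, 5, 1, None, 2],
    [1, None, 5, 4, 5],
    [2, 1, 4, 5, None],
]

U, V = factorize(ratings, rank=2)
# Each row of U is one user's profile expressed in 2 latent components.
print(len(U), "users x", len(U[0]), "latent components")
```

In the scenario above the same idea plays out at a larger scale: each user is described by 70 latent components instead of 855 individual movie titles.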