Model replaces Classify and Regression

Did you recently wonder where did Classification Tree go? Or what happened to Majority?

Orange 3.4.0 introduced a new widget category, Model, which now contains all supervised learning algorithms in one place and replaces the separate Classify and Regression categories.

    

This, however, was not a mere cosmetic change to the widget hierarchy. We wanted to simplify the interface for new users and make finding an appropriate learning algorithm easier. Moreover, now you can reuse some workflows on different data sets, say housing.tab and iris.tab!

Leading up to this change, many algorithms were refactored so that regression and classification versions of the same method were merged into a single widget (and class in the underlying python API). For example, Classification Tree and Regression Tree have become simply Tree, which is capable of modelling categorical or numeric target variables. And similarly for SVM, kNN, Random Forest, …

Have you ever searched for a widget by typing its name and were confused by multiple options appearing in the search box? Now you do not need to decide if you need Classification SVM or Regression SVM, you can just select SVM and enjoy the rest of the time doing actual data analysis!

 

Here is a quick wrap-up:

  • Majority and Mean became Constant.
  • Classification Tree and Regression Tree became Tree. In the same manner, Random Forest and Regression Forest became Random Forest.
  • SVM, SGD, AdaBoost and kNN now work for both classification and regression tasks.
  • Linear Regression only works for regression.
  • Logistic Regression, Naive Bayes and CN2 Rule Induction only work for classification.

Sorry about the last part, we really couldn’t do anything about the very nature of these algorithms! 🙂

 
      

Data Fusion Add-on for Orange

Orange is about to get even more exciting! We have created a prototype add-on for data fusion, which will certainly be of interest to many users. Data fusion brings large heterogeneous data sets together to create sensible clusters of related data instances and provides a platform for predictive modelling and recommendation systems.

This widget set can be used either to recommend you the next movie to watch based on your demographic characteristics, movies you gave high scores to, your preferred genre, etc. or to suggest you a set of genes that might be relevant for a particular biological function or process. We envision the add-on to be useful for predictive modeling dealing with large heterogeneous data compendia, such as life sciences.

The prototype set will be available for download next week, but we are happy to give you a sneak peek below.

Data fusion workflow
Data fusion workflow

 

  1. Movie Ratings widget is pre-set to offer data on movie ratings by users with 706 users and 855 movies (10% of the data selected as a subset).
  2. We add IMDb Actors to filter the data by matching movie ratings with actors.
  3. Then we add the Fusion Graph widget to fuse the data together. Here we have two object types, i.e. users and movies, and one relation between them, i.e. movie ratings.
  4. In Latent Factors we see latent data representation demonstrated by red squares at the side. Let’s select a latent matrix associated with Users as our input for the Data Table.
  5. In Data Table we see the latent data matrix of Users. The algorithm infers low-dimensional user profiles by collective consideration of entire data collection, i.e. movie ratings and actor information. In our scenario the algorithm has  transformed 855 movie titles into 70 movie groupings, i.e. latent components.
Data fusion visualized
Data fusion visualized