One more exciting visualization has been introduced to Orange – a Nomogram. In general, nomograms are graphical devices that approximate the calculation of some function. The Nomogram widget in Orange visualizes Logistic Regression and Naive Bayes classification models and computes class probabilities for a given set of attribute values. In the nomogram, we can check how changing attribute values affects the class probabilities, and since the widget (like all widgets in Orange) is interactive, we can do this on the fly.
So, how does it work? First, feed the Nomogram a classification model, say, Logistic Regression. We will use the Titanic survival data that comes with Orange for this example (in the File widget, choose “Browse documentation datasets”).
In the nomogram, we see the top-ranked attributes and how much they contribute to the target class. It seems that a third-class adult male had a much lower chance of survival than a first-class female child.
The most important attribute, however, seems to be ‘sex’: the chance of survival is lower for males than it is for females. How do I know? Grab the blue dot over the attribute and drag it from ‘male’ to ‘female’. The total probability of dying on the Titanic (our target class, survived=no) drops from 89% to 43%.
The same goes for all the other attributes – you can interactively explore how much each value contributes to the probability of the selected target class.
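For a logistic regression model, this is exactly the arithmetic a nomogram performs: each attribute value contributes some points (log-odds), the points are summed, and a sigmoid turns the sum into a probability. Here is a minimal sketch of that calculation; the contribution values and attribute names below are purely illustrative, not the coefficients Orange actually fits on the Titanic data.

```python
import math

# Hypothetical log-odds contributions toward the target class (survived=no).
# These numbers are made up for illustration; a real nomogram reads them
# off a fitted Logistic Regression model.
INTERCEPT = 0.5
CONTRIBUTIONS = {
    "sex":    {"male": 1.6, "female": -0.7},
    "status": {"first": -0.8, "second": -0.2, "third": 0.4, "crew": 0.3},
    "age":    {"adult": 0.3, "child": -0.9},
}

def class_probability(instance):
    """Sum the per-attribute points and squash with a sigmoid, which is
    how a logistic-regression nomogram maps dot positions on its axes
    to a single class probability."""
    log_odds = INTERCEPT + sum(
        CONTRIBUTIONS[attr][value] for attr, value in instance.items()
    )
    return 1.0 / (1.0 + math.exp(-log_odds))

# "Dragging the dot" from male to female is just recomputing the sum:
male = class_probability({"sex": "male", "status": "third", "age": "adult"})
female = class_probability({"sex": "female", "status": "third", "age": "adult"})
print(f"P(survived=no): male={male:.2f}, female={female:.2f}")
```

Moving a single dot changes one term of the sum, so the total probability updates instantly; that is why the widget can be fully interactive.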
But it gets even better! Instead of dragging the blue dots in the nomogram, you can feed it data. In the workflow below, we pass the data through the Data Table widget and then feed the selected data instance to the Nomogram. The Nomogram then shows the probability of the target class for that particular instance and “explains” the magnitude of each attribute value’s contribution.
This makes Nomogram a great widget for understanding the model and for interactive data exploration.
Have you recently wondered where Classification Tree went? Or what happened to Majority?
Orange 3.4.0 introduced a new widget category, Model, which now contains all supervised learning algorithms in one place and replaces the separate Classify and Regression categories.
This, however, was not a mere cosmetic change to the widget hierarchy. We wanted to simplify the interface for new users and make it easier to find an appropriate learning algorithm. Moreover, you can now reuse the same workflow on different data sets, say housing.tab and iris.tab!
Leading up to this change, many algorithms were refactored so that the regression and classification versions of the same method were merged into a single widget (and a single class in the underlying Python API). For example, Classification Tree and Regression Tree have become simply Tree, which can model either categorical or numeric target variables. The same goes for SVM, kNN, Random Forest, …
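The idea behind the merge can be sketched as a learner that inspects the target variable and dispatches to the appropriate model. The example below illustrates this with the simplest case, Constant (the merger of Majority and Mean, listed in the wrap-up below); the class is a toy sketch of the dispatch pattern, not Orange's actual implementation.

```python
from collections import Counter
from statistics import mean

class ConstantLearner:
    """Toy sketch of a merged learner: behaves like Majority on a
    categorical target and like Mean on a numeric one, the way Orange's
    merged widgets pick the task from the target variable's type."""

    def __call__(self, target_values):
        if all(isinstance(v, str) for v in target_values):
            # Categorical target -> classification: predict the majority class.
            value = Counter(target_values).most_common(1)[0][0]
        else:
            # Numeric target -> regression: predict the mean.
            value = mean(target_values)
        # Return a model that predicts the same constant for any instance.
        return lambda _instance: value

# One entry point, two tasks:
predict_class = ConstantLearner()(["no", "no", "yes"])   # majority -> "no"
predict_value = ConstantLearner()([1.0, 2.0, 3.0])       # mean -> 2.0
```

With this pattern there is nothing for the user to choose up front, which is precisely why a single SVM or Tree widget can serve both task types.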
Have you ever searched for a widget by typing its name, only to be confused by the multiple options appearing in the search box? Now you no longer need to decide between Classification SVM and Regression SVM: just select SVM and enjoy the rest of your time doing actual data analysis!
Here is a quick wrap-up:
Majority and Mean became Constant.
Classification Tree and Regression Tree became Tree. Likewise, Random Forest and Regression Forest became Random Forest.
SVM, SGD, AdaBoost and kNN now work for both classification and regression tasks.
Linear Regression only works for regression.
Logistic Regression, Naive Bayes and CN2 Rule Induction only work for classification.
Sorry about the last part, we really couldn’t do anything about the very nature of these algorithms! 🙂