Making Predictions

One of the cool things about being a data scientist is being able to predict. That is, predict before we know the actual outcome. I am not talking about verifying your favorite classification algorithm here, and I am not talking about cross-validation or classification accuracies or AUC or anything like that. I am talking about the good old prediction. This is where our very own Predictions widget comes to help.

Figure: Predictions workflow.


We will be exploring the Iris data set again, but we’re going to add a little twist to it. Since we’ve worked so much with it already, I’m sure you know all about this data. But now we’ve got three new flowers in the office, and of course there’s no label attached to tell us what species of Iris they are. [sigh…] Obviously, we will be measuring petals and sepals and comparing the results with our data.

Figure: Our new data on three flowers. We entered the data in Google Sheets, copied the shareable link and pasted it into the File widget.


But surely you don’t want to go through all 150 flowers to properly match the three new Irises? So instead, let’s first train a model on the existing data set. We connect the File widget to the chosen classifier (we went with Classification Tree this time) and feed the trained model into Predictions. Then we write down the measurements for our new flowers in Google Sheets (just like above), load them into Orange with a new File widget and pass the fresh data to Predictions as well. Predictions needs both inputs, the model and the data. We can observe the predicted class directly in the widget itself.
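
For readers who like to see the idea in code, here is a rough sketch of the same train-then-predict flow in plain Python. It is not how Orange implements Classification Tree. It is a tiny hand-rolled tree (Gini impurity, midpoint thresholds) trained on just the first two rows of each species from the Iris data set, and the three "new flowers" are hypothetical measurements made up for illustration, not the ones from our spreadsheet.

```python
from collections import Counter

# First two rows of each species from the Iris data set:
# (sepal length, sepal width, petal length, petal width), species
TRAIN = [
    ((5.1, 3.5, 1.4, 0.2), "Iris-setosa"),
    ((4.9, 3.0, 1.4, 0.2), "Iris-setosa"),
    ((7.0, 3.2, 4.7, 1.4), "Iris-versicolor"),
    ((6.4, 3.2, 4.5, 1.5), "Iris-versicolor"),
    ((6.3, 3.3, 6.0, 2.5), "Iris-virginica"),
    ((5.8, 2.7, 5.1, 1.9), "Iris-virginica"),
]


def gini(rows):
    """Gini impurity of the class labels in rows."""
    counts = Counter(label for _, label in rows)
    return 1.0 - sum((n / len(rows)) ** 2 for n in counts.values())


def best_split(rows):
    """Return (impurity, feature index, threshold) of the best split, or None."""
    best = None
    for f in range(len(rows[0][0])):
        values = sorted({x[f] for x, _ in rows})
        # Candidate thresholds are midpoints between neighbouring values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [r for r in rows if r[0][f] <= t]
            right = [r for r in rows if r[0][f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best


def build(rows):
    """Grow the tree until every leaf is pure; a leaf is a Counter of labels."""
    labels = Counter(label for _, label in rows)
    split = best_split(rows) if len(labels) > 1 else None
    if split is None:
        return labels
    _, f, t = split
    left = [r for r in rows if r[0][f] <= t]
    right = [r for r in rows if r[0][f] > t]
    return (f, t, build(left), build(right))


def predict(tree, x):
    """Walk from the root to a leaf and return the leaf's majority class."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if x[f] <= t else right
    return tree.most_common(1)[0][0]


tree = build(TRAIN)  # the "classifier" step

# Hypothetical measurements for three unlabelled flowers (illustration only).
NEW_FLOWERS = [
    (5.0, 3.4, 1.5, 0.2),
    (6.7, 3.1, 4.6, 1.4),
    (6.1, 3.0, 5.5, 2.0),
]

for flower in NEW_FLOWERS:  # the "Predictions" step
    print(flower, "->", predict(tree, flower))
```

The two halves mirror the two inputs of the Predictions widget: `tree` plays the role of the trained model, and `NEW_FLOWERS` the role of the fresh data. With only six training rows the tree is far cruder than one trained on all 150 flowers, so treat its answers as a demonstration of the flow, not of accuracy.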

Figure: Predictions made by the classification tree.


In the left part of the visualization we have the input data set (our measurements) and in the right part the predictions made with the classification tree. By default you see probabilities for all three class values and the predicted class. You can of course use other classifiers as well. It would probably make sense to first evaluate classifiers on the existing data set, find the best one for your data and then use it on the new data.
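
To get a feel for how the probabilities and the predicted class relate, here is a small plain-Python sketch. It assumes the common convention that a tree leaf reports the class frequencies of the training flowers that landed in it, and that the predicted class is the most probable one. The leaf counts below are hypothetical, and Orange's own computation may differ in details.

```python
from collections import Counter

CLASSES = ["Iris-setosa", "Iris-versicolor", "Iris-virginica"]


def class_probabilities(leaf_counts):
    """Turn the class counts in a leaf into one probability per class."""
    total = sum(leaf_counts.values())
    return [leaf_counts.get(c, 0) / total for c in CLASSES]


def predicted_class(probabilities):
    """The predicted class is simply the one with the highest probability."""
    best = max(range(len(CLASSES)), key=lambda i: probabilities[i])
    return CLASSES[best]


# Hypothetical leaf: 47 versicolor and 1 virginica training flowers ended up here.
leaf = Counter({"Iris-versicolor": 47, "Iris-virginica": 1})

probs = class_probabilities(leaf)
print(" : ".join(f"{p:.2f}" for p in probs), "->", predicted_class(probs))
# prints: 0.00 : 0.98 : 0.02 -> Iris-versicolor
```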

 

  • bodeoni

    https://uploads.disquscdn.com/images/d3731874c67c8af91928f611acb802d9e35390744024d4b0dcf6c106d41aa496.png.

    Thanks for the post. I am trying to make predictions on new data but I can’t seem to get the output from the model into the Predictions widget. See attached picture. Please advise. Thanks

    • Ajda Pretnar

      Predictions needs Data input as well. Please see widget documentation for details.

    • Yaseen Afzal

      Hi, I would like some help with Orange. I want to train a model on my mobile reviews data set using SVM, but it does not work correctly.

    • Yaseen Afzal

      If you give me your email contact, I will contact you. Please?

    • Breck

      I am having the same problem. Did you ever figure this out?

  • Meghan Brown

    Thanks for your post. I am currently using Test and Score at the same time as the Predictions widget. Will Predictions work without Test and Score, or do I need to run both at the same time? Thx

    • Ajda Pretnar

      Test&Score and Predictions are two different widgets. Test&Score is meant for evaluating the performance of the model with cross-validation or another method of your choice. Predictions takes a model and predicts new data instances. Please see widget documentation for details on their use.

      • Meghan Brown

        I thought that was the case but wasn’t 100% sure. Thank you for confirming 🙂

  • joy biswas

    I am facing an issue: when I attach the test file to the Predictions widget it says “One or more predictors failed…” All the columns in my test file are the same as in my training data file. I have also set the target variable in the File widget. Can you please help?

    • Alex

      I have the same problem … is there a bug in Orange 3.4???

      • Lan Žagar

        There was indeed a bug in the Predictions widget in Orange 3.4.
        It is already fixed in version 3.4.2. Please see if everything works for you as well and report if there are still any issues.

        • Jessica

          I have Orange version 3.4.3, and I am having the same issue. The columns in the test file are the same as in the training data file, and the targets are the same in both as well.

      • joy biswas

        The problem got solved when I installed Orange 3.4.2. I believe the latest version fixes that issue.