Analyzing Surveys

Our streak of workshops continues. This time we taught professionals from public administration how they can leverage data analytics and machine learning to retrieve interesting information from surveys. Thanks to the Ministry of Public Administration, this is only the first in a line of workshops on data science we are preparing for public sector employees.

For this purpose, we have designed EnKlik Anketa widget, which you can find in Prototypes add-on. The widget reads data from a Slovenian online survey service OneClick Survey and imports the results directly into Orange.

We have prepared a test survey, which you can import by entering a public link to data into the widget. Here’s the link: https://www.1ka.si/podatki/141025/72F5B3CC/ . Copy it into the Public link URL line in the widget. Once you press Enter, the widget loads the data and displays retrieved features, just like the File widget.

EnKlik Anketa widget is similar to the File widget. It also enables changing the attribute type and role.

 

The survey is in Slovenian, but we can use Edit Domain to turn feature names into English equivalent.

We renamed attributes in order as they appear in the survey. If you load the survey yourself, you can rename them just like you see here.

 

As always, we can check the data in a Data Table. We have 41 respondents and 7 questions. Each respondent chose a nickname, which makes it easier to browse the data.

Now we can perform familiar clustering to uncover interesting groups in our data. Connect Distances to Edit Domain and Hierarchical Clustering to Distances.

Distance from Pipi and Chad to other respondents is very high, which makes them complete outliers.

 

We have two outliers, Pipi and Chad. One is an excessive sportsman (100 h of sport per week) and the other terminally ill (general health -1). Or perhaps they both simply didn’t fill out the survey correctly. If we use the Data Table to filter out Pipi and Chad, we get a fairly good clustering.

We can use Box Plot, to observe what makes each cluster special. Connect Box Plot to Hierarchical Clustering (with the two groups selected), select grouping by Cluster and tick Order by relevance.

Box Plot separates distributions by Cluster and orders attributes by how well they split selected subgroups.

 

The final workflow.

 

Seems like our second cluster (C2) is the sporty one. If we are serving in the public administration, perhaps we can design initiatives targeting cluster C1 to do more sports. It is so easy to analyze the data in Orange!

Mining our own data

Recently we’ve made a short survey that was, upon Orange download, asking people how they found out about Orange, what was their data mining level and where do they work. The main purpose of this is to get a better insight into our user base and to figure out what is the profile of people interested in trying Orange.

Here we have some preliminary results that we’ve managed to gather in the past three weeks or so. Obviously we will use Orange to help us make sense of the data.

 

We’ve downloaded our data from Typeform and appended some background information such as OS and browser. Let’s see what we’ve got in the Data Table widget.

blog-results7

 

Ok, this is our entire data table. Here we also have the data on people who completed the survey and who didn’t. First, let’s organize the data properly. We’ll do this with Select Columns widget.

blog-results

 

We removed all the meta attributes as they are not very relevant for our analysis. Next we moved the ‘completed’ attribute into target variable, thus making it our class variable.

blog-results2

 

Now we would like to see some basic distributions from our data.

blog-results3

 

Interesting. Most of our users are working on Windows, a few on Mac and very few on Linux.

Let’s investigate further. Now we want to know more about those people who actually completed the survey. Let’s use Select Columns again, this time removing os_type, os_name, agent_name and completed from our data and keeping just the answers. We made “Where do you work?” our class variable, but we could use either one of the three. Another trick is to set in directly in Distributions widget under ‘Group by’.

blog-results4

 

Ok, let’s again use Distributions – this is such a simple way to get a good sense of your data.

blog-results5

 

Obviously out of those who found out about Orange in college, most are students, but what’s interesting here is that there are so many. We can also see that out of those who found us on the web, most come from the private sector, followed by academia and researchers. Good. How about the other question?

blog-results6

 

Again, results are not particularly shocking, but it’s great to confirm your hypothesis with real data. Out of beginner level data miners, most are students, while most intermediate users come from the industry.

A quick look at the Mosaic Display will give us a good overview:

blog-results8

 

Yup, this sums it up quite nicely. We have lots of beginner levels users and not many expert ones (height of the box). Also most people found out about Orange on the web or in college (width of the box). A thin line on the left shows apriori distribution, thus making it easier to compare expected and actual number of instances. For example, there should be at least some people who are students and have found out about Orange at a conference. But there aren’t – a contrast between how much red there should be in the box (line on the left) and how much there actually is (bigger part of the box) is quite telling. We can even select all the beginner level users who found out about Orange in college and further inspect the data, but be it enough for now.

Our final workflow:

 

blog-results12

 

Obviously, this is a very simple analysis. But even such simple tasks are never boring with good visualization tools such as Distributions and Mosaic Display. You could also use Venn Diagram to find common features of selected subsets or perhaps Sieve Diagram for probabilities.

 

We are very happy to get these data and we would like to thank everyone who completed the survey. If you wish to help us further, please fill out a longer survey that won’t actually take you more than 3 minutes of your time (we timed it!).

 

Happy Friday everyone!