Being a political scientist, I did not even hear about data mining before I’ve joined Biolab. And naturally, as with all good things, data mining started to grow on me. Give me some data, connect a bunch of widgets and see the magic happen!
But hold on! There are still many social scientists out there who haven’t yet heard about the wonderful world of data mining, text mining and machine learning. So I’ve made it my mission to spread the word. And that was the spirit that led me back to my former university – School of Political Sciences, University of Bologna.
University of Bologna is the oldest university in the world and has one of the best departments for political sciences in Europe. I held a lecture Digital Research – Data Mining for Political Scientists for MIREES students, who are specializing in research and studies in Central and Eastern Europe.
The main goal of the lecture was to lay out the possibilities that contemporary technology offers to researchers and to showcase a few simple text mining tasks in Orange. We analysed Trump’s and Clinton’s Twitter timeline and discovered that their tweets are highly distinct from one another and that you can easily find significant words they’re using in their tweets. Moreover, we’ve discovered that Trump is much better at social media than Clinton, creating highly likable and shareable content and inventing his own hashtags. Could that be a tell-tale sign of his recent victory?
Perhaps. Our future, data-mining savvy political scientists will decide. Below, you can see some examples of the workflows presented at the workshop.
Recently we’ve been participating at Days of Computer Science, organized by the Museum of Post and Telecommunications and the Faculty of Computer and Information Science, University of Ljubljana, Slovenia. The project brought together pupils and students from around the country and hopefully showed them what computer science is mostly about. Most children would think programming is just typing lines of code. But it’s more than that. It’s a way of thinking, a way to solve problems creatively and efficiently. And even better, computer science can be used for solving a great variety of problems.
Orange team has prepared a small demo project called Celebrity Lookalike. We found 65 celebrity photos online and loaded them in Orange. Next we cropped photos to faces and turned them black and white, to avoid bias in background and color. Next we inferred embeddings with ImageNet widget and got 2048 features, which are the penultimate result of the ImageNet neural network.
Still, we needed a reference photo to find the celebrity lookalike for. Students could take a selfie and similarly extracted black and white face out of it. Embeddings were computed and sent to Neighbors widget. Neighbors finds n closest neighbors based on the defined distance measure to the provided reference. We decided to output 10 closest neighbors by cosine distance.
Finally, we used Lookalike widget to display the result. Students found it hilarious when curly boys were the Queen of England and girls with glasses Steve Jobs. They were actively trying to discover how the algorithm works by taking photo of a statue, person with or without glasses, with hats on or by making a funny face.
Hopefully this inspires a new generation of students to become scientists, researchers and to actively find solutions to their problems. Coding or not. 🙂
Note: Most widgets we have designed for this projects (like Face Detector, Webcam Capture, and Lookalike) are available in Orange3-Prototypes and are not actively maintained. They can, however, be used for personal projects and sheer fun. Orange does not own the copyright of the images.
Recently Orange and one of its inventors, Blaž Zupan, have been recognized as one of the top 100 changemakers in the region. A 2016 New Europe 100 is an annual list of innovators and entrepreneurs in Central and Eastern Europe highlighting novel approaches to pressing problems.
Orange has been recognized for making data more approachable, which has been our goal from the get-go. The tool is continually being developed with the end user in mind – someone who wants to analyze his/her data quickly, visually, interactively, and efficiently. We’re always thinking hard how to expose valuable information in the data, how to improve the user experience, which defaults are the most appropriate for the method, and, finally, how to intuitively teach people about data mining.
This nomination is a great validation of our efforts and it only makes us work harder. Because every research should be fruitful and fun!
The meeting was organised by Statistical Office of Slovenia and by Eurostat, a Statistical Office of the European Union, and was a primary gathering of representatives from national statistical institutes joined within European Statistical System. The meeting discussed possibilities that big data offers to modern statistics and the role it could play in statistical offices around the world. Say, can one use twitter data to measure costumer satisfaction? Or predict employment rates? Or use traffic information to predict GDP?
During the meeting, Philippe Nieuwbourg from Canada pointed out that the stack of tools for big data analysis, and actually the tool stack for data science, are rather big and are growing larger each day. There is no way that data owners can master data bases, warehouses, Python, R, web development stacks, and similar. Are we alienating the owners and users from their own sources of information?
Of course not. We were invited to the workshop to show that there are data science tools that can actually connect users and data, and empower the users to explore the data in the ways they have never dreamed before. We claimed that these tools should
spark the intuition,
offer powerful and interactive visualizations,
and offer flexibility in design of analysis workflows, say, through visual programming.
We claimed that with such tools, it takes only a few days to train users to master basic and intermediate concepts of data science. And we claimed that this could be done without diving into complex mathematics and statistics.
Part of our presentation was a demo in Orange that showed few tricks we use in such training. The presentation included:
a case study of interactive data exploration by building and visualizing classification tree and forests, and mapping parts of the model to the projection in a scatter plot,
a demo how fun it is to draw a data set and then use it to teach about clustering,
a presentation how trained deep model can be used to explore and cluster images.
The Eurostat meeting was very interesting and packed with new ideas. Our thanks to Boro Nikić for inviting us, and thanks to attendees of our session for the many questions and requests we have received during presentation and after the meeting.