How do you explain text mining in 3 hours? Is it even possible? Can someone be ready to build predictive models and perform clustering in a single afternoon?
It seems so, especially when Orange is involved.
Yesterday, on August 7, we held a 3-hour workshop on text mining and text analysis for a large crowd of esteemed researchers at Digital Humanities 2017 in Montreal, Canada. Surely, after 3 hours everyone was exhausted, both the audience and the lecturers. But at the same time, everyone was also excited. The audience about the possibilities Orange offers for their future projects and the lecturers about the fantastic participants who even during the workshop were already experimenting with their own data.
The biggest challenge was presenting the inner workings of algorithms to a predominantly non-computer science crowd. Luckily, we had Tree Viewer and Nomogram to help us explain Classification Tree and Logistic Regression! Everything is much easier with vizualizations.
At the end, we were experimenting with explorative data analysis, where we had Hierarchical Clustering, Corpus Viewer, Image Viewer and Geo Map opened at the same time. This is how a researcher can interactively explore the dendrogram, read the documents from selected clusters, observe the corresponding images and locate them on a map.
The workshop was a nice kick-off to an exciting week full of interesting lectures and presentations at Digital Humanities 2017 conference. So much to learn and see!
Being a political scientist, I did not even hear about data mining before I’ve joined Biolab. And naturally, as with all good things, data mining started to grow on me. Give me some data, connect a bunch of widgets and see the magic happen!
But hold on! There are still many social scientists out there who haven’t yet heard about the wonderful world of data mining, text mining and machine learning. So I’ve made it my mission to spread the word. And that was the spirit that led me back to my former university – School of Political Sciences, University of Bologna.
University of Bologna is the oldest university in the world and has one of the best departments for political sciences in Europe. I held a lecture Digital Research – Data Mining for Political Scientists for MIREES students, who are specializing in research and studies in Central and Eastern Europe.
The main goal of the lecture was to lay out the possibilities that contemporary technology offers to researchers and to showcase a few simple text mining tasks in Orange. We analysed Trump’s and Clinton’s Twitter timeline and discovered that their tweets are highly distinct from one another and that you can easily find significant words they’re using in their tweets. Moreover, we’ve discovered that Trump is much better at social media than Clinton, creating highly likable and shareable content and inventing his own hashtags. Could that be a tell-tale sign of his recent victory?
Perhaps. Our future, data-mining savvy political scientists will decide. Below, you can see some examples of the workflows presented at the workshop.