Data Mining for Political Scientists

Being a political scientist, I had not even heard of data mining before I joined Biolab. And naturally, as with all good things, data mining started to grow on me. Give me some data, connect a bunch of widgets and see the magic happen!

But hold on! There are still many social scientists out there who haven’t yet heard about the wonderful world of data mining, text mining and machine learning. So I’ve made it my mission to spread the word. And that was the spirit that led me back to my former university, the School of Political Sciences at the University of Bologna.

The University of Bologna is the oldest university in the world and has one of the best political science departments in Europe. I gave a lecture titled “Digital Research – Data Mining for Political Scientists” to MIREES students, who specialize in research and studies on Central and Eastern Europe.

Lecture at University of Bologna

The main goal of the lecture was to lay out the possibilities that contemporary technology offers to researchers and to showcase a few simple text mining tasks in Orange. We analysed Trump’s and Clinton’s Twitter timelines and discovered that their tweets are highly distinct from one another and that you can easily find the significant words each of them uses. Moreover, we discovered that Trump is much better at social media than Clinton, creating highly likable and shareable content and inventing his own hashtags. Could that help explain his recent victory?

Perhaps. Our future data-mining-savvy political scientists will decide. Below, you can see some examples of the workflows presented at the workshop.

bologna-workflow1
Predicting the author from tweet content. Logistic Regression reports 92% classification accuracy and AUC score. Confusion Matrix can send misclassified tweets to Corpus Viewer, where we can inspect them further.
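
For readers who prefer code to widgets, here is a minimal sketch of the idea behind this workflow: turn each tweet into a bag of words and train a logistic regression to predict the author. It uses only the Python standard library. The toy tweets, the function names and the training settings are my own assumptions for illustration; this is not what Orange runs internally, and it will not reproduce the scores above.

```python
import math
import re
from collections import Counter

# Toy tweets invented for illustration; they are not from the lecture data.
TWEETS = [
    ("make america great again", "Trump"),
    ("we will build a great wall", "Trump"),
    ("crooked media is so unfair", "Trump"),
    ("stronger together for every family", "Clinton"),
    ("we must protect health care for every family", "Clinton"),
    ("women deserve equal pay", "Clinton"),
]

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())

def bag_of_words(text, vocabulary):
    """Count how often each vocabulary word occurs in the text."""
    counts = Counter(tokenize(text))
    return [counts[word] for word in vocabulary]

def predict_proba(features, weights, bias):
    """Return the probability that a tweet belongs to class 1 (Trump)."""
    z = bias + sum(w * v for w, v in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic_regression(X, y, epochs=500, learning_rate=0.5):
    """Fit weights and bias with plain batch gradient descent."""
    weights = [0.0] * len(X[0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for features, label in zip(X, y):
            error = predict_proba(features, weights, bias) - label
            for j, value in enumerate(features):
                grad_w[j] += error * value
            grad_b += error
        weights = [w - learning_rate * g / len(X) for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / len(X)
    return weights, bias

# Both authors share one vocabulary; the model learns a weight per word.
vocabulary = sorted({word for text, _ in TWEETS for word in tokenize(text)})
X = [bag_of_words(text, vocabulary) for text, _ in TWEETS]
y = [1 if author == "Trump" else 0 for _, author in TWEETS]
weights, bias = train_logistic_regression(X, y)

def predict_author(text):
    """Predict the author of a tweet from its word counts."""
    p = predict_proba(bag_of_words(text, vocabulary), weights, bias)
    return "Trump" if p >= 0.5 else "Clinton"

print(predict_author("we will build a great wall"))
```

Note that both authors are described with the same vocabulary. What separates them is the weight the model learns for each word: positive weights pull a tweet towards one author, negative weights towards the other.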

 

bologna-wordcloud
Word Cloud from preprocessed tweets. We removed stopwords and punctuation to find frequencies for meaningful words only.
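
The preprocessing behind a word cloud can be sketched in a few lines of Python. The stopword list and the example tweets below are made up for illustration, and the regular expression is just one simple way to drop punctuation; Orange’s own preprocessing offers more options than this.

```python
import re
from collections import Counter

# A small, hand-picked stopword list used only for this illustration.
STOPWORDS = {"a", "an", "and", "the", "to", "of", "in", "is", "we", "will", "our"}

def word_frequencies(tweets):
    """Count words across tweets after dropping punctuation and stopwords."""
    counts = Counter()
    for tweet in tweets:
        # Keep only runs of letters, which discards punctuation and digits.
        tokens = re.findall(r"[a-z]+", tweet.lower())
        counts.update(token for token in tokens if token not in STOPWORDS)
    return counts

# Invented example tweets, not taken from the lecture data.
tweets = [
    "We will make America great again!",
    "The economy is great, and our jobs are coming back.",
    "Jobs, jobs, jobs: a great plan for America.",
]
frequencies = word_frequencies(tweets)
print(frequencies.most_common(3))
```

The word cloud then simply scales each word by its count.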

 

bologna-enrichment
Word Enrichment by Author. First, we select Donald’s tweets with Select Rows and then compare them to the entire corpus in Word Enrichment. The widget outputs a ranked list of significant words for the provided subset. We do the same for Hillary’s tweets.
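
To give a feeling for how a word can be “significant” for a subset, here is a rough sketch that ranks words with a hypergeometric test: how likely is it to see a word at least this often in the subset if the subset were drawn at random from the corpus? I am assuming the hypergeometric test here because it is a common choice for enrichment analysis; the widget’s exact computation (including any multiple-testing correction) may differ. The tweets and function names are invented for the example.

```python
from math import comb

def enrichment_p_value(subset_with_word, subset_size, corpus_with_word, corpus_size):
    """Probability of seeing the word in at least subset_with_word documents
    when subset_size documents are drawn at random from the corpus."""
    total = comb(corpus_size, subset_size)
    upper = min(subset_size, corpus_with_word)
    tail = sum(
        comb(corpus_with_word, k)
        * comb(corpus_size - corpus_with_word, subset_size - k)
        for k in range(subset_with_word, upper + 1)
    )
    return tail / total

def rank_enriched_words(subset, corpus):
    """Rank the subset's words by ascending p-value (most significant first)."""
    subset_docs = [set(doc.lower().split()) for doc in subset]
    corpus_docs = [set(doc.lower().split()) for doc in corpus]
    results = []
    for word in set().union(*subset_docs):
        in_subset = sum(word in doc for doc in subset_docs)
        in_corpus = sum(word in doc for doc in corpus_docs)
        p = enrichment_p_value(in_subset, len(subset_docs), in_corpus, len(corpus_docs))
        results.append((p, word))
    return sorted(results)

# Invented example tweets, not taken from the lecture data.
trump = [
    "make america great again",
    "we will build a great wall",
    "crooked media is so unfair",
]
clinton = [
    "stronger together for every family",
    "we must protect health care for every family",
    "women deserve equal pay",
]
ranking = rank_enriched_words(trump, trump + clinton)
for p, word in ranking[:3]:
    print(f"{word}: p = {p:.2f}")
```

A word that appears in both authors’ tweets, such as “we” in this toy corpus, ends up at the bottom of the list, while a word used only in the subset rises to the top.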

 

bologna-topicmodelling
Finding potential topics with LDA.
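
If you are curious what LDA does under the hood, below is a small sketch that fits it with collapsed Gibbs sampling, one of several ways to estimate the model. The documents, the hyperparameters (alpha, beta, number of iterations) and the function names are assumptions made for this example; Orange’s Topic Modelling widget relies on its own implementation, so treat this as an illustration only.

```python
import random

def lda_gibbs(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA with collapsed Gibbs sampling.

    Returns the topic-word counts, the vocabulary and the document-topic counts.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)

    doc_topic = [[0] * n_topics for _ in docs]            # tokens per (doc, topic)
    topic_word = [[0] * n_words for _ in range(n_topics)]  # tokens per (topic, word)
    topic_total = [0] * n_topics                           # tokens per topic
    assignments = []

    # Start from a random topic for every token.
    for d, doc in enumerate(docs):
        doc_assign = []
        for w in doc:
            t = rng.randrange(n_topics)
            doc_assign.append(t)
            doc_topic[d][t] += 1
            topic_word[t][word_id[w]] += 1
            topic_total[t] += 1
        assignments.append(doc_assign)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wid = word_id[w]
                t = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][t] -= 1
                topic_word[t][wid] -= 1
                topic_total[t] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][wid] + beta)
                    / (topic_total[k] + n_words * beta)
                    for k in range(n_topics)
                ]
                t = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = t
                doc_topic[d][t] += 1
                topic_word[t][wid] += 1
                topic_total[t] += 1
    return topic_word, vocab, doc_topic

def top_words(topic_word, vocab, n=3):
    """Return the n most frequent words for each topic."""
    return [
        [vocab[i] for i in sorted(range(len(vocab)), key=lambda i: -row[i])[:n]]
        for row in topic_word
    ]

# Invented, already tokenized documents; not taken from the lecture data.
docs = [
    ["jobs", "economy", "taxes", "jobs"],
    ["economy", "trade", "jobs"],
    ["health", "care", "families"],
    ["families", "health", "children"],
]
topic_word, vocab, doc_topic = lda_gibbs(docs, n_topics=2)
for k, words in enumerate(top_words(topic_word, vocab)):
    print(f"Topic {k}: {', '.join(words)}")
```

The sampler is random, so the topics you get depend on the seed and on the number of iterations. As in Orange, it is up to the researcher to decide whether the resulting word lists form meaningful topics.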

 

bologna-emotions
Finally, we offered a sneak peek at our recent Tweet Profiler widget. Tweet Profiler is intended for sentiment analysis of tweets and can output classes, probabilities and embeddings. The widget is not yet officially available, but will be included in the upcoming release.
  • Riktesh Srivastava

    Great article, thanks for posting!
    How do you differentiate between Donald’s and Hillary’s tweets using the same bag of words? Please advise.

  • Irfan Wahyudin

    Hello, it is great to have a sentiment analysis add-on in Orange. I want to ask, does it already support languages other than English?

    • Ajda Pretnar

      Hello Irfan, are you referring to sentiment analysis for foreign languages? Tweet Profiler unfortunately works only on English, since it’s a model we’ve trained ourselves. I don’t think we’ll be making a separate Tweet Profiler for other languages, but we are planning to introduce broader language support for general sentiment analysis.

      • Irfan Wahyudin

        Yes, I was referring to sentiment analysis for foreign languages. I think it would be great if contributors around the world had an opportunity to contribute to the sentiment dictionary. I’m a lecturer in Indonesia, and I hope that someday I will have an opportunity to become a contributor for Bahasa Indonesia. 🙂

        Btw, does Tweet Profiler use any kind of lexicon, e.g. SentiWordNet? If so, does the algorithm do POS tagging for each word before determining the sentiment?

  • jmmnn

    Hello Ajda, thanks for this post. Would it be possible for you to share your .ows files so it’s easy to try out your examples? My name is Jorge Martinez Navarrete and I work for the United Nations.

    • Ajda Pretnar

      Dear Jorge, a repository with Orange workflows is in the making. If you need them immediately, please write to info@biolab.si. However, I’d suggest you try to construct the workflows yourself. This is a great way to learn and then be able to do an in-depth analysis on your own data. 🙂

  • Guest

    Well, I can’t find the Twitter widget in my Orange (ver 3.3.9, macOS). Can you help me? Thx!

    • Guest

      Oh, silly me. I found it now. Thanks anyway 🙂

      • Guest

        OK, another question… How do I get a Twitter API key? 🙂