Text Workshops in Ljubljana

In the past month, we held two workshops that focused on text mining. The first one, Faksi v praksi, was organized by the University of Ljubljana Career Centers, and gave high school students a look at what we do at the Faculty of Computer and Information Science. We taught them what text mining is and how to group a collection of documents in Orange. The second one struck a more serious note, as public sector employees joined us for the third set of workshops from the Ministry of Public Affairs. This time, we not only clustered documents but also built predictive models, explored predictions in a nomogram, plotted documents on a map and discovered how to find the emotion in a tweet.

These workshops gave us a lot of incentive to improve the Text add-on. We really wanted to support more languages and add extra functionality to the widgets. In the upcoming week, we will release version 0.5.0, which introduces support for Slovenian in the Sentiment Analysis widget, adds a concordance output option to Concordances and, most importantly, implements UDPipe lemmatization, which means Orange will now support about 50 languages! Well, at least for normalization. 😇

Today, we will briefly introduce sentiment analysis for Slovenian. We have added the KKS 1.001 opinion corpus of Slovene web commentaries, which is a part of the CLARIN infrastructure. You can access it in the Corpus widget. Go to Browse documentation corpora and look for slo-opinion-corpus.tab. Let’s take a quick look at it in Corpus Viewer.

The data comes from the comment sections of Slovenian online media and contains fairly expressive language. Let us see whether a post is negative or positive. We will use the Sentiment Analysis widget and select the Liu Hu method for Slovenian. This is a dictionary-based method: the algorithm counts the words in a post that appear in a list of positive words and subtracts the number of words that appear in a list of negative words. The result is the final score of the post.
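To make the scoring idea concrete, here is a minimal sketch of a dictionary-based scorer in plain Python. It is not the code behind the Sentiment Analysis widget: the two word lists are tiny, hypothetical English stand-ins for the real lexicons, and the widget may scale the score differently (for example, by the length of the post).

```python
import re

# Hypothetical toy lexicons for illustration only; real sentiment
# lexicons contain thousands of entries per language.
POSITIVE = {"good", "great", "happy", "cheaper", "win"}
NEGATIVE = {"bad", "awful", "sad", "expensive", "lose"}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def lexicon_score(text):
    """Count positive tokens and subtract the count of negative tokens."""
    tokens = tokenize(text)
    positive_hits = sum(token in POSITIVE for token in tokens)
    negative_hits = sum(token in NEGATIVE for token in tokens)
    return positive_hits - negative_hits


if __name__ == "__main__":
    for post in ["Petrol is cheaper, great!", "Awful and expensive, but good."]:
        print(post, "->", lexicon_score(post))
```

A score above zero marks a post as leaning positive, below zero as leaning negative, and zero as neutral or simply not covered by the lexicon.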

To get a nicer view, we will adjust the attributes in a Select Columns widget: remove all attributes other than sentiment.

Finally, we can observe the results in a Heat Map. The blue lines are the negative posts, while the yellow ones are positive. Let us select the most positive posts and see what they are about.

Looks like Slovenians are happy when petrol gets cheaper and sports(wo)men are winning. We can relate.

Of course, lexicon-based methods have some drawbacks. Namely, they don’t work well with phrases, they often don’t handle modern language (see ‘Jupiiiiiii’ or ‘Hooooooraaaaay!’, where the more letters there are, the more expressive the word is) and they fail with sarcasm. Nevertheless, even such crude methods give us a nice glimpse into the corpus and enable us to extract interesting documents.
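The elongated-word problem is easy to reproduce. The sketch below is only an illustration, not something the Text add-on does: the lexicon is a hypothetical two-word set, and `squeeze` is a made-up helper that collapses every run of repeated letters to a single letter before the lookup.

```python
import re

# Hypothetical mini-lexicon of positive words, for illustration only.
LEXICON = {"hooray", "good"}


def squeeze(word):
    """Lowercase a word and collapse each run of a repeated character to one."""
    return re.sub(r"(.)\1+", r"\1", word.lower())


# Squeeze the lexicon entries too, so both sides are compared in the same form.
SQUEEZED_LEXICON = {squeeze(entry) for entry in LEXICON}


def exact_match(word):
    """Plain dictionary lookup, as a simple lexicon-based method would do."""
    return word.lower() in LEXICON


def squeezed_match(word):
    """Lookup that ignores how many times a letter is repeated."""
    return squeeze(word) in SQUEEZED_LEXICON


if __name__ == "__main__":
    word = "Hooooooraaaaay"
    print(exact_match(word))     # plain lookup misses the elongated form
    print(squeezed_match(word))  # squeezed lookup finds it
```

The trade-off is that squeezing also merges genuinely different words (with this toy lexicon, ‘god’ would match ‘good’), and it throws away the very elongation that signals stronger emotion.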

Stay tuned for the release date and the upcoming post on the UDPipe infrastructure!

  • Mark Butler

    Hi, the latest Text add-on (ver 0.5.2) for a Mac now installs perfectly, thank you very much indeed for all the hard work in fixing it and providing the opportunity for me to teach others to use it.
    regards Mark

    • Ajda Pretnar

      That’s great news! It was a pain making everything work, but I’m glad things are fixed now. Thank you for notifying us!

  • Mark Butler

    I updated to 3.16, reinstalled my add-ons from the Option tab, all work great except the new Text add-on (this used to work in ver 3.15). Here is the error ‘Command failed: python python -m pip install Orange3-Text exited with non zero status’. I have updated my pip too but it still doesn’t work. Will this be fixed soon? I teach students and use Orange a lot, it is an excellent resource! Any work around?
    M. Butler

    • Ajda Pretnar

      Hi Mark! Yes, we have just fixed it (hopefully) and it should be available tomorrow. Sorry for the inconvenience. The workaround is to download https://files.pythonhosted.org/packages/d9/39/9ee1bbd3073efb9ff2eb16cb3ecc970a5e750f313b7641f59450a4176eff/Orange3-Text-0.4.0.tar.gz, drag and drop it into the add-on dialogue and run the installation.

      • Mark Butler

        Hi, thank you so much for the reply. I downloaded the file successfully but (unless I misunderstood you) was unable to drag and drop it into the add-on dialogue box from the drop-down menu on the canvas. I’ll try again tomorrow. (I just noticed an update on the Text-add 0.5.01; this also fails.) Thank you for the rapid communication, much appreciated.