Orange already supports multi-target classification, but its current implementation of clustering trees is written in Python. One of the five projects Orange has chosen for this year’s Google Summer of Code is the implementation of clustering trees in C. The goal of my project is to reduce the time needed to build clustering trees and to lower their space complexity, especially when they are used in random forests. The implementation will be based on Orange’s SimpleTreeLearner and will be integrated into Orange 3.0.
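For readers new to the idea, the core of a clustering tree for multi-target data is a split criterion that scores a candidate split on all targets at once instead of on a single class. The sketch below illustrates that idea in plain Python. It is not Orange’s SimpleTreeLearner or the planned C code: the function names are hypothetical, and scoring a split by the reduction of per-target Gini impurity summed over targets is an assumed, commonly used choice.

```python
from collections import Counter
from typing import List, Optional, Sequence, Tuple

Targets = Sequence[str]  # one row of class values, one entry per target


def gini(labels: Sequence[str]) -> float:
    """Gini impurity of the class values of a single target."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def multi_target_impurity(rows: List[Targets]) -> float:
    """Assumed node impurity: per-target Gini impurities summed over all targets."""
    if not rows:
        return 0.0
    return sum(gini([r[j] for r in rows]) for j in range(len(rows[0])))


def split_score(left: List[Targets], right: List[Targets]) -> float:
    """Impurity reduction achieved by splitting a node into left and right."""
    n = len(left) + len(right)
    children = (len(left) * multi_target_impurity(left)
                + len(right) * multi_target_impurity(right)) / n
    return multi_target_impurity(list(left) + list(right)) - children


def best_threshold(xs: Sequence[float],
                   ys: List[Targets]) -> Tuple[Optional[float], float]:
    """Best binary split `x <= t` on one numeric feature, scored on all targets."""
    best: Tuple[Optional[float], float] = (None, 0.0)
    for t in sorted(set(xs))[:-1]:  # the largest value would leave the right side empty
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        score = split_score(left, right)
        if score > best[1]:
            best = (t, score)
    return best


xs = [1.0, 2.0, 3.0, 4.0]
ys = [("a", "x"), ("a", "x"), ("b", "y"), ("b", "y")]
print(best_threshold(xs, ys))  # (2.0, 1.0)
```

A single tree grown this way replaces one tree per target, which is one reason clustering trees are attractive inside random forests.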
Once the clustering trees are implemented and integrated, I will write documentation and unit tests. Additionally, I intend to carry out an experimental study comparing the effectiveness of clustering trees with established multi-target classifiers (such as PLS and chain classifiers) on benchmark datasets. I will also work on some additional tasks related to multi-target classification that were not part of my original proposal but that the Orange team thinks would be useful. Among these is a chain classifier framework, which Orange currently lacks.
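A chain classifier trains one model per target in a fixed order, and each model also sees the values of the targets that come before it in the chain: true values during training, predicted values at prediction time. The minimal sketch below shows that mechanism. `ClassifierChain` and the toy `nearest_neighbour` base learner are hypothetical names chosen for illustration; the framework planned for Orange may look quite different.

```python
from typing import Callable, List, Sequence

Row = List[float]
Model = Callable[[Row], int]
Learner = Callable[[List[Row], List[int]], Model]


def nearest_neighbour(X: List[Row], y: List[int]) -> Model:
    """Toy base learner: 1-nearest neighbour with squared Euclidean distance."""
    def predict(row: Row) -> int:
        best = min(range(len(X)),
                   key=lambda i: sum((a - b) ** 2 for a, b in zip(X[i], row)))
        return y[best]
    return predict


class ClassifierChain:
    """Hypothetical chain classifier: one model per target, in a fixed order."""

    def __init__(self, learner: Learner) -> None:
        self.learner = learner
        self.models: List[Model] = []

    def fit(self, X: List[Row], Y: List[Sequence[int]]) -> "ClassifierChain":
        augmented = [list(row) for row in X]
        self.models = []
        for j in range(len(Y[0])):
            yj = [y[j] for y in Y]
            self.models.append(self.learner(augmented, yj))
            # During training, the true value of target j becomes an extra
            # feature for every model later in the chain.
            augmented = [row + [float(v)] for row, v in zip(augmented, yj)]
        return self

    def predict(self, row: Row) -> List[int]:
        augmented = list(row)
        predictions: List[int] = []
        for model in self.models:
            p = model(augmented)
            predictions.append(p)
            # At prediction time, each model's own output is passed down the chain.
            augmented = augmented + [float(p)]
        return predictions


X = [[0.0], [1.0], [2.0], [3.0]]
Y = [[0, 0], [0, 1], [1, 1], [1, 1]]
chain = ClassifierChain(nearest_neighbour).fit(X, Y)
print(chain.predict([1.1]))  # [0, 1]
```

Because any learner with the same `(X, y) -> predict` shape can be plugged in, the chain itself stays independent of the base classifier, which is the main appeal of a generic framework.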
If you are interested in learning more about clustering trees or chain classifiers, these articles should cover the basics:
I am a third-year undergraduate student at the Faculty of Computer and Information Science in Ljubljana, and my project will be mentored by prof. dr. Blaž Zupan. I thank him and the rest of the Orange team for their advice and support.
Led by Jure Žbontar, a team from the University of Ljubljana has beaten 126 other entrants to win an international competition in predictive data analytics.
Jure’s team consisted of several Orange developers and computer science students: Miha Zidar, Blaž Zupan, Gregor Majcen, Marinka Žitnik and Matic Potočnik. To win, the team had to predict topics for 10,000 MedLine documents represented by over 25,000 algorithmically derived numerical features. A training set of another 10,000 documents was provided in the same representation, with each document labeled with a set of topics. The task was to use the training set to develop a model that predicts the labels of documents in the test set. A particular challenge was guessing the right number of topics for each document, since in the training set, at least, it varied from one to a dozen.
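The post does not say how the team decided how many topics to assign. To make the difficulty concrete, here is one simple, generic approach, not the winning method: assume some multi-label model already gives each document a score per topic, keep the topics whose score clears a threshold (always at least one, at most a dozen), and tune that threshold on held-out training documents. All names are hypothetical, and using mean F1 as the validation measure is an assumption.

```python
from typing import Dict, List, Sequence


def select_topics(scores: Dict[str, float], threshold: float,
                  max_topics: int = 12) -> List[str]:
    """Keep topics scoring at least `threshold`, but always return at least one."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    chosen = [t for t in ranked if scores[t] >= threshold][:max_topics]
    return chosen or ranked[:1]


def f1(predicted: Sequence[str], true: Sequence[str]) -> float:
    """F1 score between the predicted and true topic sets of one document."""
    p, t = set(predicted), set(true)
    if not p and not t:
        return 1.0
    return 2 * len(p & t) / (len(p) + len(t))


def tune_threshold(val_scores: List[Dict[str, float]],
                   val_topics: List[List[str]],
                   candidates: Sequence[float]) -> float:
    """Pick the candidate threshold with the highest mean F1 on validation documents."""
    def mean_f1(th: float) -> float:
        return sum(f1(select_topics(s, th), y)
                   for s, y in zip(val_scores, val_topics)) / len(val_topics)
    return max(candidates, key=mean_f1)  # ties go to the first candidate


val_scores = [{"a": 0.9, "b": 0.6, "c": 0.1}, {"a": 0.2, "b": 0.8, "c": 0.7}]
val_topics = [["a", "b"], ["b", "c"]]
th = tune_threshold(val_scores, val_topics, [0.3, 0.5, 0.75])
print(th, select_topics({"a": 0.2, "b": 0.4}, th))  # 0.3 ['b']
```

A single global threshold lets the number of topics vary per document, because documents with many confidently scored topics simply keep more of them.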
JRS 2012 is just one of a series of competitions recently hosted on platforms such as TunedIT and Kaggle. The prize for winning was $1000 and a trip to the Joint Rough Set Symposium in Chengdu, China, to present the winning strategy and the data mining techniques developed for it.
This year, five students have been accepted to participate in Google Summer of Code and contribute to Orange over the summer. Congratulations!
- Amela – Widgets for statistics
- Andrej T. – Computer vision add-on for Orange
- CoderWilliam – A Fully-Featured Neural Network Library Implementation Based On the Flood Library with Extension for Deep Learning
- Makarov Dmitry – Text mining add-on for Orange
- Miran Levar – Multi-Target Learning for Orange
Overall, 1,212 students have been accepted this year by open source organizations from around the world.
The Orange GUI is being redesigned. Expect a welcome screen with a selection of preloaded widget schemes, simpler access to computational components, and integration with an intelligent interface (widget suggestions). For this project we have engaged designer Peter Čuhalev. To give you a taste of what is going on, here are some icons for the widget sets being redesigned. They are in black and white; colors will be decided on and added at a later stage. Below are just the icons: widget symbols without frames. Current frames are rounded squares, while the frames in the new GUI will likely be circles. The new icons are designed in a vector format.