Data loading speedups

Orange has been loading data faster since the end of February, especially if there are many attributes in the file.

Quick comparisons between the old and new versions, measured on my computer:

  • adult.tab (32561 examples, 15 attributes): old version = 1.41s, new version = 0.86s.
  • DLBCL.tab (77 examples, 7071 attributes): old version = 2.72s, new version = 0.93s.
  • GDS1962.tab (104 examples, 31837 attributes): old version = 33.5s, new version = 6.6s.
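
To put the timings above in perspective, here is a small sketch that turns them into speedup factors. The numbers are copied from the list; the `timings` dictionary and the `speedup` helper are names made up for this illustration and are not part of Orange.

```python
# Load times in seconds, copied from the measurements above: (old, new).
timings = {
    "adult.tab": (1.41, 0.86),
    "DLBCL.tab": (2.72, 0.93),
    "GDS1962.tab": (33.5, 6.6),
}


def speedup(old, new):
    """Return how many times faster the new version is than the old one."""
    return old / new


# Map each file name to its speedup factor.
speedups = {name: speedup(old, new) for name, (old, new) in timings.items()}

for name, factor in speedups.items():
    print(f"{name}: {factor:.2f}x faster")
```

The factor grows with the number of attributes, which matches the observation that files with many attributes benefit the most.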

The speedups were obtained with:

  • reuse of a buffer for parsing,
  • skipping type detection for attributes with known types, and
  • keeping attributes in a different data structure internally.
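
The second item is the easiest one to illustrate. The sketch below is not Orange's actual loader; it is a minimal toy parser, assuming a tab-separated file with a single header line, and the names `detect_type`, `parse_table` and `known_types` are hypothetical. It only shows the idea: a column whose type is already known never goes through the detection scan.

```python
def detect_type(values):
    """Guess a column type by trying to parse every value as a number."""
    for value in values:
        if value in ("", "?"):  # treat empty fields and "?" as missing values
            continue
        try:
            float(value)
        except ValueError:
            return "discrete"
    return "continuous"


def parse_table(lines, known_types=None):
    """Parse tab-separated lines; return (column types, number of detections run)."""
    known_types = known_types or {}
    header = lines[0].split("\t")

    # Collect the values column by column.
    columns = [[] for _ in header]
    for line in lines[1:]:
        for column, value in zip(columns, line.split("\t")):
            column.append(value)

    types = {}
    detections = 0
    for name, column in zip(header, columns):
        if name in known_types:
            # The type is already known, so the scan over the column is skipped.
            types[name] = known_types[name]
        else:
            types[name] = detect_type(column)
            detections += 1
    return types, detections
```

For example, with `lines = ["age\tsex", "39\tmale", "50\tfemale"]`, calling `parse_table(lines, {"sex": "discrete"})` runs the detection only for the `age` column. With thousands of attributes, every skipped scan adds up.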

Orange has been accepted into GSoC 2011

This year Orange has been accepted into the Google Summer of Code program as a mentoring organization. It is one of 175 open-source organizations, projects, and groups that will mentor students this year while they work on those projects.

We have prepared a page on our Trac with more information about the Google Summer of Code program, especially on how interested students should apply with their proposals. There is also a list of some ideas we are proposing for this year. Also check out the official project page for Orange.

Google Summer of Code is a Google-sponsored program in which Google pays stipends to students who spend the summer working on open source projects from all around the world. Each student is paid $5000 (and a t-shirt!) for approximately two months of work on the project. More about the program is available on its homepage.