dataloading, performance

Data loading speedups

MARKO

Mar 28, 2011

Orange has been loading data faster since the end of February, especially if there are many attributes in the file.

Quick comparisons between the old and new versions, measured on my computer:

  • adult.tab (32561 examples, 15 attributes): old version = 1.41s, new version = 0.86s.
  • DLBCL.tab (77 examples, 7071 attributes): old version = 2.72s, new version = 0.93s.
  • GDS1962.tab (104 examples, 31837 attributes): old version = 33.5s, new version = 6.6s.
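Expressing the timings as ratios makes the trend easier to see: the gain grows with the number of attributes. This small Python snippet only divides the reported times. The `speedup` helper is illustrative and not part of Orange.

```python
# Load times in seconds as reported in the list above: (old version, new version).
timings = {
    "adult.tab": (1.41, 0.86),    # 15 attributes
    "DLBCL.tab": (2.72, 0.93),    # 7071 attributes
    "GDS1962.tab": (33.5, 6.6),   # 31837 attributes
}

def speedup(old, new):
    """Ratio of old to new load time; 2.0 means the new version is twice as fast."""
    return old / new

for name, (old, new) in timings.items():
    print(f"{name}: {speedup(old, new):.2f}x")
# prints:
# adult.tab: 1.64x
# DLBCL.tab: 2.92x
# GDS1962.tab: 5.08x
```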

The speedups were obtained with:

  • reusing a buffer for parsing,
  • skipping type detection for attributes with known types, and
  • keeping attributes in a different internal data structure.
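Of these, skipping type detection is the easiest to illustrate briefly. The following is a hypothetical Python sketch, not Orange's actual loader: the names `load_tab` and `detect_type`, the `known_types` argument, treating `""` and `?` as missing values, and the rule that a column is continuous when every non-missing value parses as a number are all assumptions made for illustration.

```python
import io

MISSING = {"", "?"}  # assumed markers for missing values

def detect_type(values):
    """Hypothetical rule: continuous if every non-missing value parses as a float."""
    for v in values:
        if v in MISSING:
            continue
        try:
            float(v)
        except ValueError:
            return "discrete"
    return "continuous"

def load_tab(stream, known_types=None):
    """Parse tab-separated text whose first row holds attribute names.

    Columns named in known_types keep the given type and never go
    through detect_type, which scans every value in the column.
    """
    known_types = known_types or {}
    names = stream.readline().rstrip("\n").split("\t")
    columns = [[] for _ in names]
    for line in stream:
        if not line.strip():
            continue  # ignore blank lines
        fields = line.rstrip("\n").split("\t")
        for column, value in zip(columns, fields):
            column.append(value)
    types = {}
    for name, column in zip(names, columns):
        if name in known_types:
            types[name] = known_types[name]   # known type: skip the scan
        else:
            types[name] = detect_type(column)  # unknown type: inspect all values
    return names, columns, types

data = io.StringIO("age\tsex\n39\tMale\n50\t?\n")
print(load_tab(data, known_types={"sex": "discrete"})[2])
# prints: {'age': 'continuous', 'sex': 'discrete'}
```

With tens of thousands of attributes, each skipped scan saves a pass over every example in that column, which is consistent with the larger gains on the wide files.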
