In one of the previous blog posts we mentioned that installing the optional dependency psycopg2 allows Orange to connect to PostgreSQL databases and work directly on the data stored there.
It is also possible to transfer a whole table to the client machine, keep it in the local memory, and continue working with it as with any other Orange data set loaded from a file. But the true power of this feature lies in the ability of Orange to leave the bulk of the data on the server, delegate some of the computations to the database, and transfer only the needed results. This helps especially when the connection is too slow to transfer all the data and when the data is too big to fit in the memory of the local machine, since SQL databases are much better equipped to work with large quantities of data residing on the disk.

If you want to test this feature it is now even easier to do so! A third party distribution called 2UDA provides a single installer for all major OS platforms that combines Orange and a PostgreSQL 9.5 server along with LibreOffice (optional) and installs all the needed dependencies. The database even comes with some sample data sets that can be used to start testing and using Orange out of the box. 2UDA is also a great way to get the very latest version of PostgreSQL, which is important for Orange as it relies heavily on its new TABLESAMPLE clause. It enables time-based sampling of tables, which is used in Orange to get approximate results quickly and allow responsive and interactive work with big data.

We hope this will help us reach an even wider audience and introduce Orange to a whole new group of people managing and storing their data in SQL databases. We believe that having lots of data is a great starting point, but the benefits truly kick in with the ability to easily extract useful information from it.


Working with SQL data in Orange 3

Orange 3 is slowly, but steadily, gaining support for working with data stored in a SQL database. The main focus is to allow huge data sets that do not fit into RAM to be analyzed and visualized efficiently. Many widgets already recognize the type of input data and perform the necessary computations intelligently. This means that data is not downloaded from the database and analyzed locally, but is retained on the remote server, with the computation tasks translated into SQL queries and offloaded to the database engine. This approach takes advantage of the state-of-the-art optimizations relational databases have for working with data that does not fit into working memory, as well as minimizes the transfer of required information to the client.

We demonstrate how to explore and visualize data stored in a SQL table on a remote server in the following short video. It shows how to connect to the server and load the data with the SqlTable widget, manipulate the data (Select Columns, Select Rows), obtain the summary statistics (Box plot, Distributions), and visualize the data (Heat map, Mosaic Display).



The research leading to these results has received funding from the European Union’s Seventh Framework Programme (FP7/2007-2013) under grant agreement no 318633