Visualizing relations between data instances can tell us a lot about our data. Let’s see how this works in Orange. We have a data set on machine learning and data mining conferences and journals, with the number of shared authors reported for each publication venue. We can estimate the similarity between two conferences from their author profiles: two conferences are similar if they attract the same authors. The data set is already 9 years old, but obviously, it’s the principle that counts. 🙂 We’ve got two data files: a distance file with distances already computed using the Jaccard index, and a standard conferences.tab file.
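The distances in the data set come precomputed, but the idea behind them is easy to sketch. In the snippet below, each venue’s author profile is a set of author names, and we assume the usual convention that Jaccard distance is one minus the Jaccard index, 1 - |A ∩ B| / |A ∪ B|. The venue and author names are made up for illustration and are not taken from conferences.tab; how exactly the distances in the distribution file were derived is an assumption here.

```python
from itertools import combinations


def jaccard_distance(a: set, b: set) -> float:
    """Jaccard distance 1 - |A & B| / |A | B|; two empty profiles count as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def distance_matrix(profiles: dict) -> tuple:
    """Return venue names and a symmetric matrix of pairwise Jaccard distances."""
    names = list(profiles)
    n = len(names)
    dist = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d = jaccard_distance(profiles[names[i]], profiles[names[j]])
        dist[i][j] = dist[j][i] = d
    return names, dist


# Hypothetical author profiles, for illustration only.
profiles = {
    "VenueA": {"alice", "bob", "carol"},
    "VenueB": {"bob", "carol", "dave"},
    "VenueC": {"erin"},
}
names, dist = distance_matrix(profiles)

if __name__ == "__main__":
    print(names)  # ['VenueA', 'VenueB', 'VenueC']
    for row in dist:
        print(row)
    # [0.0, 0.5, 1.0]
    # [0.5, 0.0, 1.0]
    # [1.0, 1.0, 0.0]
```

VenueA and VenueB share two of their four distinct authors, so their distance is 0.5, while VenueC shares no one with the others and ends up at the maximum distance of 1.0.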
We load the .tab file with the File widget (the data set already comes with Orange) and the .dst file with the Distance File widget (select ‘Browse documentation data sets’ and choose conferences.dst).
Now we would like to create a graph from the distance file. Connect Distance File to Network from Distances. In the widget, we’ve selected a high distance threshold, because we would like to get more connections between nodes. We’ve also checked ‘Include also closest neighbors’ to make sure each node is connected to at least one other node.
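To see why a higher threshold gives more connections, here is a minimal sketch of the idea: connect every pair of nodes whose distance is at or below the threshold and, optionally, also link each node to its nearest neighbor so that no node is left isolated. The function `network_from_distances` and the small distance matrix are hypothetical; this is not Orange’s actual implementation, and details such as whether the threshold is inclusive are assumptions.

```python
def network_from_distances(dist, threshold, include_closest=True):
    """Hypothetical sketch: return undirected edges (i, j) with i < j.

    Edges join pairs with distance <= threshold (inclusive by assumption).
    With include_closest, each node is also linked to its nearest other node.
    """
    n = len(dist)
    edges = set()
    for i in range(n):
        for j in range(i + 1, n):
            if dist[i][j] <= threshold:
                edges.add((i, j))
    if include_closest and n > 1:
        for i in range(n):
            # Nearest other node; ties go to the lower index.
            j = min((k for k in range(n) if k != i), key=lambda k: dist[i][k])
            edges.add((min(i, j), max(i, j)))
    return edges


# Made-up symmetric distance matrix for four nodes.
dist = [
    [0.0, 0.2, 0.9, 0.8],
    [0.2, 0.0, 0.7, 0.95],
    [0.9, 0.7, 0.0, 0.6],
    [0.8, 0.95, 0.6, 0.0],
]

if __name__ == "__main__":
    print(sorted(network_from_distances(dist, 0.5, include_closest=False)))  # [(0, 1)]
    print(sorted(network_from_distances(dist, 0.5)))  # [(0, 1), (2, 3)]
    print(sorted(network_from_distances(dist, 0.75, include_closest=False)))  # [(0, 1), (1, 2), (2, 3)]
```

With a low threshold, nodes 2 and 3 would stay isolated; adding the closest neighbors connects them, and raising the threshold to 0.75 admits more edges on its own.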
We can visualize our graph in Network Explorer. What we get is a rather uninformative network of conferences with labelled nodes. Now for the fun part. Connect the File widget to Network Explorer and set the link type to ‘Node Data’. This matches the data table to the network nodes and displays additional labelling options in Network Explorer.
We’ve colored the nodes by type and set the size of the nodes to the number of authors per conference or journal. Finally, we’ve set the node label to ‘name’. It seems that the International Conference on AI and Law and the AI and Law journal are connected through their shared authors. The same goes for the AI in Medicine in Europe conference and the AI and Medicine journal. These connections indeed make sense.
There are many other things you can do with the Networks add-on in Orange. You can color nodes by predictions, highlight misclassifications or output only nodes with certain network parameters. But that’s it for today.