Image Analytics Workshop at AIUCD 2018

This week, Primož and I flew to the south of Italy to hold a workshop on Image Analytics through Data Mining at the AIUCD 2018 conference. The workshop was intended to familiarize digital humanities researchers with the options that visual programming environments offer for image analysis.

In about five hours, we covered image embedding, clustering, nearest-neighbor search and image classification. Explaining complex concepts in such a short time is often a challenge, but it becomes much easier when working with Orange.

One of the workflows we covered at the workshop finds the most similar image in a collection of images. It is best explained with an example.

We had 15 paintings by different artists. Two of them were painted by Claude Monet, the famous French Impressionist. Our task was, given a reference painting by Monet, to find his other painting in the collection.

A collection of images. It includes two Monet paintings.

First, we loaded our data set with Import Images. Then we sent the images to Image Embedding, where we selected the Painters embedder, since it was trained specifically to recognize the authors of paintings.

We used the Painters embedder here.

Once we have described our paintings with vectors (embeddings), we can compare them by similarity. To find the second Monet in the data set, we have to compute the similarity between paintings and find the one most similar to our reference painting.
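
The comparison step can be sketched in a few lines of NumPy. This is not how Orange implements it: the sketch assumes the embeddings are already available as rows of an array, and the five 4-dimensional vectors and file names are made up for illustration (real embeddings have many more dimensions). Cosine similarity is obtained by scaling every row to unit length and taking dot products with the reference row.

```python
import numpy as np

def most_similar(embeddings, ref_index):
    """Return the index of the row most similar to embeddings[ref_index]
    by cosine similarity, never returning the reference row itself."""
    # Scale each row to unit length so that dot products become cosines
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = unit @ unit[ref_index]
    sims[ref_index] = -np.inf  # exclude the reference painting
    return int(np.argmax(sims))

# Hypothetical embeddings: one row per painting
paintings = ["monet_a.jpg", "vangogh.jpg", "monet_b.jpg", "klimt.jpg", "turner.jpg"]
emb = np.array([
    [0.9, 0.1, 0.0, 0.2],
    [0.1, 0.8, 0.3, 0.0],
    [0.8, 0.2, 0.1, 0.3],
    [0.0, 0.1, 0.9, 0.4],
    [0.3, 0.3, 0.3, 0.3],
])

print(paintings[most_similar(emb, 0)])  # monet_b.jpg
```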

Let us connect Image Embedding to Neighbors from the Prototypes add-on. The Neighbors widget is intended specifically for finding a given number of closest neighbors of a reference data point.

We need to adjust the widget a bit. First, we choose cosine distance, since we want to compare images by their content (the direction of their embedding vectors), not by the magnitude of their features. Next, we untick Exclude references so that the reference image also appears in the output. We do this only for visualization purposes. Finally, we set the number of neighbors to 2. Again, this is just for a nicer visualization, since we know there are only two Monet paintings in the data set.

Neighbors configured for a nicer visualization: Exclude references is unticked and the number of neighbors is set to 2.
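
To make these three settings concrete, here is a hypothetical `neighbors` function that mimics them with plain NumPy: cosine distance as the metric, an `exclude_reference` flag and `n_neighbors`. It is an illustrative sketch, not the widget's source code, and the embeddings are made up. The last line shows why cosine distance compares content rather than magnitude: lengthening a vector does not change its distance.

```python
import numpy as np

def cosine_distance(a, b):
    """1 - cosine similarity: depends only on the angle between a and b."""
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def neighbors(data, reference, n_neighbors=2, exclude_reference=True):
    """Hypothetical stand-in for the Neighbors settings: return the indices
    of the n_neighbors rows of `data` closest to `reference`."""
    dists = np.array([cosine_distance(row, reference) for row in data])
    order = [int(i) for i in np.argsort(dists, kind="stable")]
    if exclude_reference:
        # Drop rows that are identical to the reference itself
        order = [i for i in order if not np.allclose(data[i], reference)]
    return order[:n_neighbors]

# Made-up 4-dimensional embeddings; row 0 is the reference Monet
emb = np.array([
    [0.9, 0.1, 0.0, 0.2],
    [0.1, 0.8, 0.3, 0.0],
    [0.8, 0.2, 0.1, 0.3],
    [0.0, 0.1, 0.9, 0.4],
    [0.3, 0.3, 0.3, 0.3],
])

print(neighbors(emb, emb[0], n_neighbors=2, exclude_reference=False))  # [0, 2]
print(neighbors(emb, emb[0], n_neighbors=2, exclude_reference=True))   # [2, 4]
# Tripling a vector's length leaves its cosine distance at (almost) zero
print(np.isclose(cosine_distance(emb[0], 3 * emb[0]), 0.0))  # True
```

With Exclude references unticked and two neighbors, the reference comes back first, followed by its closest match, which is exactly the pair we want to look at.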

Next, we need to give Neighbors a reference image whose neighbors we want to retrieve. We do this by connecting a Data Table to Image Embedding, selecting one of Monet’s paintings in the spreadsheet, and then connecting the Data Table to Neighbors. The widget automatically treats this second input as the reference.

Monet.jpg is our reference painting. We select it in Data Table.

Now, all we need to do is to visualize the output. Connect Image Viewer to Neighbors and open it.

Voilà! The widget has indeed found Monet’s second painting. This is especially handy when you have thousands of images in your archive!

  • Wolfram Rinke

    I am missing classical test statistics for analyzing regressions, and also a widget that generates descriptive statistics (mean, std. dev., median, min, max) and passes the results to a table widget for export or further use.
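
    A minimal pandas sketch of such a descriptive-statistics table, assuming the feature values arrive as a NumPy array (inside Orange's Python Script widget this would be `in_data.X`). The helper name `describe_columns` is hypothetical, and converting the result back into an Orange table for the widget's output is not shown.

```python
import numpy as np
import pandas as pd

def describe_columns(X, names):
    """Hypothetical helper: one row per variable, one column per statistic."""
    df = pd.DataFrame(X, columns=names)
    # pandas' std uses the sample standard deviation (ddof=1)
    return df.agg(["mean", "std", "median", "min", "max"]).T

X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
print(describe_columns(X, ["a", "b"]))
```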

    • Ajda Pretnar

      Most of this is available in Box Plot. As for regression, I’m not sure what task you are looking at, but there are some options here as well.

      • Ajda Pretnar

        Oh I see, you wish to export this in a Data Table… What would be the use case here?

        • Wolfram Rinke

          I don’t necessarily need to export it, but showing results like these in a text-formatted widget would be nice, at least so that I could add them to the report.

          The output of my script looks like this:

          Running script:
                                     OLS Regression Results
          ============================================================================
          Dep. Variable:                  v_16   R-squared:                      0.258
          Model:                           OLS   Adj. R-squared:                 0.202
          Method:                Least Squares   F-statistic:                    4.612
          Date:               Wed, 29 Aug 2018   Prob (F-statistic):          1.54e-07
          Time:                       16:11:56   Log-Likelihood:               -231.11
          No. Observations:                215   AIC:                            494.2
          Df Residuals:                    199   BIC:                            548.1
          Df Model:                         15
          Covariance Type:           nonrobust
          ============================================================================
                         coef    std err          t      P>|t|      [0.025      0.975]
          ----------------------------------------------------------------------------
          const        1.5796      0.577      2.737      0.007       0.442       2.717
          v_15         0.3304      0.065      5.100      0.000       0.203       0.458
          v_17         0.2880      0.108      2.671      0.008       0.075       0.501
          v_19        -0.4014      0.130     -3.096      0.002      -0.657      -0.146
          v_25         0.1916      0.103      1.855      0.065      -0.012       0.395
          v_1          0.0274      0.060      0.454      0.650      -0.092       0.146
          v_14        -0.0410      0.045     -0.911      0.363      -0.130       0.048
          v_18         0.0541      0.107      0.506      0.614      -0.157       0.265
          v_20        -0.0151      0.054     -0.278      0.781      -0.122       0.092
          v_21         0.0661      0.056      1.189      0.236      -0.044       0.176
          v_22         0.0490      0.113      0.434      0.665      -0.173       0.271
          v_30        -0.0551      0.062     -0.884      0.378      -0.178       0.068
          v_24         0.0320      0.046      0.691      0.491      -0.059       0.123
          v_26        -0.0363      0.074     -0.489      0.625      -0.183       0.110
          v_27         0.0586      0.094      0.623      0.534      -0.127       0.244
          v_28         0.0160      0.032      0.505      0.614      -0.046       0.078
          ============================================================================
          Omnibus:                       0.815   Durbin-Watson:                  2.005
          Prob(Omnibus):                 0.665   Jarque-Bera (JB):               0.537
          Skew:                         -0.093   Prob(JB):                       0.764
          Kurtosis:                      3.160   Cond. No.                        108.
          ============================================================================

          Warnings:
          [1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

          • Wolfram Rinke

            So the data table contains more information about the OLS regression than just the coefficients.

      • Wolfram Rinke

        Sorry for the late response, I just joined DISQUS. 😉

        At the moment I use a Python Script widget to do this in a more informative way, but it would be nice to have the regression widget, or another widget, extended that way.

        Here is the sample Python code I use right now:

        import os

        import numpy as np
        import pandas as pd
        import statsmodels.api as sm

        # in_data is supplied by Orange's Python Script widget
        os.chdir('/mydir/')
        filename = "linearregression_report.txt"

        nam_attributes = [attr.name for attr in in_data.domain.attributes]
        nam_target = in_data.domain.class_var.name

        # Features (in_data.X) and target (in_data.Y) as pandas objects,
        # so that the summary shows the real variable names
        df = pd.DataFrame(in_data.X, columns=nam_attributes)
        y = pd.Series(in_data.Y, name=nam_target)

        # Prepend an intercept column ("const") and fit ordinary least squares
        X = sm.add_constant(df, prepend=True)
        results = sm.OLS(y, X).fit()
        print(results.summary())

        # Mode "a" appends to the report and creates the file if it is missing,
        # so no separate existence check is needed
        try:
            with open(filename, "a") as fil:
                fil.write("\n")
                fil.write(str(results.summary()))
                fil.write("\n")
        except IOError as err:
            print(err)

  • Nikos Mylonopoulos

    Hi folks,

    Cool demo, but I can’t replicate it. I double-checked everything, but all I get is the reference painting.

    Any hints?
    Cheers,
    Nikos

    • Ajda Pretnar

      Did you set up the Neighbors widget as shown? Perhaps you could attach the workflow with the channel names visible, so I can see what might be wrong.

  • Hussein Mosslam

    Hi, I’m Hussein from Egypt.
    How can I learn Orange?
    I work in IT.
    I want to use Orange in my work.
    Thank you.