This week, Primož and I flew to the south of Italy to hold a workshop on Image Analytics through Data Mining at the AIUCD 2018 conference. The workshop was intended to familiarize digital humanities researchers with the options that visual programming environments offer for image analysis.

In about five hours we discussed image embedding, clustering, finding the closest neighbors, and image classification. While it is often a challenge to explain complex concepts in such a short time, it is much easier when working with Orange.


One of the workflows we covered at the workshop finds the most similar image in a set of images. This is best explained with an example.

We had 15 paintings by different authors. Two of them were painted by Claude Monet, the famous French impressionist. Our task was, given a reference painting by Monet, to find his other painting in the collection.

First, we loaded our data set with Import Images. Then we sent the images to Image Embedding. We selected the Painters embedder, since it was specifically trained to recognize the authors of paintings.

Once our paintings are described with vectors (embeddings), we can compare them by similarity. To find the second Monet in the data set, we have to compute the similarity between paintings and find the one most similar to our reference painting.
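To make "most similar" concrete, here is a minimal sketch in plain Python. It assumes each painting is already represented by an embedding vector; the file names and the short three-number vectors are made up for illustration (real Painters embeddings are much longer). It uses cosine similarity, the counterpart of the cosine distance chosen in the Neighbors widget.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(reference, embeddings):
    """Return the name of the embedding most similar to `reference`,
    skipping the reference itself."""
    best_name, best_sim = None, -math.inf
    for name, vec in embeddings.items():
        if name == reference:
            continue
        sim = cosine_similarity(embeddings[reference], vec)
        if sim > best_sim:
            best_name, best_sim = name, sim
    return best_name


# Hypothetical toy embeddings; in Orange they come from Image Embedding.
embeddings = {
    "monet_1.jpg": [0.9, 0.1, 0.3],
    "monet_2.jpg": [0.8, 0.2, 0.35],
    "vangogh.jpg": [0.1, 0.9, 0.2],
    "rembrandt.jpg": [0.2, 0.3, 0.9],
}
print(most_similar("monet_1.jpg", embeddings))  # prints monet_2.jpg
```

With a real archive the loop simply runs over thousands of vectors instead of four; the comparison itself stays the same.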


Let us connect Image Embedding to the Neighbors widget from the Prototypes add-on. Neighbors is specifically intended to find a given number of closest neighbors of a reference data point.

We will need to adjust the widget a bit. First, we need cosine distance, since we want to compare images by their content, not by the magnitude of their feature vectors. Next, we untick *Exclude reference*, so that the reference image also appears in the output. We do this just for visualization purposes. Finally, we set the number of neighbors to 2. Again, this is just for a nicer visualization, since we know there are only two Monet paintings in the data set.
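The three settings can be mimicked in a few lines of plain Python. This is a hypothetical sketch, not the Prototypes add-on's code: the function `neighbors` and its `k` and `exclude_reference` parameters are our own names, and the vectors are toy data. It shows that cosine distance ignores vector length (a scaled copy of a vector is at distance about 0) and that keeping the reference with k = 2 returns the reference plus its single closest match.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity; depends only on direction, not on length."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norms


def neighbors(reference, data, k=2, exclude_reference=False):
    """Return the names of the k rows of `data` closest to `reference`.

    `data` maps row names to vectors; `reference` is one of the names.
    """
    ref_vec = data[reference]
    candidates = [
        (cosine_distance(ref_vec, vec), name)
        for name, vec in data.items()
        if not (exclude_reference and name == reference)
    ]
    candidates.sort()  # smallest distance first
    return [name for _, name in candidates[:k]]


# Hypothetical toy embeddings
data = {
    "monet_1.jpg": [0.9, 0.1, 0.3],
    "monet_2.jpg": [0.8, 0.2, 0.35],
    "vangogh.jpg": [0.1, 0.9, 0.2],
}
print(neighbors("monet_1.jpg", data, k=2))                          # reference + best match
print(neighbors("monet_1.jpg", data, k=1, exclude_reference=True))  # best match only
print(cosine_distance([1.0, 2.0], [10.0, 20.0]))                    # about 0: same direction
```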

Then we need to give Neighbors a reference image for which we want to retrieve the neighbors. We do this by connecting a Data Table to Image Embedding, selecting one of Monet’s paintings in the spreadsheet, and then connecting the Data Table to Neighbors. The widget will automatically treat this second input as the reference.

Now, all we need to do is to visualize the output. Connect Image Viewer to Neighbors and open it.

Voilà! The widget has indeed found Monet’s second painting. This is very useful when you have thousands of images in your archive!

I am missing classical test statistics for analyzing regressions, and also a widget that generates descriptive statistics (mean, std. dev., median, min, max) and passes the results to a table widget for export or further use.

Most of this is available in Box Plot. As for regression, I’m not sure what task you are looking at, but there are some options here as well.
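If these numbers are needed as a table rather than in a plot, they can also be computed by hand, for example in a Python Script widget. A minimal sketch using only Python's `statistics` module; the function name `describe`, the column names and the values are hypothetical, and converting the rows into an Orange table is not shown.

```python
import statistics


def describe(columns):
    """Return one row per column: (name, mean, std. dev., median, min, max).

    `columns` maps a column name to a list of numbers with at least two
    values, because the sample standard deviation needs n >= 2.
    """
    rows = []
    for name, values in columns.items():
        rows.append((
            name,
            statistics.mean(values),
            statistics.stdev(values),  # sample standard deviation
            statistics.median(values),
            min(values),
            max(values),
        ))
    return rows


# Hypothetical example data
columns = {"v_1": [2.0, 4.0, 4.0, 5.0], "v_2": [1.0, 3.0, 5.0, 7.0]}
for row in describe(columns):
    print(row)
```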

Oh I see, you wish to export this in a Data Table… What would be the use case here?

I don’t necessarily need to export it, but it would be nice to show results like these in a text-formatted widget, at least so that they can be added to the report.

The result I currently get looks like the following:

```
Running script:
                            OLS Regression Results
==============================================================================
Dep. Variable:                   v_16   R-squared:                       0.258
Model:                            OLS   Adj. R-squared:                  0.202
Method:                 Least Squares   F-statistic:                     4.612
Date:                Wed, 29 Aug 2018   Prob (F-statistic):           1.54e-07
Time:                        16:11:56   Log-Likelihood:                -231.11
No. Observations:                 215   AIC:                             494.2
Df Residuals:                     199   BIC:                             548.1
Df Model:                          15
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          1.5796      0.577      2.737      0.007       0.442       2.717
v_15           0.3304      0.065      5.100      0.000       0.203       0.458
v_17           0.2880      0.108      2.671      0.008       0.075       0.501
v_19          -0.4014      0.130     -3.096      0.002      -0.657      -0.146
v_25           0.1916      0.103      1.855      0.065      -0.012       0.395
v_1            0.0274      0.060      0.454      0.650      -0.092       0.146
v_14          -0.0410      0.045     -0.911      0.363      -0.130       0.048
v_18           0.0541      0.107      0.506      0.614      -0.157       0.265
v_20          -0.0151      0.054     -0.278      0.781      -0.122       0.092
v_21           0.0661      0.056      1.189      0.236      -0.044       0.176
v_22           0.0490      0.113      0.434      0.665      -0.173       0.271
v_30          -0.0551      0.062     -0.884      0.378      -0.178       0.068
v_24           0.0320      0.046      0.691      0.491      -0.059       0.123
v_26          -0.0363      0.074     -0.489      0.625      -0.183       0.110
v_27           0.0586      0.094      0.623      0.534      -0.127       0.244
v_28           0.0160      0.032      0.505      0.614      -0.046       0.078
==============================================================================
Omnibus:                        0.815   Durbin-Watson:                   2.005
Prob(Omnibus):                  0.665   Jarque-Bera (JB):                0.537
Skew:                          -0.093   Prob(JB):                        0.764
Kurtosis:                       3.160   Cond. No.                         108.
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
```

So this output contains much more information about the OLS regression than just the coefficients.
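Several of the extra numbers in that summary follow directly from a few others. As a sanity check, this sketch recomputes adjusted R², the F-statistic, AIC and BIC from the reported R² = 0.258, log-likelihood = -231.11, n = 215 observations and p = 15 predictors, using the standard OLS formulas. AIC and BIC count p + 1 estimated coefficients (the predictors plus the constant), which is how statsmodels counts them for a model with an intercept. The function name `ols_fit_stats` is our own.

```python
import math


def ols_fit_stats(r2, llf, n, p):
    """Derived OLS statistics from R^2, log-likelihood, n observations
    and p predictors (plus an intercept)."""
    df_resid = n - p - 1                       # 215 - 15 - 1 = 199
    adj_r2 = 1 - (1 - r2) * (n - 1) / df_resid
    f_stat = (r2 / p) / ((1 - r2) / df_resid)
    k = p + 1                                  # coefficients incl. the constant
    aic = -2 * llf + 2 * k
    bic = -2 * llf + math.log(n) * k
    return adj_r2, f_stat, aic, bic


adj_r2, f_stat, aic, bic = ols_fit_stats(r2=0.258, llf=-231.11, n=215, p=15)
print(f"Adj. R-squared: {adj_r2:.3f}")
print(f"F-statistic:    {f_stat:.3f}")
print(f"AIC:            {aic:.1f}")
print(f"BIC:            {bic:.1f}")
```

Because the inputs are the rounded values printed in the summary, the results match the reported 0.202, 4.612, 494.2 and 548.1 only up to rounding in the last digit.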

Sorry for the late response, I just joined DISQUS. 😉

At the moment I use the Python Script widget to get this more informative output, but it would be nice if the regression widget, or another widget, were extended to provide it.

Here is the sample Python code that I use right now:

```python
import os

import pandas as pd
import statsmodels.api as sm

os.chdir('/mydir/')

# in_data is the Orange Table passed into the Python Script widget
attributes = in_data.domain.attributes
nam_attributes = [attr.name for attr in attributes]
nam_target = in_data.domain.class_var.name

# Features and (single) target, keeping the original column names
df = pd.DataFrame(in_data.X, columns=nam_attributes)
y = pd.Series(in_data.Y, name=nam_target)

# Add an intercept column ("const") and fit ordinary least squares.
# The array-based OLS class lives in statsmodels.api, not in the formula API.
X = sm.add_constant(df, prepend=True)
results = sm.OLS(y, X).fit()

summary = str(results.summary())
print(summary)

# Append the summary to a report file; mode "a" creates the file if needed
filename = "linearregression_report.txt"
try:
    with open(filename, "a") as fil:
        fil.write("\n")
        fil.write(summary)
        fil.write("\n")
except IOError as err:
    print(err)
```

Hi folks,

Cool demo, but I can’t replicate it. I double-checked everything. All I am getting is the reference painting.

Any hints?

Cheers,

Nikos

Did you set up the Neighbors widget as shown? Perhaps you could attach your workflow with the channel names visible, so I can see what the problem might be.

Hi, I’m Hussein from Egypt.

How can I learn Orange?

I work in IT and I want to use Orange in my work.

Thank you.