Celebrity Lookalike or How to Make Students Love Machine Learning

Recently we’ve been participating in Days of Computer Science, organized by the Museum of Post and Telecommunications and the Faculty of Computer and Information Science, University of Ljubljana, Slovenia. The project brought together pupils and students from around the country and hopefully showed them what computer science is mostly about. Most children think programming is just typing lines of code, but it’s more than that. It’s a way of thinking, a way to solve problems creatively and efficiently. And even better, computer science can be used for solving a great variety of problems.

Related: On teaching data science with Orange

The Orange team prepared a small demo project called Celebrity Lookalike. We found 65 celebrity photos online and loaded them into Orange. Next we cropped the photos to faces and turned them black and white, to avoid bias from the background and color. Then we inferred embeddings with the ImageNet widget and got 2048 features, the output of the penultimate layer of the ImageNet neural network.
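For readers who would like to reproduce the preprocessing step outside of Orange, here is a minimal sketch of the crop-to-face and black-and-white conversion, using OpenCV’s bundled Haar cascade as a stand-in for the Face Detector widget (the file names are hypothetical):

```python
# Sketch: crop a photo to the first detected face and convert it to grayscale.
# OpenCV's bundled Haar cascade stands in for Orange's Face Detector widget;
# "celebrity.jpg" is a hypothetical file name.
import cv2

cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

img = cv2.imread("celebrity.jpg")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)          # drop the color information
faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

if len(faces):
    x, y, w, h = faces[0]                             # take the first detected face
    cv2.imwrite("celebrity-face.png", gray[y:y + h, x:x + w])
```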

We find faces in photos and turn them to black and white. This eliminates the effect of the background and distinct colors for embeddings.


Still, we needed a reference photo to find the celebrity lookalike for. Students could take a selfie, from which we similarly extracted a black and white face. Embeddings were computed and sent to the Neighbors widget, which finds the n closest neighbors to the provided reference, based on the selected distance measure. We decided to output the 10 closest neighbors by cosine distance.
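Conceptually, this part of the workflow boils down to a few lines of NumPy: rank the celebrity embeddings by cosine distance to the reference embedding and keep the ten closest. The sketch below uses made-up arrays in place of the real embeddings:

```python
# Sketch: find the 10 nearest celebrity embeddings by cosine distance.
import numpy as np

def cosine_distances(reference, embeddings):
    # 1 - cosine similarity between the reference vector and each row of embeddings
    ref = reference / np.linalg.norm(reference)
    emb = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    return 1 - emb @ ref

# Made-up stand-ins: 65 celebrities x 2048 embedding features, plus one selfie
rng = np.random.default_rng(0)
celebrity_embeddings = rng.normal(size=(65, 2048))
selfie_embedding = rng.normal(size=2048)

distances = cosine_distances(selfie_embedding, celebrity_embeddings)
closest = np.argsort(distances)[:10]      # indices of the 10 most similar celebrities
print(closest)
```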

workflow111
Celebrity Lookalike workflow. We load photos, find faces and compute embeddings. We do the same for our Webcam Capture. Then we find the 10 closest neighbors and observe the results in the Lookalike widget.


Finally, we used the Lookalike widget to display the result. Students found it hilarious when curly-haired boys were matched with the Queen of England and girls with glasses with Steve Jobs. They actively tried to discover how the algorithm works by taking photos of a statue, of people with or without glasses or with hats on, or by making funny faces.

lookalike6181683

Hopefully this inspires a new generation of students to become scientists and researchers, and to actively find solutions to their problems. Coding or not. 🙂

dsc_4982

Note: Most widgets we designed for this project (like Face Detector, Webcam Capture, and Lookalike) are available in Orange3-Prototypes and are not actively maintained. They can, however, be used for personal projects and sheer fun. Orange does not own the copyright of the images.

Intro to Data Mining for Life Scientists

RNA Club Munich organized the Molecular Life of Stem Cells Conference in Ljubljana this past Thursday, Friday and Saturday. They asked us to organize a four-hour workshop on data mining. And here we were: the four of us, Ajda, Anze, Marko and myself (Blaz), ran a workshop for 25 students with a molecular biology and biochemistry background.

img_20160929_133840

We covered some basic data visualization, modeling (classification) and model scoring, hierarchical clustering and data projection, and finished with a touch of deep learning by diving into image analysis with deep learning-based embeddings.

Related: Data Mining Course at Baylor College of Medicine in Houston

It’s not easy to pack so many new topics in data analytics into four hours, but working with Orange helps. This was a hands-on workshop. Students brought their own laptops with Orange and several of its add-ons for bioinformatics and image analytics. We also showed how to prepare one’s own data using Google Forms: we designed a questionnaire, augmented it in class, ran it with the students and then analyzed the responses with Orange.

pano_20160929_113352

img_0355

img_0353

The hard part of any short course that includes machine learning is how to explain overfitting. The concept is not trivial for data science newcomers, but it is so important it simply cannot be left out. Luckily, Orange has some cool widgets to help us understand overfitting. Below is the workflow we used. We read some data (this time it was a yeast gene expression data set called brown-selected that comes with Orange), “destroyed the data” by randomly permuting the column with class values, trained a classification tree, and observed near-perfect results when the model was checked on the training data.

yeast-overfitting-distributions

Sure this works, you are probably saying. The models should have been scored on a separate test set! Exactly, and this is what we did next with the Data Sampler widget, which led us to cross-validation and the Test & Score widget.
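The same demonstration can be scripted outside of Orange. The sketch below uses scikit-learn and the Iris data as a stand-in for brown-selected: it permutes the class column, trains a tree, and compares accuracy on the training data with cross-validated accuracy:

```python
# Sketch of the overfitting demo: permute the class values, train a tree,
# then compare accuracy on the training data with cross-validated accuracy.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)             # stand-in for brown-selected
rng = np.random.default_rng(0)
y_permuted = rng.permutation(y)               # "destroy" the data: labels are now random

tree = DecisionTreeClassifier().fit(X, y_permuted)
print("training accuracy:", tree.score(X, y_permuted))     # near-perfect, yet meaningless

scores = cross_val_score(DecisionTreeClassifier(), X, y_permuted, cv=10)
print("cross-validated accuracy:", scores.mean())          # close to chance
```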

This was a great and interesting short course and we were happy to contribute to the success of the student-run MLSC-2016 conference.

Visualizing Gradient Descent

This is a guest blog from the Google Summer of Code project.


Gradient Descent was implemented as a part of my Google Summer of Code project and is available in the Orange3-Educational add-on. It simulates gradient descent for either logistic or linear regression, depending on the type of the input data. Gradient descent is an iterative approach to optimizing model parameters by minimizing the cost function. In machine learning, the cost function corresponds to the prediction error when the model is used on the training data set.
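In scripting terms, the idea the widget animates is a loop like the one below: a minimal batch gradient descent for simple linear regression that minimizes the mean squared error, shown here on synthetic data for illustration only:

```python
# Sketch: batch gradient descent for simple linear regression (y ≈ intercept + slope * x),
# minimizing the mean squared error on synthetic data.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100)
y = 2.0 + 0.5 * x + rng.normal(scale=1.0, size=100)   # true intercept 2.0, slope 0.5

intercept, slope = 0.0, 0.0
learning_rate = 0.01

for step in range(1000):
    error = (intercept + slope * x) - y               # prediction error on the training set
    intercept -= learning_rate * 2 * error.mean()     # gradient of MSE w.r.t. the intercept
    slope -= learning_rate * 2 * (error * x).mean()   # gradient of MSE w.r.t. the slope

print(intercept, slope)                               # approaches roughly 2.0 and 0.5
```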

The Gradient Descent widget takes data on its input and outputs the model and its coefficients.

gradient-descent-flow

The widget displays the value of the cost function given two parameters of the model. For linear regression, we consider a single feature from the training set, with the parameters being the intercept and the slope. For logistic regression, the widget considers two features and their associated multiplicative parameters, setting the intercept to zero. The screenshot below shows gradient descent on the Iris data set, where we consider petal length and sepal width on the input and predict the probability that an iris is Iris versicolor.

gradient-descent1-stamped

  1. The type of the model used (either Logistic regression or Linear regression)
  2. Input features (one for X and one for Y axis) and the target class
  3. Learning rate is the step size of the gradient descent
  4. In a single iteration step, the stochastic approach considers only a single data instance (instead of the entire training set). Convergence in terms of iteration steps is slower, and we can instruct the widget to display the progress of the optimization only after a given number of steps (Step size)
  5. Step through the algorithm (steps can be reverted with the step back button)
  6. Run optimization until convergence
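As a script, the logistic regression setup from the Iris screenshot above could look roughly like the sketch below: two input features, the intercept fixed at zero as in the widget, and a plain gradient descent loop on the cross-entropy cost:

```python
# Sketch: gradient descent for logistic regression with two features and no intercept,
# mirroring the Iris example (petal length and sepal width, Iris versicolor as the target).
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()
X = iris.data[:, [2, 1]]                  # petal length, sepal width
y = (iris.target == 1).astype(float)      # 1 for Iris versicolor, 0 otherwise

theta = np.zeros(2)                       # two multiplicative parameters, intercept fixed at 0
learning_rate = 0.05

for step in range(5000):
    p = 1 / (1 + np.exp(-X @ theta))      # predicted probability of versicolor
    gradient = X.T @ (p - y) / len(y)     # gradient of the cross-entropy cost
    theta -= learning_rate * gradient

print(theta)
```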


The following shows gradient descent for linear regression on the Boston Housing data set, when trying to predict the median value of a house given its age.

gradient-descent-age

On the left we use regular and on the right stochastic gradient descent. While the regular descent goes straight to the target, the path of the stochastic variant is not as smooth.
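For comparison with the batch loop sketched earlier, a stochastic step updates the parameters from a single randomly chosen instance, which is exactly what makes the path jagged. A minimal sketch, again on synthetic stand-in data rather than the Boston housing data:

```python
# Sketch: stochastic gradient descent for the same simple linear regression,
# updating the parameters from one randomly chosen instance per step.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100)
y = 2.0 + 0.5 * x + rng.normal(scale=1.0, size=100)

intercept, slope = 0.0, 0.0
learning_rate = 0.005

for step in range(10000):
    i = rng.integers(len(x))                          # pick a single data instance
    error = (intercept + slope * x[i]) - y[i]
    intercept -= learning_rate * 2 * error
    slope -= learning_rate * 2 * error * x[i]

print(intercept, slope)                               # noisy, but roughly 2.0 and 0.5
```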

We can use the widget to simulate some dangerous, unwanted behavior of gradient descent. The following screenshots show two extreme cases: a learning rate that is too high, where the optimization never converges, and a learning rate that is too low, where convergence is painfully slow.

gradient-descent-extrems

The two problems illustrated above are the reason that many implementations of numerical optimization use adaptive learning rates. We can simulate this in the widget by modifying the learning rate for each step of the optimization.
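A very crude form of adaptation can be sketched by simply decaying the learning rate as the optimization proceeds; the schedule below is just an illustrative choice, not what any particular optimizer does:

```python
# Sketch: the same batch descent with a decaying learning rate, a crude form of adaptation.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100)
y = 2.0 + 0.5 * x + rng.normal(scale=1.0, size=100)

intercept, slope = 0.0, 0.0
initial_rate, decay = 0.02, 0.01

for step in range(2000):
    learning_rate = initial_rate / (1 + decay * step)   # large steps early, cautious steps later
    error = (intercept + slope * x) - y
    intercept -= learning_rate * 2 * error.mean()
    slope -= learning_rate * 2 * (error * x).mean()

print(intercept, slope)
```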

A visit from the Tilburg University

Biolab is currently hosting two amazing data scientists from Tilburg University – Dr. Marie Nilsen and Dr. Eric Postma, who are preparing a 20-lecture MOOC on data science for a non-technical audience. A part of the course will use Orange. The majority of their students come from the humanities, law, economics and behavioral studies, so we are discussing options and opportunities for adapting Orange for social scientists. Another great thing is that the course is designed for beginner-level data miners, showcasing that anybody can mine data and learn from it. And then consult with statisticians and data mining experts (of course!).

Biolab team with Marie and Eric, who is standing next to Ivan Cankar – the very serious guy in the middle.


To honor this occasion, we invite you to check out the Polynomial Regression widget, which is specifically intended for educational use. There, you can showcase the problem of overfitting through visualization.

First, we set up a workflow.

blog7

Then we paint, say, at most 10 points into the Paint Data widget. (Why at most ten? You’ll see later.)

blog1


Now we open our Polynomial Regression widget, where we play with the polynomial degree. Degree 1 gives us a line. With degree 2 we get a curve that fits only one point. However, with degree 7 we fit all the points with one curve. Yay!

blog2

blog3

blog5


But hold on! The curve now becomes very steep. Would the lower end of the curve, at about (0.9, -2.2), still be a realistic estimate for our data set? Probably not. Even when we look at the Data Table with the coefficient values, they seem to skyrocket.

blog6
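The same effect can be reproduced with a few lines of NumPy: fit polynomials of increasing degree to a handful of points and watch the coefficients blow up. The painted points below are made up for illustration:

```python
# Sketch: fit polynomials of increasing degree to a few "painted" points
# and inspect how the coefficients grow.
import numpy as np

x = np.array([0.05, 0.15, 0.30, 0.45, 0.55, 0.70, 0.85, 0.95])
y = np.array([0.20, 0.60, 0.45, 0.80, 0.55, 0.90, 0.60, 0.95])

for degree in (1, 2, 7):
    coefficients = np.polyfit(x, y, degree)
    print(f"degree {degree}:", np.round(coefficients, 1))

# The degree-7 polynomial passes through all eight points exactly, but its
# coefficients are huge and the curve swings wildly between and beyond the points.
```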


This is a typical danger of overfitting, which is often hard to explain, but with the help of these three widgets it becomes as clear as day!
Now go out and share the knowledge!!!