# Dimensionality Reduction by Manifold Learning

The new Orange release (v. 3.3.9) welcomed a few wonderful additions to its widget family, including the Manifold Learning widget. The widget reduces the dimensionality of high-dimensional data and therefore pairs well with visualization widgets.

The Manifold Learning widget offers five embedding techniques based on the scikit-learn library: t-SNE, MDS, Isomap, Locally Linear Embedding and Spectral Embedding. Each handles the mapping differently and has its own set of parameters.
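For readers who want to try the same five techniques outside Orange, here is a minimal sketch that calls them directly through `sklearn.manifold`. It is not the widget's own code: the swiss-roll toy dataset, `n_neighbors=10` and the variable names are our assumptions, and everything else is left at scikit-learn defaults.

```python
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import (
    MDS,
    TSNE,
    Isomap,
    LocallyLinearEmbedding,
    SpectralEmbedding,
)

# A 3-D toy manifold: X has shape (300, 3); t is a continuous position along the roll.
X, t = make_swiss_roll(n_samples=300, random_state=0)

# The five scikit-learn estimators, each asked for 2 output components.
methods = {
    "t-SNE": TSNE(n_components=2, random_state=0),
    "MDS": MDS(n_components=2, random_state=0),
    "Isomap": Isomap(n_components=2, n_neighbors=10),
    "LLE": LocallyLinearEmbedding(n_components=2, n_neighbors=10, random_state=0),
    "Spectral": SpectralEmbedding(n_components=2, random_state=0),
}

# fit_transform returns the embedded coordinates, one row per input point.
embeddings = {name: est.fit_transform(X) for name, est in methods.items()}

for name, Z in embeddings.items():
    print(f"{name}: {Z.shape}")
```

All five share the same `fit_transform` interface, so swapping one technique for another is a one-line change; what differs is which structure (local neighbourhoods or global distances) each method tries to preserve.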

For example, the popular t-SNE requires only a metric (e.g. cosine distance). In this demonstration we output 2 components, since they are the easiest to visualize and make sense of.
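A sketch of the t-SNE setup described above (2 components, cosine metric) using scikit-learn's `TSNE` directly. The random placeholder data of 200 samples and 50 features is an assumption, and this is not necessarily how the widget calls the library internally.

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
# Hypothetical high-dimensional data: 200 samples, 50 features.
X = rng.normal(size=(200, 50))

# Two output components and cosine distance between the input points.
tsne = TSNE(n_components=2, metric="cosine", random_state=0)
Z = tsne.fit_transform(X)
print(Z.shape)  # (200, 2)
```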

First, let’s load the data and open it in Scatter Plot. Not a very informative visualization, right? The dots form an unrecognizable square in 2D.

Let’s use embeddings to make things a bit more informative. This is what the data looks like with a t-SNE embedding. The data starts to take shape, and the data points, colored according to the regression class, reveal a beautiful gradient.

Ok, how about MDS? This is beyond our expectations!
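MDS tries to keep the pairwise distances of the original space in the low-dimensional layout. A rough way to check this, sketched below under our own assumptions (the bundled Iris data, Euclidean distances, Pearson correlation as a quality score), is to correlate the distances before and after the embedding.

```python
import numpy as np
from scipy.spatial.distance import pdist
from sklearn.datasets import load_iris
from sklearn.manifold import MDS

X, y = load_iris(return_X_y=True)

mds = MDS(n_components=2, random_state=0)
Z = mds.fit_transform(X)

# Condensed vectors of all pairwise Euclidean distances in 4-D and in 2-D.
d_high = pdist(X)
d_low = pdist(Z)

# A correlation close to 1 means the 2-D layout keeps the original distances well.
r = np.corrcoef(d_high, d_low)[0, 1]
print(Z.shape)  # (150, 2)
print(f"distance correlation: {r:.3f}")
```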

There’s a plethora of options with embeddings. You can play around with ImageNet embeddings and plot them in 2D or use any of your own high-dimensional data and discover interesting visualizations! Although t-SNE is nowadays probably the most popular dimensionality reduction technique used in combination with scatterplot visualization, do not underestimate the value of other manifold learning techniques. For one, we often find that MDS works fine as well.

Go, experiment!

• Curtis Pickering

For the t-SNE method, what perplexity is being used and is there any way to change it?
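The thread does not answer this, and we won't guess what the widget used. In scikit-learn's `TSNE`, which the widget builds on, perplexity is a constructor argument with a default of 30. A minimal sketch that tries a few values on the bundled digits data (the 300-sample subset and the values 5, 30 and 50 are arbitrary choices):

```python
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

X, y = load_digits(return_X_y=True)
X = X[:300]  # a small subset keeps the three runs quick

embeddings = {}
for perplexity in (5, 30, 50):
    # Perplexity roughly sets the effective number of neighbours per point;
    # it should be smaller than the number of samples.
    tsne = TSNE(n_components=2, perplexity=perplexity, random_state=0)
    embeddings[perplexity] = tsne.fit_transform(X)
    print(perplexity, embeddings[perplexity].shape)
```

Roughly speaking, low values emphasize very local structure, while higher values take more of the wider neighbourhood into account.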

• Christopher Ross

With Manifold Learning and with locally linear embedding (modified or not), is there a need to preprocess the data (normalize with mean & SD)? My initial tests seem to indicate there is not.

• Ajda Pretnar

It depends. Manifold Learning does not include any normalization on its own; it needs to be provided by the user. As for LLE (or any other method), it really depends on the data. In many cases, normalization is desired to scale all values to a common range. However, if you are dealing with gene expression data, you probably specifically don’t wish to normalize the data.
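To illustrate providing the normalization yourself, here is a sketch comparing LLE on raw features with LLE after per-feature standardization in a scikit-learn pipeline. The three-column synthetic data with deliberately mismatched scales is purely hypothetical; whether the scaled version is better depends on the data, as noted above.

```python
import numpy as np
from sklearn.manifold import LocallyLinearEmbedding
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Hypothetical data: three features on very different scales.
X = np.column_stack([
    rng.normal(0, 1, 200),
    rng.normal(0, 1000, 200),  # this column dominates raw Euclidean distances
    rng.normal(5, 0.01, 200),  # this one barely contributes to them
])

# Raw features: neighbourhoods are found almost entirely along column 2.
lle_raw = LocallyLinearEmbedding(n_components=2, n_neighbors=10, random_state=0)
Z_raw = lle_raw.fit_transform(X)

# Standardized features: each column rescaled to mean 0, SD 1 before LLE.
pipe = make_pipeline(
    StandardScaler(),
    LocallyLinearEmbedding(n_components=2, n_neighbors=10, random_state=0),
)
Z_scaled = pipe.fit_transform(X)
```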

• Christopher Ross

My data is actually downhole well logs (acoustic, electrical, and nuclear depth series), and I am trying to classify rock types with it. With such data, my PCA-Kmeans results typically appear better using a z-score normalization ((value - mean)/SD). From your comments it seems that I might need to normalize my data prior to manifold learning, just as I do with PCA. Thanx
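A sketch of the workflow described in this comment: z-score each log curve, embed the standardized curves in 2-D with a manifold learner, then cluster the embedding with k-means. The synthetic `logs` array, the choice of Isomap, `n_neighbors=15` and four clusters are all placeholders, not recommendations for real well-log data.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.manifold import Isomap
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
# Hypothetical stand-in for well logs: 500 depth samples x 4 log curves
# with very different units and spreads.
logs = rng.normal(
    loc=[80.0, 20.0, 60.0, 2.4],
    scale=[15.0, 50.0, 25.0, 0.2],
    size=(500, 4),
)

# Z-score: subtract each curve's mean, divide by its standard deviation.
z_manual = (logs - logs.mean(axis=0)) / logs.std(axis=0)
z = StandardScaler().fit_transform(logs)  # same result (population SD)

# Embed the standardized logs in 2-D, then cluster the embedding.
embedding = Isomap(n_components=2, n_neighbors=15).fit_transform(z)
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(embedding)
print(np.bincount(labels))  # number of depth samples per cluster
```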