Orange in Space

Did you know that Orange has already been to space? Rosario Brunetto (IAS-Orsay, France) has been working on the analysis of infrared images of the asteroid Ryugu as a member of the JAXA Hayabusa2 team. The Hayabusa2 asteroid sample-return mission aims to retrieve data and samples from the near-Earth asteroid Ryugu and analyze its composition. Hayabusa2 arrived at Ryugu on June 27, 2018, and while the spacecraft will return to Earth with a sample only in late 2020, the mission has already started collecting and sending back data. And of course, part of the analysis of Hayabusa2’s space data has been done in Orange!

An image of the asteroid Ryugu acquired by the Hayabusa2 (©JAXA).

 

Within the Hayabusa2 project, near-infrared spectral data will be collected in three series. The first part is the macro data from remote sensing measurements that are being collected at different altitudes from the asteroid by the Japanese spectrometer NIRS3 (©JAXA). The second part is surface infrared imaging at the micron scale that will soon be performed (October 2018) by the French MicrOmega instrument on the lander MASCOT (DLR-CNES). The third part consists of the samples that will be analyzed upon return. Among the techniques that will be used in different laboratories around the world in 2021 to analyze the returned samples are hyperspectral imaging and micro-tomography with an infrared imaging FPA microscope, which will be performed by the IAS team at SMIS-SOLEIL. This means the data will contain satellite spectral images as well as microscope measurements.

Dr. Brunetto is currently working with the first part of the data, namely the macro hyperspectral images of the asteroid. Several tens of thousands of spectra over 70 spectral channels have already been acquired. The main goal of this initial exploration was to constrain the surface composition.

Once the data had been preprocessed and cleaned in Python, separate surface regions were extracted in Orange with k-Means and PCA and plotted with the HyperSpectra widget, which comes as a part of the Spectroscopy add-on. So why was Orange chosen over other tools? Dr. Brunetto says Orange is an easy and friendly tool for complicated things, such as exploring the compositional diversity of the asteroid at different scales. There are many clustering techniques he can use in Orange, and he likes that he can interactively change the number of clusters and see the changes immediately in the plot. This enables the researchers to determine the level of granularity of the analysis, while they can also immediately inspect what each cluster looks like in a hyperspectral plot.

Moreover, one can quickly test methods and visualize the effects and at the same time have a good overview of the workflow. Workflows can also be reused once the new data comes in or, if the pipeline is standard, used on a completely different data set!

A simple workflow for the analysis of spectral data. 😁 A great thing about Orange is that you can label parts of the workflow and explore a different aspect of the data in each branch!
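For readers who prefer scripts, the PCA and k-Means step of such a workflow can be roughly approximated outside Orange. The following is only a minimal sketch using scikit-learn, not Dr. Brunetto's actual pipeline: the spectra are synthetic (the real data is not public), and the channel range, band positions and number of clusters are all made-up assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Synthetic stand-in for the real data: 300 spectra x 70 channels,
# drawn from three made-up surface types with different absorption bands.
channels = np.linspace(1.8, 3.2, 70)   # hypothetical wavelengths in microns
band_centers = [2.2, 2.7, 3.0]         # hypothetical band positions
spectra = np.vstack([
    1.0 - 0.3 * np.exp(-((channels - c) ** 2) / 0.01)
    + rng.normal(scale=0.01, size=(100, channels.size))
    for c in band_centers
])

# Reduce the 70 channels to a few principal components, then cluster them.
scores = PCA(n_components=3).fit_transform(spectra)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(scores)

print(scores.shape, np.bincount(labels))
```

Changing `n_clusters` and re-running plays the role of interactively changing the number of clusters in the widget, although without the immediate visual feedback.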

 

We would of course love to show you the results of the asteroid analysis, but as the project is still ongoing, the data is not yet available to the public. Instead, we asked Zélia Dionnet, Dr. Brunetto’s PhD student, to share the results of her work on the organic and mineralogical heterogeneity of the Paris meteorite, which have already been published.

She analyzed the composition of the Paris meteorite, which was discovered in 2008 in a statue. The story of how the meteorite was found is quite interesting in itself, but we wanted to know more about how the sample was analyzed in Orange. Dionnet had a slightly larger data set, with 16,000 spectra and 1,600 wavenumbers. Just like Dr. Brunetto, she used k-Means to discover interesting regions in the sample and the HyperSpectra widget to plot the results.

k-Means clusters plotted in the HyperSpectra widget.

 

At the top, you can see a 2D map of the meteorite sample showing the distribution of the clusters that were identified with k-Means. At the bottom, you see cluster averages for the spectra. The green region is the most interesting one and it shows crystalline minerals, which formed billions of years ago as the hydrothermal processes in the asteroid parent body of the meteorite turned amorphous silicates into phyllosilicates. The purple region, in contrast, shows different micron-sized minerals.
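The two panels described above boil down to two simple operations: putting the cluster labels back onto the pixel grid, and averaging the spectra within each cluster. Here is a minimal NumPy sketch of that idea. The cube dimensions, the random spectra and the random labels are placeholders (in practice the labels would come from k-Means), and this is not how the HyperSpectra widget is implemented.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical hyperspectral cube: 20 x 30 pixels, 50 wavenumbers per pixel.
height, width, n_wavenumbers = 20, 30, 50
cube = rng.random((height, width, n_wavenumbers))

# Stand-in cluster labels, one per pixel (these would come from k-Means).
n_clusters = 4
labels = rng.integers(0, n_clusters, size=height * width)

# Top panel: reshape the per-spectrum labels back onto the pixel grid.
cluster_map = labels.reshape(height, width)

# Bottom panel: mean spectrum over all pixels assigned to each cluster.
flat = cube.reshape(-1, n_wavenumbers)
mean_spectra = np.vstack(
    [flat[labels == k].mean(axis=0) for k in range(n_clusters)]
)

print(cluster_map.shape, mean_spectra.shape)
```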

This is how to easily identify the compositional structure of samples with just a couple of widgets. Orange seems to love going to space and can’t wait to get its hands dirty with more astro-data!

 

Spectroscopy Workshop at BioSpec and How to Merge Data

Last week Marko and I visited the land of the midnight sun – Norway! We held a two-day workshop on spectroscopy data analysis in Orange at the Norwegian University of Life Sciences. The students from BioSpec lab were yet again incredible and we really dug deep into Orange.

Related: Orange with Spectroscopy Add-on

A class full of dedicated scientists.

 

One thing we did was see how to join data from two different sources. It often happens that you have measurements in one file and the labels in another. Or, in our case, we wanted to add images to our zoo.tab data. First, find zoo.tab in the File widget under Browse documentation datasets. Observe the data in the Data Table.

Original zoo data set.

 

This data contains 101 animals described with 16 different features (hair, aquatic, eggs, etc.), a name and a type. Now we will manually create the second table in Excel. The first column will contain the names of the animals as they appear in the original file. The second column will contain links to images of animals. Open your favorite browser and find a couple of images corresponding to selected animals. Then add links to the images in the image column. Like this:

Extra data that we want to add to the original data.

 

Remember, you need a three-row header to define the column that contains images. Under the image column, add string in the second row and type=image in the third row. This will tell Orange where to look for images. Now, we can check our animals in Image Viewer.

A quick glance at an Image Viewer will tell us whether our images got loaded correctly.
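If you would rather generate the extra table with a script than type it in Excel, the three-row header described above can be written out as a tab-separated file. This is just a sketch: the image links are placeholders, and leaving the third-row cell of the name column empty is our assumption, not something the workshop prescribed.

```python
import csv
import io

# Hypothetical rows: animal names as they appear in zoo.tab,
# paired with placeholder image links.
rows = [
    ("aardvark", "https://example.com/aardvark.jpg"),
    ("bear", "https://example.com/bear.jpg"),
]

buffer = io.StringIO()
writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
writer.writerow(["name", "image"])       # row 1: column names
writer.writerow(["string", "string"])    # row 2: column types
writer.writerow(["", "type=image"])      # row 3: flags; marks the image column
writer.writerows(rows)                   # the data itself

print(buffer.getvalue())
```

To get an actual file, write `buffer.getvalue()` to a path of your choice with a `.tab` extension and load it in the File widget.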

 

Finally, it is time to bring in the images to the existing zoo data set. Connect the original File to Merge Data. Then add the second file with animal images to Merge Data. The default merging method will take the first data input as original data and the second data as extra data. The column to match by is defined in the widget. In our case, it is the name column. This means Orange will look at the first name column and find matching instances in the second name column.
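For those more at home in scripts, this kind of merge is close to a left join on the name column. The sketch below uses pandas as an analogy rather than Orange's own implementation. The tiny tables are made up for illustration, and the links are placeholders.

```python
import pandas as pd

# A tiny made-up excerpt standing in for the original zoo data.
zoo = pd.DataFrame({
    "name": ["aardvark", "bear", "dolphin"],
    "hair": [1, 1, 0],
    "type": ["mammal", "mammal", "mammal"],
})

# Extra data: only some animals have an image link.
extra = pd.DataFrame({
    "name": ["aardvark", "bear"],
    "image": [
        "https://example.com/aardvark.jpg",
        "https://example.com/bear.jpg",
    ],
})

# Keep every row of the original data and append matching extra columns.
merged = zoo.merge(extra, on="name", how="left")
print(merged)
```

Animals without a match in the extra table (dolphin here) keep their row and simply get a missing value in the new image column.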

 

A quick look at the merged data shows us an additional image column that we appended to the original file.

Merged data with a new column.

 

This is the final workflow. Merge Data now contains a single data table on the output and you can continue your analysis from there.

Find out more about spectroscopy for Orange on our YouTube channel or contribute to the project on GitHub.

Orange with Spectroscopy Add-on Workshop

We have just concluded our enhanced Introduction to Data Science workshop, which included several workflows for spectroscopy analysis. The Spectroscopy add-on is intended for the analysis of spectral data and it is just as fun as our other add-ons (if not more!).

We will prove it with a simple classification workflow. First, install the Spectroscopy add-on from the Options – Add-ons menu in Orange. Restart Orange for the add-on to appear. Great, you are ready for some spectral analysis!

Use the Datasets widget to load the Collagen spectroscopy data. This data contains cells measured with FTIR and annotated with the major chemical compound at the imaged part of a cell. A quick glance at a Data Table will give us an idea of what the data looks like. Seems like a very standard spectral data set.

Collagen data set from Datasets widget.

 

Now we want to determine whether we can classify cells by type based on their spectral profiles. First, connect Datasets to Test & Score. We will use 10-fold cross-validation to score the performance of our model. Next, we will add Logistic Regression to model the data. One final thing: spectral data often needs some preprocessing. Let us perform a simple preprocessing step by applying the Cut (keep) filter and retaining only the wavenumbers between 1500 and 1800. When we connect Preprocess Spectra to Test & Score, we need to remember to use its Preprocessor output.

Preprocessor that keeps a part of the spectra cut between 1500 and 1800. No data is shown here, since we are using only the preprocessing procedure as the input for Test & Score.
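The same idea can be sketched in a script: the cut step and the learner are bundled together, and the bundle is evaluated with 10-fold cross-validation, so the preprocessing travels with the model into every fold. This is a minimal scikit-learn analogy, not what Test & Score runs internally. The spectra are synthetic, and the wavenumber grid, band positions and the helper `cut_keep` are all made up for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer

rng = np.random.default_rng(0)

# Synthetic two-class spectra on a hypothetical wavenumber grid.
wavenumbers = np.linspace(1000, 2000, 101)
n_per_class = 50

def make_class(center):
    """Noisy spectra with a single Gaussian band at `center`."""
    band = np.exp(-((wavenumbers - center) ** 2) / (2 * 30.0 ** 2))
    return band + rng.normal(scale=0.05, size=(n_per_class, wavenumbers.size))

X = np.vstack([make_class(1650), make_class(1550)])
y = np.repeat([0, 1], n_per_class)

# Counterpart of Cut (keep): retain only wavenumbers between 1500 and 1800.
keep = (wavenumbers >= 1500) & (wavenumbers <= 1800)

def cut_keep(spectra):
    return spectra[:, keep]

# Preprocessor and learner go into cross-validation as one unit.
model = make_pipeline(FunctionTransformer(cut_keep), LogisticRegression(max_iter=1000))
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
auc = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")

print(auc.mean())
```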

 

Let us see how well our model performs. Not bad. A 0.99 AUC score. Seems like it is almost perfect. But is it really so?

10-fold cross-validation on spectral data. Our AUC and CA scores are quite impressive.

 

Confusion Matrix gives us a detailed picture. Our model fails almost exclusively on the DNA cell type. Interesting.

Confusion Matrix shows DNA is most often misclassified. By selecting the misclassified instances in the matrix, we can inspect why Logistic Regression couldn’t model these spectra.

 

We will select the misclassified DNA cells and feed them to Spectra to inspect what went wrong. Instead of coloring by type, we will color by prediction from Logistic Regression. Can you find out why these spectra were classified incorrectly?

Misclassified DNA spectra colored by the prediction made by Logistic Regression.
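Selecting misclassified instances in the matrix has a straightforward scripted counterpart: compare cross-validated predictions with the true labels and keep the rows where they disagree. The sketch below again relies on scikit-learn and synthetic data. The class names other than DNA and the feature values are invented, so the numbers say nothing about the Collagen data itself.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import StratifiedKFold, cross_val_predict

rng = np.random.default_rng(0)

# Made-up data: three overlapping classes; only "DNA" is named in the post.
classes = np.array(["DNA", "class_b", "class_c"])
n_per_class, n_features = 60, 5
centers = rng.normal(scale=1.5, size=(len(classes), n_features))
X = np.vstack([c + rng.normal(size=(n_per_class, n_features)) for c in centers])
y = np.repeat(classes, n_per_class)

# Out-of-fold predictions from 10-fold cross-validation.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
predicted = cross_val_predict(LogisticRegression(max_iter=1000), X, y, cv=cv)

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y, predicted, labels=classes)
print(cm)

# Pick out the DNA instances the model got wrong, plus what it predicted,
# which is what we would color by in the Spectra widget.
missed = (y == "DNA") & (predicted != y)
misclassified_spectra = X[missed]
predicted_as = predicted[missed]
```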

 

This is one of the simplest examples with spectral data. It is basically the same procedure as with standard data: data is fed as data, the learner (Logistic Regression) as learner, and the preprocessor as preprocessor directly to Test & Score to avoid overfitting. Play around with the Spectroscopy add-on and let us know what you think! 🙂