
Resources for the Vision Application

This application is based on iNaturalist, from which we extract sub-datasets specifically for this course.

Resources for Session 2

Dataset

Main features:

  • Seven classes were sampled from the iNaturalist taxonomy.
  • There are 100 samples for each class, split into approximately 80 for training and 20 for testing.
  • Class names can be fetched from the embedding file (see below), and example images are available on the iNaturalist website.

Latent Space

As in Lab 1, the images have been embedded into a latent space using the ViT-H/14 vision encoder from OpenCLIP, a deep learning model introduced in this paper. We will delve into the details of deep learning and feature extraction from Course 4 onwards.

For now, you can simply load the NumPy arrays containing all samples in the latent space from the file embeddings-cv-lab2.npz. This file behaves like a dictionary, whose arrays are indexed by the keys "X_train", "y_train", "X_test", and "y_test" (plus "class_names", see below).

The file can be loaded using the following code snippet:

import numpy as np

# Load the .npz archive containing the embedded dataset
train_test_dataset = np.load(PATH_TO_EMBEDDINGS_LAB2)

X_train, X_test, y_train, y_test = (
    train_test_dataset['X_train'],
    train_test_dataset['X_test'],
    train_test_dataset['y_train'],
    train_test_dataset['y_test'],
)

# Class names are also included
class_names = train_test_dataset['class_names']
print(f"Class names: {class_names}")

# X_train should have a shape of (560, 1024),
# i.e. (number of samples x embedding dimension).
# y_test should have a shape of (140,), i.e. the number of
# test samples; each sample's label is an integer index.
print(f"Shape of X_train: {X_train.shape}")
print(f"Shape of y_test: {y_test.shape}")

Work to do

Perform the classification on this data using the technique you chose. Please refer to the Lab Session 2 main page for details.

As an example, here are the results obtained using the K-Nearest Neighbours algorithm with K=10:

Class           Precision   Recall   F1-score
Eriogonum       0.70        0.88     0.78
Rubus           0.82        0.78     0.80
Quercus         0.74        1.00     0.85
Ericales        0.80        0.63     0.71
Lamioideae      0.95        0.87     0.91
Ranunculeae     0.62        0.95     0.75
Ranunculaceae   0.67        0.20     0.31
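
If you want to reproduce this baseline, a minimal sketch using scikit-learn's KNeighborsClassifier is shown below (it assumes the X_train, y_train, and X_test arrays from the loading snippet above; K=10 comes from the text, and leaving all other hyperparameters at their scikit-learn defaults is an assumption):

from sklearn.neighbors import KNeighborsClassifier

# Fit a K-Nearest Neighbours classifier with K=10 on the training embeddings
knn = KNeighborsClassifier(n_neighbors=10)
knn.fit(X_train, y_train)

# Predict integer labels for the test embeddings
y_pred = knn.predict(X_test)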

You should be able to replicate these results using the classification_report function from scikit-learn:

from sklearn.metrics import classification_report

# y_pred holds your classifier's predictions on X_test
print(classification_report(y_test, y_pred, target_names=class_names))

Resources for Session 1

Dataset

Main features:

  • 200 images
  • Of the 200 images, 100 are insects and 100 are plants.

Visualisation of a few examples

[Example images: plants]

[Example images: insects]

Latent Space

The 200 images have been embedded into a latent space using the ViT-H/14 vision encoder from OpenCLIP, a deep learning model introduced in this paper. We will delve into the details of deep learning and feature extraction from Course 4 onwards.

For now, you can simply load the NumPy arrays containing all samples in the latent space from the file embeddings-cv-lab1.npz.
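
As a minimal loading sketch (the keys stored in this archive are not documented here, so the snippet lists them rather than assuming them):

import numpy as np

# Load the .npz archive and list the arrays it contains
data = np.load("embeddings-cv-lab1.npz")
print(data.files)

# Access an array by key, e.g. the first one
embeddings = data[data.files[0]]
print(f"Shape of {data.files[0]}: {embeddings.shape}")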

Work to do

Compute, visualize, and interpret the distance matrix, as explained on the Lab Session 1 main page.
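
As a starting point, here is a minimal sketch (it assumes a 200-row array named embeddings, as loaded in the snippet above; the cosine metric is an assumption, and any metric supported by scipy.spatial.distance.pdist would work):

import matplotlib.pyplot as plt
from scipy.spatial.distance import pdist, squareform

# Pairwise distances between all 200 embeddings (cosine is an assumed choice)
dist_matrix = squareform(pdist(embeddings, metric="cosine"))

# Visualize as a heatmap: with samples grouped by class, a block
# structure should reveal the insect/plant separation
plt.imshow(dist_matrix, cmap="viridis")
plt.colorbar(label="Cosine distance")
plt.title("Pairwise distance matrix")
plt.show()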