Iota2 can also do regression

Iota2 is constantly evolving, as you can check at the gitlab repository. Bugs fix, documentation updates and dependency version upgrade are done regularly. Also, new features are introduced such as the support of Landsat 8 & 9 images, including thermal images, or for what concerns us in this post, the support of regression models.

In machine learning, regression algorithms are supervised methods1 used to estimate a continuous variable from some observations (see more here: https://en.wikipedia.org/wiki/Regression_analysis). In remote sensing, regression is used to recover biophysical or agronomic variables from satellite images. For instance, it is used in SNAP (https://step.esa.int/main/download/snap-download/) to estimate LAI from Sentinel-2 surface reflectance.

At the beginning, iota2 was designed to perform classification (estimation of categorical values) whose framework shares a lot with regression but has also some significant differences. To cite the main ones: the loss function as well as how the data are split differ between classification and regression. Some other differences may exist depending on the algorithm. Fortunately, since the end of 2023, iota2 is also able to perform regression with satellite image time series.

In this post, we are going to illustrate the workflow of the regression builder on a simple case: estimate the red band value of one Sentinel-2 date having observed other Sentinel-2 dates. Full information can be found in the online documentation: https://docs.iota2.net/develop/i2_regression_tutorial.html.

1. Iota2 configuration

1.1. Data set

To illustrate iota2 capability, we set-up the following data set: One year of Sentinel2 data over the tile T31TCJ, starting on the 2018-02-10 until the 2018-12-10, from which we try to infer the red band of 2018-12-17. Yes, it is not a real problem but since Sentinel2 data are free and open-source, we can put online this toy data to let you reproduce the simulation: https://zenodo.org/records/10574545.

The area in pixel size is 909*807, see figure below. We have randomly extracted pixel values from the red band for training and validation and put everything in a shapefile.

emprise.png

Figure 1: Area of interest (background image © OpenStreetMap contributors)

1.2. Configuration file

As usual with iota2, the configuration file contains all the necessary information and it is very similar to what is required for classification (see https://www.cesbio.cnrs.fr/multitemp/iota2-latest-release-deep-learning-at-the-menu for a more detailed discussion on the configuration file). We use the following one

chain:
{
    s2_path: "<<path_dir>>/src/sensor_data"
    output_path: "<<path_dir>>/Iota2Outputs/"
    remove_output_path: True
    list_tile: "T31TCJ"
    ground_truth: "<<path_dir>>/src/vector_data/ref_small.shp"
    data_field: "code"
    spatial_resolution: 10
    proj: "EPSG:2154"
    first_step: "init"
    last_step: "validation"
}

arg_train:
{
    runs: 1
    ratio : 0.75
    sample_selection:
    {
        "sampler": "periodic",
        "strategy": "all",
    }
}
scikit_models_parameters:
{
    model_type:"RandomForestRegressor"
    keyword_arguments:
    {
        n_estimators : 200
        n_jobs : -1
    }
}
python_data_managing:
{
    number_of_chunks: 10
}
builders:
{
    builders_class_name: ["i2_regression"]
}
sentinel_2:
{
    temporal_resolution:10
}

1.3. Results

146,579 pixels were used to train the random forest with 200 trees, and 48,860 pixels were used as test samples to compute the prediction accuracy. Iota2 returns the following accuracy results for the test set (we normalize the data to have reflectance and not digital number):

max error mean absolute error mean squared error median absolute error r2 score
0.200 0.005 6.383e-5 0.003 0.894

Well, the results are good 🙂 Given one year of data, it is possible to infer most of the pixel values 7 days later. Congrats iota2 !

If we look at the map, we can see that most of the errors are made over areas with light clouds. It would have been the case also for areas with rapid changes since we have done nothing particular to deal with changes in the regression set-up. Figures below show the true red band, the estimated one and the prediction error, computed as the normalized absolute error.

true.png

Figure 2: True Sentinel-2 red band.

pred.png

Figure 3: Predicted Sentinel-2 red band.

error.png

Figure 4: Prediction error in percentage; \(\frac{|true-pred|}{true}\).

2. Discussion

Iota2 offers many more possibilties, as for the classification framework: multi-run, data augmentation or spatial stratification for instance. The online documentation (https://docs.iota2.net/develop/i2_regression_tutorial.html) provides all the relevant information: please check it if needed.

In this short post, we have used random forest, but other methods are available, in particular deep learning based methods. For now, it is possible to regress only one parameter at time. A possible extension would be to regress several variables simultaneously.

This new feature has been used to map moving date of permanent grasslands at the national scale for year 2022. This work is in progress, but you can see current results on zenodo (draft map: https://zenodo.org/records/10118125). Iota2 has simplified greatly the production of such large scale map.

3. Acknowledgement

Iota2 development team is composed of Arthur Vincent and Benjamin Tardy, from CS Group. Hugo Trentesaux spent 10 months (October 2021 – July 2022) in the team and started the development of the regression. Then, Hélène Touchais continued the IT developments since November 2022 and has concluded this work at the end of 2023.

Developments are coordinated by Jordi Inglada, CNES & CESBIO-lab. Promotion and training are ensured by Mathieu Fauvel, INRAe & CESBIO-lab and Vincent Thierion, INRAe & CESBIO-lab.

The development was funded by several projects: CNES-PARCELLE, CNES-SWOT Aval, ANR-MAESTRIA and ANR-3IA-ANITI with the support of CESBIO-lab and Theia Data Center. Iota2 has a steering committee which is described here.

We thank the Theia Data Center for making the Sentinel-2 time series available and ready to use.

Footnotes:

1

Need some ground truth.

 

Plus d'actualités

Evolution de l’altitude de la ligne de neige au cours des 41 dernières années dans le bassin versant du Vénéon (Oisans)

Pour contribuer à caractériser les conditions hydrométéorologiques lors de la crue torrentielle qui a frappé la Bérarde en juin, j’ai analysé une nouvelle série de cartes d’enneigement qui couvre la période 1984-2024 [1]. Grâce à la profondeur temporelle de cette série, on constate que l’altitude de la ligne de neige dans le bassin versant du […]

Biophysical parameter retrieval from Sentinel-2 images using physics-driven deep learning for PROSAIL inversion

The results presented here are based on published work: Y. Zérah, S. Valero, and J. Inglada. « Physics-constrained deep learning for biophysical parameter retrieval from sentinel-2 images: Inversion of the prosail model« , in Remote Sensing of Environment, doi: 10.1016/j.rse.2024.114309. This work is part of the PhD of Yoël Zérah, supervised by Jordi Inglada and Silvia Valero. […]

Petition results (to keep S2A operational after S2C launch)

Last week, I issued a petition to keep S2A operational, this post will give you some results. i will keep updating this post regularly, so please go-on signing the petition or forwarding it to your colleagues. Although I have not been trained as a lobbyist, and I have some regrets on how i did it, […]

Rechercher