A new downloader to access individual files within the zip archives distributed by Theia

Downloading time series from Theia can take a while, depending on the area and the time period covered by the images. It also consumes a lot of local storage, since every product has to be downloaded as a complete .zip archive (the GeoDataHub will probably distribute Cloud Optimized GeoTIFFs). In my experience, many people download an archive, extract the files, keep only the ones they need (or sometimes compute what they want from the product and store only the compressed result), delete the archive, then move on to the next one. This approach is far from optimal!

Theia-picker is a small Python package that can download whole archives, or individual files from a remote archive. When individual files are requested, only the bytes holding the compressed file inside the remote archive are downloaded; they are then decompressed and written to disk as the file. This is particularly interesting when only a few files are needed, since there is no need to download the entire archive. This should improve workflows that download product archives just to grab 3 or 4 spectral bands…

To ensure that downloads complete correctly, theia-picker computes checksums (MD5 for the archives, CRC32 for the extracted individual files). When a file’s checksum doesn’t match the expected value, the file is downloaded again.
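To make the checksum step concrete, here is a minimal sketch using only the Python standard library. It is not theia-picker's actual code: the function names (`iter_chunks`, `md5_hex`, `crc32_value`) are hypothetical, and the data is processed chunk by chunk so that a large archive never has to fit in memory.

```python
import hashlib
import zlib


def iter_chunks(path, chunk_size=1 << 20):
    """Yield the content of a file as successive chunks of bytes."""
    with open(path, "rb") as stream:
        while True:
            chunk = stream.read(chunk_size)
            if not chunk:
                return
            yield chunk


def md5_hex(chunks):
    """Hexadecimal MD5 digest of an iterable of byte chunks (whole archives)."""
    digest = hashlib.md5()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def crc32_value(chunks):
    """Unsigned CRC32 of an iterable of byte chunks (individual files)."""
    crc = 0
    for chunk in chunks:
        crc = zlib.crc32(chunk, crc)
    return crc & 0xFFFFFFFF


# Demo on in-memory chunks; for a real file, pass iter_chunks(path) instead.
archive_md5 = md5_hex([b"ab", b"c"])
file_crc32 = crc32_value([b"ab", b"c"])
```

A download would then be retried whenever the computed value differs from the expected one (the CRC32 of each file is stored in the zip archive itself).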

How is it done? Zip archives store information about their contents in a data block at the end of the file, called the Central Directory [1]. From this data block, the information about every compressed file (name, offset, sizes, checksum) can be retrieved.
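As an illustration of what the Central Directory contains, the sketch below builds a tiny zip archive in memory and parses it by hand with `struct`. It is a simplified reading of the zip format, not theia-picker's implementation: the function names are hypothetical, the file names are made up, and ZIP64 archives (needed above 4 GB) and archive comments containing the signature are deliberately not handled.

```python
import io
import struct
import zipfile

EOCD_SIGNATURE = b"PK\x05\x06"  # End of Central Directory record
CDH_SIGNATURE = b"PK\x01\x02"   # Central Directory file header


def find_central_directory(tail):
    """Return (entry_count, cd_size, cd_offset) from the last bytes of a zip."""
    pos = tail.rfind(EOCD_SIGNATURE)
    if pos < 0:
        raise ValueError("End of Central Directory record not found")
    # Fixed 22-byte record: signature, disk numbers, entry counts,
    # central directory size and offset, comment length.
    fields = struct.unpack("<4sHHHHIIH", tail[pos:pos + 22])
    return fields[4], fields[5], fields[6]


def parse_central_directory(cd, entry_count):
    """Return one dict per file described in the Central Directory bytes."""
    entries, pos = [], 0
    for _ in range(entry_count):
        (sig, _made, _needed, flags, method, _time, _date, crc, csize, usize,
         name_len, extra_len, comment_len, _disk, _iattr, _eattr,
         header_offset) = struct.unpack("<4sHHHHHHIIIHHHHHII", cd[pos:pos + 46])
        if sig != CDH_SIGNATURE:
            raise ValueError("Bad Central Directory header signature")
        raw_name = cd[pos + 46:pos + 46 + name_len]
        # Bit 11 of the flags means the name is UTF-8, otherwise CP437.
        name = raw_name.decode("utf-8" if flags & 0x800 else "cp437")
        entries.append({"name": name, "method": method, "crc32": crc,
                        "compressed_size": csize, "size": usize,
                        "header_offset": header_offset})
        pos += 46 + name_len + extra_len + comment_len
    return entries


# Demo: a small archive built in memory stands in for the remote one.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("B4.tif", b"red band " * 200)
    archive.writestr("B8.tif", b"near-infrared band " * 200)
blob = buffer.getvalue()

# Only the end of the archive is needed to find the Central Directory.
entry_count, cd_size, cd_offset = find_central_directory(blob[-65536:])
entries = parse_central_directory(blob[cd_offset:cd_offset + cd_size], entry_count)
```

Each entry gives the offset of the file inside the archive and its compressed size, which is all that is needed to fetch that file alone.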

To access this data block, one can use HTTP range requests. These are HTTP GET requests with an additional header that specifies the range of bytes to access: {'range': 'bytes=start-end'}. This is enough to retrieve the files information, and also to download and decompress them.
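The following sketch shows how a single file could be pulled out of an archive with two ranged reads. It is an illustration under assumptions, not theia-picker's code: `http_range_get` and `read_member` are hypothetical names, the "remote" archive is an in-memory zip so that the example runs offline, and the file metadata is taken from `zipfile` here instead of being parsed from a remotely fetched Central Directory.

```python
import io
import struct
import urllib.request
import zipfile
import zlib


def http_range_get(url, start, end):
    """GET bytes start..end (inclusive) of url. Shown for reference only,
    it is not called in this offline demo."""
    request = urllib.request.Request(
        url, headers={"Range": f"bytes={start}-{end}"})
    with urllib.request.urlopen(request) as response:
        return response.read()


def read_member(fetch_range, header_offset, compressed_size, expected_crc32):
    """Fetch and decompress one file using fetch_range(start, end)."""
    # The local file header has a fixed 30-byte part, followed by the
    # file name and an extra field whose lengths it declares.
    fixed = fetch_range(header_offset, header_offset + 29)
    (sig, _ver, _flags, method, _time, _date, _crc, _csize, _usize,
     name_len, extra_len) = struct.unpack("<4sHHHHHIIIHH", fixed)
    if sig != b"PK\x03\x04":
        raise ValueError("Bad local file header signature")
    data_start = header_offset + 30 + name_len + extra_len
    raw = b""
    if compressed_size:
        raw = fetch_range(data_start, data_start + compressed_size - 1)
    if method == 8:    # raw DEFLATE stream, no zlib header
        inflater = zlib.decompressobj(-zlib.MAX_WBITS)
        data = inflater.decompress(raw) + inflater.flush()
    elif method == 0:  # stored without compression
        data = raw
    else:
        raise NotImplementedError(f"Compression method {method}")
    if zlib.crc32(data) & 0xFFFFFFFF != expected_crc32:
        raise ValueError("CRC32 mismatch, the file should be fetched again")
    return data


# Demo: an in-memory archive plays the role of the remote file.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("B4.tif", b"red band " * 500)
    archive.writestr("B8.tif", b"near-infrared band " * 500)
blob = buffer.getvalue()
fetched_bytes = 0


def fake_fetch(start, end):
    """Stand-in for http_range_get that also counts the transferred bytes."""
    global fetched_bytes
    fetched_bytes += end - start + 1
    return blob[start:end + 1]


info = zipfile.ZipFile(io.BytesIO(blob)).getinfo("B8.tif")
band = read_member(fake_fetch, info.header_offset, info.compress_size, info.CRC)
```

With a server that honours range requests, `fake_fetch` would be replaced by a call to `http_range_get` on the archive URL.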

At least, this is the theory… In practice, you can try this with the Theia server, but it won’t work very well! I don’t know exactly why, but this is the reason every single package for remote zip retrieval fails: the server simply closes the connection before sending all the requested bytes. Theia-picker’s workaround consists in always requesting all the bytes from the start offset onward, using the byte-range header {'range': 'bytes=start-'} (without ‘end’, which still complies with the standard [2]), and closing the connection once the desired number of bytes has been received.
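The workaround can be sketched as follows. Again, this is a hedged illustration and not the code used in theia-picker: `read_exactly` and `open_ended_range_get` are hypothetical names, and the server reply is simulated with an in-memory stream so that the example runs without network access.

```python
import io
import urllib.request


def read_exactly(stream, length, chunk_size=64 * 1024):
    """Read `length` bytes from a stream, then close it without reading
    whatever the other side would still have sent."""
    received = bytearray()
    try:
        while len(received) < length:
            chunk = stream.read(min(chunk_size, length - len(received)))
            if not chunk:
                raise EOFError("Stream ended before all bytes were received")
            received.extend(chunk)
    finally:
        stream.close()
    return bytes(received)


def open_ended_range_get(url, start, length):
    """Ask for every byte from `start` onward, keep only `length` of them.
    Shown for reference only, it is not called in this offline demo."""
    request = urllib.request.Request(url, headers={"Range": f"bytes={start}-"})
    return read_exactly(urllib.request.urlopen(request), length)


# Demo: simulate the reply to 'bytes=1000-' on a 10240-byte remote file.
remote = bytes(range(256)) * 40
reply = io.BytesIO(remote[1000:])
part = read_exactly(reply, 300, chunk_size=128)
```

The price of this approach is that the server may have started sending a few more bytes than needed before the connection is closed, which stays small compared to downloading the whole archive.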

Theia-picker is open source (Apache-2.0 license) and anyone can open a PR on GitHub. Currently, it has not been extensively tested, and feedback is welcome. Also, the API is quite minimal (in particular for searching products), so contributions are welcome!

An example copied from GitHub’s README.

Rémi Cresson @ INRAE

References:

[1] https://en.wikipedia.org/wiki/ZIP_(file_format)

[2] https://www.rfc-editor.org/rfc/rfc7233#section-2.1
