
Rasterio / GDALOpenEx performance slowdown in loops #11164

@EmilRybergAkson

What is the bug?

Using rasterio 1.3.11 compiled against GDAL 3.9.1 or later, creating datasets in a loop becomes progressively slower. This does not seem to happen with GDAL 3.9.0 and earlier. I have tried debugging to figure out what the problem is, but I have not been able to pinpoint it exactly, and what I have found has confused me even more...

The problem appears to be at line 317 of rasterio's _io.pyx, in _delete_dataset_if_exists, which is called when a dataset is opened in "w" mode. The call to open_dataset is what slows down, specifically GDALOpenEx at line 219 of _base.pyx.

In some mock code I wrote that calls GDALOpenEx in a loop, in both C++ and Python, the problem does not seem to occur. As far as I can tell, there have also been no changes to GDALOpenEx between 3.9.0 and 3.9.1. Could some error handling or other overhead in rasterio be the root cause?
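For anyone who wants to time a single call in isolation, a small harness along these lines may help. It is only a sketch and not the mock code mentioned above: `time_calls` is a hypothetical helper name, and `os.path.exists` stands in for the real open call so the snippet runs without GDAL installed.

```python
import os
import time
from typing import Callable, List


def time_calls(fn: Callable[[int], object], n: int) -> List[float]:
    """Call fn(i) for i in range(n) and return each call's duration in seconds."""
    durations = []
    for i in range(n):
        start = time.perf_counter()
        fn(i)
        durations.append(time.perf_counter() - start)
    return durations


# Stand-in workload: probe paths like the ones the reproduction script writes.
# Swap the lambda for the call under investigation.
durations = time_calls(lambda i: os.path.exists(f"test_tiffs/test_{i}.tif"), 100)
print(f"first: {durations[0]:.6f}s, last: {durations[-1]:.6f}s")
```

To test the suspected call directly, the lambda could wrap `osgeo.gdal.OpenEx` on the same paths, assuming the GDAL Python bindings are installed. A failed open may return `None` or raise depending on whether GDAL exceptions are enabled, so the wrapper should handle both.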

Steps to reproduce the issue

Running the following test code should show the problem:

import rasterio
import numpy as np
import os
import time
import shutil

dest_np = np.random.rand(1000, 1000)
crs = "EPSG:32633"
dest_transform = rasterio.transform.from_origin(0, 0, 1, 1)

save_dir = "test_tiffs"
if os.path.exists(save_dir):
    shutil.rmtree(save_dir)

os.makedirs(save_dir, exist_ok=True)
with rasterio.Env():
    for i in range(1000):
        img_save_path = f"{save_dir}/test_{i}.tif"
        start_time = time.time()
        with rasterio.open(img_save_path, mode="w", driver="GTiff", compress="LZW", dtype=rasterio.float32,
                           crs=rasterio.crs.CRS.from_string(crs), blockysize=256, transform=dest_transform,
                           width=dest_np.shape[1], height=dest_np.shape[0],
                           count=1) as dataset_out:
            dataset_out.write(dest_np, 1)
        end_time = time.time()
        print(f"Time to save: {end_time - start_time}s")

With GDAL 3.9.1 and later, each iteration should gradually become slower. This is not the case with GDAL 3.9.0 and earlier.

Sample from my PC with GDAL 3.9.1:

Time to save: 0.05449843406677246s
Time to save: 0.04797816276550293s
Time to save: 0.04688286781311035s
Time to save: 0.0468904972076416s
Time to save: 0.04593157768249512s
Time to save: 0.04598069190979004s
Time to save: 0.04586148262023926s
Time to save: 0.04588150978088379s
Time to save: 0.04582929611206055s
Time to save: 0.045999765396118164s
Time to save: 0.04484057426452637s
Time to save: 0.04634523391723633s
Time to save: 0.04588055610656738s
Time to save: 0.04595613479614258s
Time to save: 0.04711031913757324s
Time to save: 0.04685401916503906s
Time to save: 0.046930789947509766s
Time to save: 0.04686570167541504s
.....
Time to save: 0.05614948272705078s
Time to save: 0.0571291446685791s
Time to save: 0.05603289604187012s
Time to save: 0.05613136291503906s
Time to save: 0.0571291446685791s
Time to save: 0.058188438415527344s
Time to save: 0.05613589286804199s
Time to save: 0.05608987808227539s
Time to save: 0.056015968322753906s
Time to save: 0.0596466064453125s
Time to save: 0.06095004081726074s
Time to save: 0.05870771408081055s
Time to save: 0.05717587471008301s
Time to save: 0.057085514068603516s
Time to save: 0.057097434997558594s
Time to save: 0.05931401252746582s
......
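One rough way to put a number on the drift is to compare the mean of an early batch of timings with the mean of a later batch. The sketch below does this with values copied from the sample output above, rounded to four decimals. `slowdown_ratio` is a hypothetical helper name. The number of iterations separating the two batches is not known from the elided output, so the ratio only indicates a trend.

```python
from statistics import mean

# Timings in seconds, rounded from the sample output above.
early = [0.0545, 0.0480, 0.0469, 0.0469, 0.0459, 0.0460]          # first six lines
late = [0.0596, 0.0610, 0.0587, 0.0572, 0.0571, 0.0571, 0.0593]   # last seven lines


def slowdown_ratio(early_times, late_times):
    """Return mean(late) / mean(early); values above 1 mean later calls are slower."""
    return mean(late_times) / mean(early_times)


ratio = slowdown_ratio(early, late)
print(f"late/early mean ratio: {ratio:.2f}")
```

Running the same comparison against timings from GDAL 3.9.0 would show whether the ratio stays close to 1 there.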

Versions and provenance

Python 3.12, but the problem is also present in other Python versions (also tested on 3.9).
Windows 11. Rasterio version 1.3.11. Tested GDAL versions:

3.7.3 - Works
3.8.5 - Works
3.9.0 - Works
3.9.1 - Slowdown
3.9.3 - Slowdown

Additional context

No response
