gdal mdim mosaic: support globs and more than one target array #13228

Merged
rouault merged 10 commits into OSGeo:master from rouault:fix_13219
Oct 17, 2025

@rouault
Member

@rouault rouault commented Oct 16, 2025

Fixes #13218
Fixes #13219

CC @dbaston @mdsumner

@rouault rouault added this to the 3.12.0 milestone Oct 16, 2025
@rouault rouault added enhancement funded through GSP Work funded through the GDAL Sponsorship Program gdal_cli Anything related to the new 3.11 "gdal" CLI frontend labels Oct 16, 2025
@rouault rouault force-pushed the fix_13219 branch 2 times, most recently from ee92660 to c9653b0 Compare October 16, 2025 16:11
@dbaston
Member

dbaston commented Oct 16, 2025

Wildcard inputs appear to work only if input is a positional argument. With a keyword argument I get

$ gdal mdim mosaic --input ~/data/dtm/*.nc --output /tmp/out.vrt
ERROR 1: mosaic: Positional values starting at '/home/dan/data/dtm/DTM_1as_N23E169.nc' are not expected.
Usage: gdal mdim mosaic [OPTIONS] <INPUT>... <OUTPUT>
Try 'gdal mdim mosaic --help' for help.

I don't see this issue with gdal raster mosaic.

@rouault
Member Author

rouault commented Oct 16, 2025

> I don't see this issue with gdal raster mosaic.

Are you sure about that? I get the same behavior:

$ gdal raster mosaic --input byte*.tif --output out.vrt --overwrite
ERROR 1: mosaic: Positional values starting at 'byte_zstd.tif' are not expected.
Usage: gdal raster mosaic [OPTIONS] <INPUT>... <OUTPUT>
Try 'gdal raster mosaic --help' for help.

$ gdal mdim mosaic --input oisst-avhrr-v02r01.1981* --output out.vrt --overwrite
ERROR 1: mosaic: Positional values starting at '/home/even/gdal/gdal/build_cmake/oisst-avhrr-v02r01.19810910.nc' are not expected.
Usage: gdal mdim mosaic [OPTIONS] <INPUT>... <OUTPUT>
Try 'gdal mdim mosaic --help' for help.

I'm not sure how that could be fixed, because the wildcard expansion is done by the shell (here Bash), not by GDAL. GDAL therefore receives something like "gdal xxx mosaic --input file1 file2 --output out", which is not consistent with what it expects.

One way of avoiding Bash expansion is to put the pattern between single quotes:

gdal mdim mosaic --input 'oisst-avhrr-v02r01.1981*' --output out.vrt --overwrite

which uncovered a bug, fixed in 5dee635.
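
The difference between the unquoted and quoted forms can be reproduced without GDAL. The sketch below is illustrative only: `count_args` is a hypothetical helper standing in for any program, and the `.nc` files are empty placeholders created in a temporary directory.

```shell
# Hypothetical helper: prints how many arguments it received.
count_args() { echo "$#"; }

# Create three empty placeholder files in a scratch directory.
tmpdir=$(mktemp -d)
touch "$tmpdir/a.nc" "$tmpdir/b.nc" "$tmpdir/c.nc"

# Unquoted: the shell expands the pattern before the program starts,
# so the program sees three separate arguments.
unquoted=$(count_args "$tmpdir"/*.nc)

# Quoted: the pattern is passed through literally as a single argument,
# leaving any expansion to the program itself.
quoted=$(count_args "$tmpdir/*.nc")

echo "unquoted=$unquoted quoted=$quoted"
rm -rf "$tmpdir"
```

In the unquoted case the program receives several separate arguments where `--input` expected a single value, which is the situation described above.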

@dbaston
Member

dbaston commented Oct 16, 2025

I compared this with cdo collgrid, reading 4 netCDF files with 5 variables over a regular 3600x3600 lat/lon grid. Here are my timings:

$ time gdal mdim mosaic  ~/data/dtm/*.nc  /tmp/out.nc --overwrite
0...10...20...30...40...50...60...70...80...90...100 - done in 00:41:37.                 

real	41m36.622s
user	39m59.545s
sys	0m58.796s

vs

$ time cdo -z zip_4 collgrid ~/data/dtm/*.nc /tmp/out_cdo.nc
cdo    collgrid: Processed 259200000 values from 20 variables over 4 timesteps [17.33s 2597MB]

real	0m17.428s
user	0m16.407s
sys	0m1.006s

(The message from cdo about 4 timesteps is incorrect; there is no time dimension.)

I poked at gdal mdim mosaic a couple of times with gdb to see what it was doing, and it was always uncompressing netCDF data. The chunk sizes of the netCDF are large (3600x3600 or 1800x1800 depending on the variable) and I'm guessing that gdal mdim mosaic is doing many small reads and blowing out the block cache in a way that repeatedly causes the same pixels to be uncompressed. These netCDF files are written with the netCDF4 Python library using default parameters so, while I would have chosen a different chunk size, I don't think they're atypical.
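
To put a rough number on that guess, the sketch below counts how many times a single chunk would be decompressed if every small read decompressed the whole chunk again. The 256x256 read window is an assumption chosen for illustration, not a measured GDAL access pattern, and the no-cache-reuse worst case is hypothetical.

```shell
chunk=3600      # chunk edge length, from the files described above
window=256      # ASSUMED edge length of each small read (hypothetical)

# Number of read windows along one axis, rounded up.
per_axis=$(( (chunk + window - 1) / window ))

# Worst case: every window decompresses the full chunk again.
decompressions=$(( per_axis * per_axis ))

echo "windows per axis: $per_axis"
echo "full-chunk decompressions in the worst case: $decompressions"
```

Under those assumptions each 3600x3600 chunk would be decompressed 225 times rather than once.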

I also notice that gdal mdim mosaic reordered the variables in the output to alphabetic order, whereas cdo preserved the ordering of the inputs.

@dbaston
Member

dbaston commented Oct 16, 2025

> are you sure about that?

I think so...

$ gdal raster mosaic --input ~/data/FABDEM/NE/*.tif /tmp/out.vrt --resolution average --overwrite
$ cat /tmp/out.vrt | xq -x ".//SourceFilename"
/home/dan/data/FABDEM/NE/N59E006_FABDEM_V1-2.tif
/home/dan/data/FABDEM/NE/N74E019_FABDEM_V1-2.tif
/home/dan/data/FABDEM/NE/N78E026_FABDEM_V1-2.tif
/home/dan/data/FABDEM/NE/N79E025_FABDEM_V1-2.tif
/home/dan/data/FABDEM/NE/N79E026_FABDEM_V1-2.tif

EDIT: I see now that adding --output causes it to fail, but --input is OK.

@dbaston
Member

dbaston commented Oct 16, 2025

After rechunking my inputs to 256x256, gdal mdim mosaic takes 3 seconds rather than 41 minutes 😳

@rouault
Member Author

rouault commented Oct 16, 2025

> After rechunking my inputs to 256x256, gdal mdim mosaic takes 3 seconds rather than 41 minutes 😳

You might want to try adding "--co ARRAY:IF(DIM=2):BLOCKSIZE=x,y" so that the block size of the target array (assuming it is 2D; otherwise adapt) is consistent with that of your input dataset. It feels like the program should try to do that automatically if the user doesn't.
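
As a sketch of what that could look like on the command line: the block sizes and file names below are placeholders (assumptions), and the command is printed rather than executed. The double quotes around the `--co` value are needed in Bash because unquoted parentheses are shell syntax.

```shell
# ASSUMED block size values; match them to the chunking of the inputs.
block_x=3600
block_y=3600

# Creation option string in the form suggested above.
co="ARRAY:IF(DIM=2):BLOCKSIZE=${block_x},${block_y}"

# Print the command instead of running it, so the sketch has no side effects.
# in1.nc, in2.nc and out.nc are hypothetical file names.
printf 'gdal mdim mosaic in1.nc in2.nc out.nc --overwrite --co "%s"\n' "$co"
```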

@mdsumner
Contributor

mdsumner commented Oct 16, 2025

> After rechunking my inputs to 256x256, gdal mdim mosaic takes 3 seconds rather than 41 minutes 😳

I had only so far thought of this in terms of VRT. This is already a rechunking engine 🤯.

There are so many benefits to GDAL as a foundation; it already delivers such solid results in so many ways. (Another I've noticed is very polite net-citizenry: pooling of file connections, I gather, and minimal requests on Thredds. xarray regularly times out for requests that need dozens of chunks from one file, but GDAL never has. Outside of GDAL and xarray, I tested raw async byte-range requests for the same chunk: at 10x the Thredds server delivers, at 20x it's a 104, which matches the xarray failures, compared to other dispersed-across-files patterns where the xarray read succeeds.) 💯👌

@rouault
Member Author

rouault commented Oct 16, 2025

Preservation of both array order and source array block size is now implemented.

@rouault rouault merged commit ccebf43 into OSGeo:master Oct 17, 2025
38 checks passed

Development

Successfully merging this pull request may close these issues.

gdal mdim mosaic: support globs in input filenames
gdal mdim mosaic: support multiple input arrays

3 participants