Implement preferred_chunks for netcdf 4 backends#7948
Illviljan merged 21 commits into pydata:main

Hope this is up to the standards! Please tell me if there is anything I can do to improve this PR.

Is the resulting chunk size a multiple of the on-disk file chunk size, or is it exactly the file chunk size? So if I open a file with a file chunk size of 256, do I get a dask array with chunk size 256 or some larger chunk size that is a multiple of 256? By itself, 256 would perform worse than one large chunk for a lot of array sizes, just from the overhead.

@djhoese if
@Illviljan @headtr1ck help! |
xarray/tests/test_backends.py
assert all(np.asanyarray(chunksizes) == expected)

@contextlib.contextmanager
def create_chunked_file(self, array_shape, chunk_sizes) -> None:
- def create_chunked_file(self, array_shape, chunk_sizes) -> None:
+ def create_chunked_file(self, array_shape: tuple[int, int, int], chunk_sizes: tuple[int, int, int]) -> Generator[str, None, None]:
I haven't checked it, but it should point you in the right direction.
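For context on the suggested annotation: a function decorated with `contextlib.contextmanager` is a generator, so its return type is usually written as `Generator[YieldType, None, None]` rather than `None`. A minimal sketch of the pattern follows; the helper name `create_tmp_file_sketch` and the file name it yields are hypothetical and are not taken from the xarray test suite.

```python
import contextlib
import os
import tempfile
from collections.abc import Generator


@contextlib.contextmanager
def create_tmp_file_sketch(suffix: str = ".nc") -> Generator[str, None, None]:
    """Yield a file path inside a temporary directory (hypothetical helper)."""
    # The directory, and anything written into it, is removed on exit.
    with tempfile.TemporaryDirectory() as tmpdir:
        yield os.path.join(tmpdir, "example" + suffix)


# Usage: the value bound by "as" has the yielded type, here str.
with create_tmp_file_sketch() as path:
    with open(path, "w") as handle:
        handle.write("placeholder")
    existed_inside = os.path.exists(path)

exists_after = os.path.exists(path)
```

With this annotation, mypy infers `path` as `str` inside the `with` block, which an annotation of `-> None` would not allow.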
for more information, see https://pre-commit.ci

I attempted to fix the type annotation problem; please tell me if there is more.

What's new is now updated.

The mypy failures are related to these changes, I think: @Illviljan can you take a look please?

Thanks, @mraspaud!

My pleasure! Thanks for merging!
According to the open_dataset documentation, using chunks="auto" or chunks={} should yield datasets with variables chunked depending on the preferred chunks of the backend. However, neither the netcdf4 nor the h5netcdf backend seems to implement the preferred_chunks encoding attribute needed for this to work.

This PR adds this attribute to the encoding upon data reading. As a result, chunks="auto" in open_dataset returns variables with chunk sizes that are multiples of the chunks in the nc file, and chunks={} returns the variables with the exact nc chunk sizes.