Optimize writes to existing Zarr stores. #8875
Merged
dcherian merged 4 commits into pydata:main on Mar 29, 2024
We need to read existing variables to make sure we append or write to a region with the right encoding. Currently we request and decode all arrays in a Zarr group. Instead, only request and decode the arrays for which we require encoding information, i.e. the variables being written that already exist in the store.
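A minimal sketch of the selection step, in plain Python so it runs without Zarr. The dict `store` stands in for a Zarr group, and `names_needing_encoding`, `load_existing_encodings` and `open_array` are hypothetical helpers, not xarray's API (in xarray the selected arrays are then decoded via `conventions.decode_cf_variables`).

```python
from typing import Callable, Dict, Iterable, Mapping, Set


def names_needing_encoding(to_write: Iterable[str], existing_keys: Iterable[str]) -> Set[str]:
    """Return the variables being written that already exist in the store.

    Only these need their on-disk encoding; new variables are encoded from scratch.
    """
    existing = set(existing_keys)
    return {name for name in to_write if name in existing}


def load_existing_encodings(
    to_write: Iterable[str],
    existing_keys: Iterable[str],
    open_array: Callable[[str], Mapping],
) -> Dict[str, Mapping]:
    """Open only the arrays we need, instead of every array in the group."""
    needed = names_needing_encoding(to_write, existing_keys)
    return {name: open_array(name) for name in sorted(needed)}


# Hypothetical store: array name -> on-disk encoding.
store = {
    "temp": {"dtype": "int16", "scale_factor": 0.1},
    "precip": {"dtype": "float32"},
    "lat": {"dtype": "float64"},
}

opened = []  # records which arrays were actually read


def open_array(name: str) -> Mapping:
    opened.append(name)
    return store[name]


# Append to "temp" and add a brand-new "wind" variable.
encodings = load_existing_encodings(["temp", "wind"], store.keys(), open_array)
print(opened)     # ['temp']
print(encodings)  # {'temp': {'dtype': 'int16', 'scale_factor': 0.1}}
```

Only one of the three arrays is read here, so under these assumptions the pre-write read scales with the variables being written rather than with the size of the store.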
Force-pushed from 9bc43dd to eb37aed.
dcherian commented on Mar 25, 2024
    # TODO: consider making loading indexes lazy again?
    existing_vars, _, _ = conventions.decode_cf_variables(
        self.get_variables(), self.get_attrs()
        {
Contributor (Author)
Feels like we should also be skipping this for mode="w", to allow overwriting the existing encoding?
Member
Can you expand on what you mean? Is that because we would have just written these vars, so we already know their values/schema?
Contributor (Author)
mode="w" means overwrite, so we shouldn't care about the encoding on disk, no?
Contributor (Author)
I added a test: the encoding does get updated as expected with mode="w", so presumably Zarr is nuking the store with mode="w".
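Why no explicit skip is needed for mode="w" can be sketched the same way, assuming (as the test suggests) that opening with mode="w" clears the existing group first; Zarr documents `zarr.open_group(..., mode="w")` as "create (overwrite if exists)". `open_group` and `existing_names_to_read` below are hypothetical dict-based stand-ins, not Zarr's or xarray's functions.

```python
from typing import Dict, Iterable, Set


def open_group(store: Dict[str, dict], mode: str) -> Dict[str, dict]:
    """Hypothetical stand-in for opening a Zarr group.

    mode="w" overwrites: existing arrays are deleted before anything is written.
    Other modes ("a", "r+") keep what is already on disk.
    """
    if mode == "w":
        store.clear()
    return store


def existing_names_to_read(group: Dict[str, dict], to_write: Iterable[str]) -> Set[str]:
    # Same selection as the optimization: only variables already present need their encoding.
    return {name for name in to_write if name in group}


store = {"temp": {"dtype": "int16", "scale_factor": 0.1}}

# Appending keeps the store, so the old encoding must be read and matched.
print(existing_names_to_read(open_group(store, mode="a"), ["temp"]))  # {'temp'}

# Overwriting empties the store first, so nothing is read and the new
# encoding is written as given.
print(existing_names_to_read(open_group(store, mode="w"), ["temp"]))  # set()
```

Under that assumption the intersection is simply empty for mode="w", so the new encoding wins without a special case.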
max-sixty approved these changes on Mar 26, 2024
dcherian added a commit to dcherian/xarray that referenced this pull request on Apr 2, 2024
* main: (26 commits)
  [pre-commit.ci] pre-commit autoupdate (pydata#8900)
  Bump the actions group with 1 update (pydata#8896)
  New empty whatsnew entry (pydata#8899)
  Update reference to 'Weighted quantile estimators' (pydata#8898)
  2024.03.0: Add whats-new (pydata#8891)
  Add typing to test_groupby.py (pydata#8890)
  Avoid in-place multiplication of a large value to an array with small integer dtype (pydata#8867)
  Check for aligned chunks when writing to existing variables (pydata#8459)
  Add dt.date to plottable types (pydata#8873)
  Optimize writes to existing Zarr stores. (pydata#8875)
  Allow multidimensional variable with same name as dim when constructing dataset via coords (pydata#8886)
  Don't allow overwriting indexes with region writes (pydata#8877)
  Migrate datatree.py module into xarray.core. (pydata#8789)
  warn and return bytes undecoded in case of UnicodeDecodeError in h5netcdf-backend (pydata#8874)
  groupby: Dispatch quantile to flox. (pydata#8720)
  Opt out of auto creating index variables (pydata#8711)
  Update docs on view / copies (pydata#8744)
  Handle .oindex and .vindex for the PandasMultiIndexingAdapter and PandasIndexingAdapter (pydata#8869)
  numpy 2.0 copy-keyword and trapz vs trapezoid (pydata#8865)
  upstream-dev CI: Fix interp and cumtrapz (pydata#8861)
  ...