Skip to content

Document minimum zarr#10137

Closed
jrbourbeau wants to merge 4 commits intodask:mainfrom
jrbourbeau:min-zarr
Closed

Document minimum zarr#10137
jrbourbeau wants to merge 4 commits intodask:mainfrom
jrbourbeau:min-zarr

Conversation

@jrbourbeau
Copy link
Member

Offline I saw a user run into the following issue with the latest dask release and an old version of zarr (in this example I was using zarr=2.7.1):

In [1]: import dask.array as da

In [2]: da.to_zarr(da.ones(3), 'tmp.zarr', overwrite=True)

In [3]: da.from_zarr('tmp.zarr')
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[3], line 1
----> 1 da.from_zarr('tmp.zarr')

File ~/projects/dask/dask/dask/array/core.py:3593, in from_zarr(url, component, storage_options, chunks, name, inline_array, **kwargs)
   3591     else:
   3592         store = url
-> 3593     z = zarr.Array(store, read_only=True, path=component, **kwargs)
   3594 else:
   3595     z = zarr.Array(url, read_only=True, path=component, **kwargs)

File ~/mambaforge/envs/dask-py39/lib/python3.9/site-packages/zarr/core.py:158, in Array.__init__(self, store, path, read_only, chunk_store, synchronizer, cache_metadata, cache_attrs, partial_decompress)
    155 self._partial_decompress = partial_decompress
    157 # initialize metadata
--> 158 self._load_metadata()
    160 # initialize attributes
    161 akey = self._key_prefix + attrs_key

File ~/mambaforge/envs/dask-py39/lib/python3.9/site-packages/zarr/core.py:175, in Array._load_metadata(self)
    173 """(Re)load metadata from store."""
    174 if self._synchronizer is None:
--> 175     self._load_metadata_nosync()
    176 else:
    177     mkey = self._key_prefix + array_meta_key

File ~/mambaforge/envs/dask-py39/lib/python3.9/site-packages/zarr/core.py:184, in Array._load_metadata_nosync(self)
    182 try:
    183     mkey = self._key_prefix + array_meta_key
--> 184     meta_bytes = self._store[mkey]
    185 except KeyError:
    186     raise ArrayNotFoundError(self._path)

TypeError: string indices must be integers

This is due to dask not supporting zarr versions that old. I tried a more recent version (zarr=2.12.0 -- released June 23, 2022) and the example worked.

I've documented zarr=2.12.0 as our current minimum version in our install docs and pinned to 2.12.0 in our Python 3.8 build (this is a fine, but low-effort measure -- #10038 would be more complete).

@rabernat @joshmoore do you have any recommendations on a better minimum zarr version?

@github-actions github-actions bot added the documentation Improve or add to documentation label Apr 3, 2023
@jrbourbeau
Copy link
Member Author

Note, the docs failure is unrelated to the changes here and being fixed in #10138

@charlesbluca
Copy link
Member

Digging into this failure a bit more as part of the mindeps testing in #10161, it looks like the underlying issue here is that for zarr<2.12.0, zarr.Array was expecting to receive a zarr.BaseStore as an initializing argument, whereas with >=2.12.0, it was generalized to accept strings and path-like objects representing a store location as well.

Based on the testing in #10161, it seems like Zarr roundtrips work as expected with zarr>=2.4.0, but for <2.12.0 we would require initializing a store before passing off to from_zarr; in your example, this would look like:

import dask.array as da
import zarr

da.to_zarr(da.ones(3), 'tmp.zarr', overwrite=True)
da.from_zarr(zarr.DirectoryStore('tmp.zarr'))

So our options here could either be:

  • Setting the min version to 2.4.0 and modifying tests to use this pattern if we're using zarr<2.12.0
  • Set the min version to 2.12.0

I don't really have a strong preference here, though based on feedback in #10161 seems like we would probably err towards bumping versions? For reference, 2.12.0 is ~9 months old.

@jrbourbeau
Copy link
Member Author

Thanks @charlesbluca. Yeah, I have a preference for just setting 2.12.0 as the minimum. Given the current dask release doesn't work with, for example, zarr=2.4.0, setting the minimum to 2.12.0 seems to just be a better reflection of what's currently supported.

Copy link
Member

@charlesbluca charlesbluca left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool! In that case, how do you feel about reverting the changes to the environment file (since testing should be getting handled in #10161)?

@jrbourbeau jrbourbeau changed the title Document and test minimum zarr Document minimum zarr Apr 10, 2023
@jrbourbeau
Copy link
Member Author

Sounds good -- done

@jrbourbeau
Copy link
Member Author

Superseded by #10161

@jrbourbeau jrbourbeau closed this Apr 19, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

documentation Improve or add to documentation

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants