account for endian being possibly None#787
Conversation
are you able to share this dataset? If not, what dtype is this for? I'm wondering if it's a single byte dtype where the configuration should be empty. I'm reluctant to accept this change because an endian of None seems to be out-of-spec. |
Not sure, I'll have to find a public version of it.
these are single-byte dtypes (in my case, which I assume is the default for this dtype? |
|
from the v3 spec:
|
|
but that's describing a JSON object, not an in-memory python object. in zarr python, our in-memory representation of the bytes codec uses endian= |
|
not exactly the same dataset, but this also fails with that error: from obstore.store import from_url
from virtualizarr.parsers import HDFParser
from virtualizarr.registry import ObjectStoreRegistry
from virtualizarr import open_virtual_dataset
url = "https://data-cersat.ifremer.fr/data/sea-surface-temperature/odyssea/l4/glob/nrt/data/v2.1/2021/001/20210101000000-IFR-L4_GHRSST-SSTfnd-ODYSSEA-GLOB_010-v02.1-fv01.0.nc"
store = from_url(url)
registry = ObjectStoreRegistry({url: store})
vds = open_virtual_dataset(
url=url,
loadable_variables=[],
parser=HDFParser(),
registry=registry,
)
display(vds)
vds.vz.to_kerchunk("refs.parquet", format="parquet") |
|
in digging through the zarr code, I noticed that BytesCodec.from_dict({"name": "bytes", "configuration": {}})
BytesCodec.from_dict({"name": "bytes"})both get the default system endian because that's the default for BytesCodec.from_dict({"name": "bytes", "configuration": {"endian": None}})(but that would be against the spec, right?) Also: from zarr.codecs import BytesCodec
codec = BytesCodec(endian=None)
roundtripped = BytesCodec.from_dict(codec.to_dict())
assert codec.endian == roundtripped.endianfails for the same reason. @d-v-b, any idea on what the intended behavior should be? |
Not sure if there's a better fix for this, but the `getattr` call basically guarantees that if `codec` does not have the `endian` attribute the last condition is never reached.
|
the failing RTD build looks unrelated (it fails while creating the env) |
|
We just ran into this and was about to open the same PR, thanks for working on this @keewis. It'll be great if this gets merged and we get a new release since this is a blocker for all earthaccess virtualizarr workflows. cc @maxrjones |
For some reason a dataset I've been working on produces
BytesCodecinstances whereendianisNone, which causes the translation from v3 to v2 metadata to fail.No tests because I didn't check thoroughly on where these should be.
(I've seen that you're planning to migrate to v3 entirely, in which case this won't be necessary anymore. It's such a simple fix, though, that this might still be worth it)
docs/releases.rstapi.rst