chore/handle numcodecs codecs#3376
Conversation
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #3376 +/- ##
==========================================
- Coverage 60.60% 60.53% -0.08%
==========================================
Files 79 81 +2
Lines 9506 9694 +188
==========================================
+ Hits 5761 5868 +107
- Misses 3745 3826 +81
🚀 New features to boost your workflow:
|
|
|
||
| def _encode(self, chunk_data: Buffer, prototype: BufferPrototype) -> Buffer: | ||
| encoded = self._codec.encode(chunk_data.as_array_like()) | ||
| if isinstance(encoded, np.ndarray): # Required for checksum codecs |
There was a problem hiding this comment.
Can we know statically which are checksum codecs without the isinstance check?
There was a problem hiding this comment.
n.b., this was copy + pasted from numcodecs, but I think the answer is "no"
| codec_name: str | ||
| codec_config: dict[str, JSON] | ||
|
|
||
| def __init_subclass__(cls, *, codec_name: str | None = None, **kwargs: Any) -> None: |
There was a problem hiding this comment.
What would a codec definition look like without this magic? I'd be fine with repeating a few things if it meant we could avoid this (and IIUC some of the complexity in __repr__ and __init__ would go away too?).
There was a problem hiding this comment.
now that I have rebased #3332 off of this PR, I am 100% going to demagic these codecs in that effort.
|
> Can you explain the different cases here? Spinning this question out into the main thread -- from me, the general answer to questions like this will be "no", since I am only copy+pasting stuff from numcodecs. I haven't spent too much time figuring out what this code is doing. I do think @normanrz and @TomNicholas might be able to answer some of these questions though. |
|
Ah, I didn't realize this was mostly from numcodecs. I think that moots most of my comments aside from where in the public API we put these. |
|
yeah I should have made more clear that this is nearly all directly copy + pasted from |
…v-b/zarr-python into chore/handle-numcodecs-codecs
…ore/handle-numcodecs-codecs
|
I think this is ready to go in (and it's necessary for #3332) |
|
and important recent addition: I moved all of the invocations of |
|
@d-v-b have you tested this with different numcodecs versions to make sure there's no unexpected issues with clobbering of codec registration? |
I'm don't think I expect any behavior to depend on numcodecs versions, happy to be corrected though. My understanding is that the codecs in we have a test that checks our compatibility with these codecs, defined as dicts. In main these tests will pick up the codec class from numcodecs, but in this PR the version of the codec defined in zarr python is used instead. Is that kind of thing you are worried about? |
rabernat
left a comment
There was a problem hiding this comment.
Fantastic, great work Davis!
| @dataclass(frozen=True) | ||
| class _NumcodecsCodec(Metadata): | ||
| codec_name: str | ||
| codec_config: dict[str, JSON] | ||
|
|
||
| def __init_subclass__(cls, *, codec_name: str | None = None, **kwargs: Any) -> None: | ||
| """To be used only when creating the actual public-facing codec class.""" | ||
| super().__init_subclass__(**kwargs) | ||
| if codec_name is not None: | ||
| namespace = codec_name | ||
|
|
||
| cls_name = f"{CODEC_PREFIX}{namespace}.{cls.__name__}" | ||
| cls.codec_name = f"{CODEC_PREFIX}{namespace}" | ||
| cls.__doc__ = f""" | ||
| See :class:`{cls_name}` for more details and parameters. | ||
| """ |
There was a problem hiding this comment.
I vaguely remember a discussion a few months back about classes initialized this way having challenges with serialization...but I can't track down the issue.
There was a problem hiding this comment.
those issues would have been resolved by zarr-developers/numcodecs#745, and this PR uses the changes from that PR
| def test_generic_compressor(codec_class: type[_numcodecs._NumcodecsBytesBytesCodec]) -> None: | ||
| data = np.arange(0, 256, dtype="uint16").reshape((16, 16)) | ||
|
|
||
| with pytest.warns(ZarrUserWarning, match=EXPECTED_WARNING_STR): | ||
| a = create_array( | ||
| {}, | ||
| shape=data.shape, | ||
| chunks=(16, 16), | ||
| dtype=data.dtype, | ||
| fill_value=0, | ||
| compressors=[codec_class()], | ||
| ) | ||
|
|
||
| a[:, :] = data.copy() | ||
| np.testing.assert_array_equal(data, a[:, :]) |
This PR brings in all the codecs defined in
numcodecs.zarr3. After this PR is merged, we can safely replace thenumcodecs.zarr3module with reexports from zarr python, or removenumcodecs.zarr3entirely, thereby fixing our circular dependency problem.This PR also changes the default config to ensure that the locally-defined codecs take priority over the same codec found in the numcodecs registry.