chore/handle numcodecs codecs by d-v-b · Pull Request #3376 · zarr-developers/zarr-python

d-v-b · 2025-08-13T19:38:46Z

This PR brings in all the codecs defined in numcodecs.zarr3. After this PR is merged, we can safely replace the numcodecs.zarr3 module with reexports from zarr python, or remove numcodecs.zarr3 entirely, thereby fixing our circular dependency problem.

This PR also changes the default config to ensure that the locally-defined codecs take priority over the same codec found in the numcodecs registry.
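
To make the priority rule concrete, here is a minimal sketch of how a config entry could pick between two implementations registered under the same codec name. Everything in it (`REGISTRY`, `CONFIG`, `resolve`, the two classes, and the `demo.codec` key) is hypothetical and exists only for illustration; it is not the zarr-python registry code.

```python
from typing import Dict

# Hypothetical stand-ins; none of these names are zarr-python APIs.


class LocalCodec:
    """Pretend codec defined inside the host library."""


class ExternalCodec:
    """Pretend codec contributed by a third-party package."""


# Every implementation registered under a codec name, keyed by a qualified name.
REGISTRY: Dict[str, Dict[str, type]] = {
    "demo.codec": {
        "external.ExternalCodec": ExternalCodec,
        "local.LocalCodec": LocalCodec,
    }
}

# The config maps a codec name to the preferred implementation.
CONFIG: Dict[str, str] = {"demo.codec": "local.LocalCodec"}


def resolve(name: str) -> type:
    """Return the configured implementation, else the last one registered."""
    candidates = REGISTRY[name]
    preferred = CONFIG.get(name)
    if preferred in candidates:
        return candidates[preferred]
    # No usable preference: fall back to the most recently registered class.
    return list(candidates.values())[-1]
```

With the config entry present, `resolve("demo.codec")` returns `LocalCodec` even though `ExternalCodec` is registered under the same name.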

codecov · 2025-08-13T20:13:52Z

Codecov Report

❌ Patch coverage is 31.73077% with 142 lines in your changes missing coverage. Please review.
✅ Project coverage is 60.53%. Comparing base (e76b1e0) to head (3732501).
⚠️ Report is 65 commits behind head on main.

Files with missing lines	Patch %	Lines
src/zarr/codecs/numcodecs/_codecs.py	38.46%	104 Missing ⚠️
src/zarr/codecs/__init__.py	0.00%	33 Missing ⚠️
src/zarr/codecs/numcodecs/__init__.py	0.00%	3 Missing ⚠️
src/zarr/codecs/sharding.py	0.00%	1 Missing ⚠️
src/zarr/registry.py	50.00%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #3376      +/-   ##
==========================================
- Coverage   60.60%   60.53%   -0.08%     
==========================================
  Files          79       81       +2     
  Lines        9506     9694     +188     
==========================================
+ Hits         5761     5868     +107     
- Misses       3745     3826      +81

Files with missing lines	Coverage Δ
src/zarr/codecs/blosc.py	`39.39% <ø> (+0.78%)`	⬆️
src/zarr/codecs/bytes.py	`54.09% <ø> (+2.53%)`	⬆️
src/zarr/codecs/crc32c_.py	`43.75% <ø> (+2.57%)`	⬆️
src/zarr/codecs/gzip.py	`30.30% <ø> (+1.73%)`	⬆️
src/zarr/codecs/transpose.py	`47.27% <ø> (+1.65%)`	⬆️
src/zarr/codecs/vlen_utf8.py	`28.07% <ø> (+1.40%)`	⬆️
src/zarr/codecs/zstd.py	`36.00% <ø> (+1.38%)`	⬆️
src/zarr/core/config.py	`29.16% <ø> (+4.16%)`	⬆️
src/zarr/codecs/sharding.py	`59.07% <0.00%> (+0.18%)`	⬆️
src/zarr/registry.py	`63.63% <50.00%> (ø)`
... and 3 more

... and 4 files with indirect coverage changes


docs/user-guide/config.rst

TomAugspurger · 2025-08-17T18:02:03Z

src/zarr/codecs/_numcodecs.py

+
+    def _encode(self, chunk_data: Buffer, prototype: BufferPrototype) -> Buffer:
+        encoded = self._codec.encode(chunk_data.as_array_like())
+        if isinstance(encoded, np.ndarray):  # Required for checksum codecs


Can we know statically which are checksum codecs without the isinstance check?

n.b., this was copy + pasted from numcodecs, but I think the answer is "no"

TomAugspurger · 2025-08-17T18:05:20Z

src/zarr/codecs/_numcodecs.py

+    codec_name: str
+    codec_config: dict[str, JSON]
+
+    def __init_subclass__(cls, *, codec_name: str | None = None, **kwargs: Any) -> None:


What would a codec definition look like without this magic? I'd be fine with repeating a few things if it meant we could avoid this (and IIUC some of the complexity in __repr__ and __init__ would go away too?).

now that I have rebased #3332 off of this PR, I am 100% going to demagic these codecs in that effort.
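
For readers following the question above, a rough sketch of the two styles side by side may help. It is not the code from this PR or from #3332: the class names, the `level` parameter, and the `"numcodecs."` prefix value are assumptions chosen for illustration.

```python
from typing import Any, ClassVar, Optional

CODEC_PREFIX = "numcodecs."  # assumed prefix value, for illustration only


class ImplicitBase:
    """Base class that derives codec_name from a class keyword argument."""

    codec_name: ClassVar[str]

    def __init_subclass__(cls, *, codec_name: Optional[str] = None, **kwargs: Any) -> None:
        super().__init_subclass__(**kwargs)
        if codec_name is not None:
            cls.codec_name = f"{CODEC_PREFIX}{codec_name}"

    def __init__(self, **codec_config: Any) -> None:
        # Accepts any keyword; the valid parameters are not visible here.
        self.codec_config = codec_config


class ImplicitZstd(ImplicitBase, codec_name="zstd"):
    """The name is set by the class keyword handled in the base class."""


class ExplicitZstd:
    """Same codec with the name and parameters spelled out."""

    codec_name: ClassVar[str] = "numcodecs.zstd"

    def __init__(self, level: int = 0) -> None:
        self.codec_config = {"level": level}
```

The explicit version repeats the name string in each class, but the constructor signature documents itself and there is no class-creation hook to reason about.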

src/zarr/codecs/_numcodecs.py

tests/test_codecs/test_numcodecs.py

d-v-b · 2025-08-17T18:23:52Z

> Can you explain the different cases here?

Spinning this question out into the main thread -- from me, the general answer to questions like this will be "no", since I am only copy+pasting stuff from numcodecs. I haven't spent too much time figuring out what this code is doing. I do think @normanrz and @TomNicholas might be able to answer some of these questions though.

TomAugspurger · 2025-08-17T18:59:22Z

Ah, I didn't realize this was mostly from numcodecs. I think that moots most of my comments aside from where in the public API we put these.

d-v-b · 2025-08-17T19:01:22Z

yeah I should have made more clear that this is nearly all directly copy + pasted from numcodecs.zarr3


d-v-b · 2025-08-21T10:23:59Z

I think this is ready to go in (and it's necessary for #3332)

d-v-b · 2025-08-21T11:07:54Z

An important recent addition: I moved all of the invocations of register_codec to src/zarr/codecs/__init__.py. This ensures that all codecs get registered, regardless of whether they are part of zarr.codecs.__all__.

maxrjones · 2025-08-21T15:47:26Z

@d-v-b have you tested this with different numcodecs versions to make sure there's no unexpected issues with clobbering of codec registration?

d-v-b · 2025-08-21T16:07:48Z

> @d-v-b have you tested this with different numcodecs versions to make sure there's no unexpected issues with clobbering of codec registration?

I don't expect any behavior to depend on numcodecs versions, happy to be corrected though. My understanding is that the codecs in numcodecs.zarr3 are not registered with the numcodecs registry, and instead are exposed via the entrypoints framework. That means we don't have to worry about anything interacting with numcodecs' own registry.

We have a test that checks our compatibility with these codecs, defined as dicts. In main these tests will pick up the codec class from numcodecs, but in this PR the version of the codec defined in zarr python is used instead. Is that the kind of thing you are worried about?

rabernat

Fantastic, great work Davis!

rabernat · 2025-09-05T14:41:07Z

src/zarr/codecs/numcodecs/_codecs.py

+@dataclass(frozen=True)
+class _NumcodecsCodec(Metadata):
+    codec_name: str
+    codec_config: dict[str, JSON]
+
+    def __init_subclass__(cls, *, codec_name: str | None = None, **kwargs: Any) -> None:
+        """To be used only when creating the actual public-facing codec class."""
+        super().__init_subclass__(**kwargs)
+        if codec_name is not None:
+            namespace = codec_name
+
+            cls_name = f"{CODEC_PREFIX}{namespace}.{cls.__name__}"
+            cls.codec_name = f"{CODEC_PREFIX}{namespace}"
+            cls.__doc__ = f"""
+            See :class:`{cls_name}` for more details and parameters.
+            """


I vaguely remember a discussion a few months back about classes initialized this way having challenges with serialization...but I can't track down the issue.

those issues would have been resolved by zarr-developers/numcodecs#745, and this PR uses the changes from that PR

rabernat · 2025-09-05T14:45:20Z

tests/test_codecs/test_numcodecs.py

+def test_generic_compressor(codec_class: type[_numcodecs._NumcodecsBytesBytesCodec]) -> None:
+    data = np.arange(0, 256, dtype="uint16").reshape((16, 16))
+
+    with pytest.warns(ZarrUserWarning, match=EXPECTED_WARNING_STR):
+        a = create_array(
+            {},
+            shape=data.shape,
+            chunks=(16, 16),
+            dtype=data.dtype,
+            fill_value=0,
+            compressors=[codec_class()],
+        )
+
+    a[:, :] = data.copy()
+    np.testing.assert_array_equal(data, a[:, :])


Love this test.

d-v-b added 3 commits August 13, 2025 18:38

bring in contents of numcodecs.zarr3

0183eb5

fix tests

5f06c9e

fix docs

e64830f

github-actions bot added the needs release notes Automatically applied to PRs which haven't added release notes label Aug 13, 2025

fix config test

b6b2260

This was referenced Aug 13, 2025

chore/hollow out zarr3 zarr-developers/numcodecs#780

Merged

remove the zarr dependency zarr-developers/numcodecs#778

Closed

d-v-b requested a review from a team August 13, 2025 20:38

TomAugspurger reviewed Aug 17, 2025


Merge branch 'main' into chore/handle-numcodecs-codecs

92e9442

d-v-b added 8 commits August 18, 2025 21:28

Merge branch 'main' into chore/handle-numcodecs-codecs

233fbbf

make zarr.codecs.numcodecs

223ae65

Merge branch 'main' into chore/handle-numcodecs-codecs

8645370

complete move to zarr.codecs.numcodecs

1720a75

Merge branch 'chore/handle-numcodecs-codecs' of https://github.com/d-…

fe39bfb

…v-b/zarr-python into chore/handle-numcodecs-codecs

Merge branch 'main' into chore/handle-numcodecs-codecs

0e10d8e

changelog

d5e0461

Merge branch 'main' of github.com:zarr-developers/zarr-python into ch…

e3e1216

…ore/handle-numcodecs-codecs

github-actions bot removed the needs release notes Automatically applied to PRs which haven't added release notes label Aug 21, 2025

d-v-b added 2 commits August 21, 2025 13:03

register codecs in codecs/__init__.py

0a724e6

remove old registration sites

f023487

Merge branch 'main' into chore/handle-numcodecs-codecs

7fd8901

d-v-b mentioned this pull request Aug 24, 2025

add a runtime type checker for metadata objects #3400

Open

d-v-b added 2 commits August 24, 2025 15:23

Merge branch 'main' into chore/handle-numcodecs-codecs

efd6828

Merge branch 'main' into chore/handle-numcodecs-codecs

7af4bcb

rabernat approved these changes Sep 5, 2025


Merge branch 'main' into chore/handle-numcodecs-codecs

3732501

d-v-b enabled auto-merge (squash) September 13, 2025 16:35

d-v-b merged commit bce30dd into zarr-developers:main Sep 13, 2025
29 checks passed

d-v-b deleted the chore/handle-numcodecs-codecs branch September 13, 2025 18:43

d-v-b mentioned this pull request Sep 16, 2025

Release Zarr-Python v3.1.3 #3462

Closed

26 tasks

maxrjones mentioned this pull request Nov 3, 2025

Imagecodecs support NASA-IMPACT/veda-odd#214

Closed


4 participants
