Parse zarr v2 by neilSchroeder · Pull Request #822 · zarr-developers/VirtualiZarr

neilSchroeder · 2025-10-31T19:43:38Z

Checklist

Closes Virtualize Native Zarr V2 format #565
Tests added
Tests passing
Full type hint coverage
Changes are documented in docs/releases.md
New functionality has documentation

How `zarr.py` Handles Zarr V2 Stores

ZarrParser should now support both Zarr V2 and V3 stores by normalizing V2 stores to appear as V3. This approach ensures that all parsers produce V3-compatible outputs, and confines modifications to zarr.py.

V2 → V3 Normalization Strategy

The parser performs a two-part normalization:

1. Chunk Key Mapping (`get_chunk_mapping_prefix`)

For V2 arrays:

- Chunk files are stored directly under the array path: `array_name/0`, `array_name/0.1.2`
- Metadata files (`.zarray`, `.zattrs`, etc.) are filtered out
- Chunk coordinates are normalized to dot-separated format: `"0.1.2"`
- File paths in the manifest point to the actual V2 chunk locations
- Manifest keys contain only chunk coordinates (no path structure)
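
The V2 chunk key mapping described above can be sketched roughly as follows. This is a minimal illustration, not the code in `zarr.py`: the function name `v2_chunk_mapping`, the `V2_METADATA_FILES` set, and the handling of a `/` dimension separator are assumptions made for the example.

```python
import re

# Zarr V2 metadata files that sit next to the chunks and must be skipped.
# (Assumed set for this sketch.)
V2_METADATA_FILES = {".zarray", ".zattrs", ".zgroup", ".zmetadata"}

# A V2 chunk file name is a run of integers joined by "." (or "/", if the
# array uses that dimension separator), e.g. "0", "0.1.2" or "0/1/2".
_CHUNK_NAME = re.compile(r"^\d+([./]\d+)*$")


def v2_chunk_mapping(array_path: str, listed_keys: list[str]) -> dict[str, str]:
    """Map normalized chunk coordinates ("0.1.2") to the real V2 chunk paths."""
    prefix = array_path.rstrip("/") + "/"
    mapping: dict[str, str] = {}
    for key in listed_keys:
        if not key.startswith(prefix):
            continue  # belongs to a different array
        name = key[len(prefix):]
        if name in V2_METADATA_FILES or not _CHUNK_NAME.match(name):
            continue  # metadata or unrelated file, not a chunk
        # Manifest key: coordinates only, dot-separated.
        # Manifest value: the untouched V2 location.
        mapping[name.replace("/", ".")] = key
    return mapping
```

Given keys such as `air/.zarray`, `air/0.0.0` and `air/0.1.2`, only the two chunk entries survive, keyed by `"0.0.0"` and `"0.1.2"`.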

2. Metadata Conversion (`get_metadata()`)

After converting the V2 metadata to V3 with `_convert_array_metadata`, we have to replace the `chunk_key_encoding` with `DefaultChunkKeyEncoding`, for the following reasons:

- The automatic converter preserves `V2ChunkKeyEncoding` in the V3 metadata.
- When zarr/xarray sees `V2ChunkKeyEncoding`, it requests chunks using V2-style paths: `array/0`.
- With `DefaultChunkKeyEncoding`, zarr requests chunks using V3-style paths: `array/c/0`.
- `ManifestStore.get()` expects V3-style paths and uses `parse_manifest_index()` to extract chunk coordinates.
- `parse_manifest_index()` requires the `/c/` component to parse the path correctly.
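
To make the difference between the two request styles concrete, here is a small pure-Python sketch. The helpers `v2_chunk_path`, `v3_chunk_path` and `manifest_index_from_path` are hypothetical stand-ins written for this example; they imitate the key layouts described above and are not the zarr-python encodings or VirtualiZarr's `parse_manifest_index()` themselves.

```python
def v2_chunk_path(array: str, coords: tuple[int, ...], sep: str = ".") -> str:
    """V2-style request path, e.g. ("array", (0, 1)) -> "array/0.1"."""
    # A zero-dimensional V2 array stores its single chunk under "0".
    return f"{array}/" + (sep.join(map(str, coords)) or "0")


def v3_chunk_path(array: str, coords: tuple[int, ...], sep: str = "/") -> str:
    """V3 default request path, e.g. ("array", (0, 1)) -> "array/c/0/1"."""
    return f"{array}/" + sep.join(["c", *map(str, coords)])


def manifest_index_from_path(path: str) -> tuple[int, ...]:
    """Extract chunk coordinates from a V3-style path (assumes "/" separator).

    The "/c/" marker is what separates the array path from the coordinates,
    so a V2-style path cannot be parsed this way.
    """
    _, marker, tail = path.rpartition("/c/")
    if not marker:
        raise ValueError(f"expected a V3-style chunk path containing '/c/': {path!r}")
    return tuple(int(part) for part in tail.split("/"))
```

A V2-style path such as `array/0` has no `/c/` marker, so the parser in this sketch rejects it. That is the failure mode that swapping in the default V3 encoding is meant to avoid.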

Additional metadata handling

- `None` fill values: Converted to appropriate dtype defaults
- Dimension names: Extracted from the `_ARRAY_DIMENSIONS` attribute or generated as `{array_name}_dim_{i}`
- All other metadata: Converted using zarr's standard V2→V3 migration utilities
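
The two special cases above might look something like this. It is a sketch under stated assumptions: the helper names are hypothetical, and the table of "zero-like" defaults per dtype kind is a guess at what "appropriate dtype defaults" means, not the actual values the parser uses.

```python
from typing import Any

# ASSUMPTION: zero-like defaults keyed by the NumPy typestr kind character.
_ZERO_BY_KIND: dict[str, Any] = {
    "b": False,  # boolean
    "i": 0,      # signed integer
    "u": 0,      # unsigned integer
    "f": 0.0,    # float
    "S": b"",    # fixed-length bytes
    "U": "",     # fixed-length unicode
}


def default_fill_value(v2_dtype: str) -> Any:
    """Pick a default for a V2 dtype string such as "<f8" or "|b1"."""
    kind = v2_dtype.lstrip("<>|=")[0]  # drop the byte-order prefix
    return _ZERO_BY_KIND[kind]


def dimension_names(array_name: str, attrs: dict[str, Any], ndim: int) -> list[str]:
    """Use _ARRAY_DIMENSIONS when present, else generate {array_name}_dim_{i}."""
    names = attrs.get("_ARRAY_DIMENSIONS")
    if names is not None:
        return list(names)
    return [f"{array_name}_dim_{i}" for i in range(ndim)]
```

For an array called `air` with two dimensions and no `_ARRAY_DIMENSIONS` attribute, the generated names are `air_dim_0` and `air_dim_1`.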

Implementation Notes

I'm not convinced I've done a particularly elegant implementation here, but adding another class for V2 parsing didn't seem like it would be particularly extensible. Very happy to hear thoughts on perhaps a better implementation.

@TomNicholas thank you very much for your feedback, it definitely helped me wrap my head around the right approach to take here.

Edit: I've done a bit of re-design to use a strategy pattern for dispatching to parsing v2 and v3 arrays. This should make future integrations of zarr array version parsing a lot more maintainable. This is also just a lot easier to read than my original implementation. Tests and documentation are also up to date.
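
For readers unfamiliar with the pattern, the version dispatch could be sketched as below. Only the names `get_strategy` and `get_chunk_mapping` appear in this PR; the class names, the `FakeArray` stand-in and the placeholder return values are assumptions made to keep the example self-contained.

```python
import asyncio
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class FakeArray:
    """Stand-in for a zarr array; only the format version matters here."""
    zarr_format: int


class ZarrVersionStrategy(ABC):
    """One subclass per Zarr format version (hypothetical base class)."""

    @abstractmethod
    async def get_chunk_mapping(self, zarr_array: FakeArray, path: str) -> dict[str, str]:
        """Return {chunk coordinates: chunk location} for the array."""


class ZarrV2Strategy(ZarrVersionStrategy):
    async def get_chunk_mapping(self, zarr_array: FakeArray, path: str) -> dict[str, str]:
        # Placeholder: V2 chunks sit directly under the array path.
        return {"0.0": f"{path}/0.0"}


class ZarrV3Strategy(ZarrVersionStrategy):
    async def get_chunk_mapping(self, zarr_array: FakeArray, path: str) -> dict[str, str]:
        # Placeholder: V3 chunks live under the "c/" prefix.
        return {"0.0": f"{path}/c/0/0"}


_STRATEGIES: dict[int, type[ZarrVersionStrategy]] = {2: ZarrV2Strategy, 3: ZarrV3Strategy}


def get_strategy(zarr_array: FakeArray) -> ZarrVersionStrategy:
    """Pick the strategy matching the array's format version."""
    try:
        return _STRATEGIES[zarr_array.zarr_format]()
    except KeyError:
        raise NotImplementedError(
            f"Unsupported Zarr format: {zarr_array.zarr_format}"
        ) from None


if __name__ == "__main__":
    arr = FakeArray(zarr_format=2)
    print(asyncio.run(get_strategy(arr).get_chunk_mapping(arr, "air")))
```

Supporting a further format version then means adding one subclass and one dictionary entry, without touching the calling code.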

virtualizarr/manifests/store.py

neilSchroeder · 2025-10-31T22:59:52Z

@TomNicholas thank you very much for your feedback up there, it definitely helped me wrap my head around the right approach to take here.

Edit: I've done a bit of re-design to use a strategy pattern for dispatching to parsing v2 and v3 arrays. This should make future integrations of zarr array version parsing a lot more maintainable. This is also just a lot easier to read than my original implementation. Tests and such are also up to date. I'm also going to move this write-up into the PR description instead of leaving a huge comment here.

TomNicholas · 2025-11-03T22:19:49Z

Let me know when you would like a review of this @neilSchroeder !

neilSchroeder · 2025-11-03T22:23:11Z

@TomNicholas I think it's ready for a review.

TomNicholas

Thanks for working on this @neilSchroeder ! I mostly have a bunch of small gripes 😁

virtualizarr/parsers/zarr.py

virtualizarr/tests/test_parsers/test_zarr.py

virtualizarr/parsers/zarr.py

codecov · 2025-11-06T22:01:12Z

Codecov Report

❌ Patch coverage is 99.13793% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 88.31%. Comparing base (cb2912e) to head (a4a271f).
⚠️ Report is 1 commits behind head on main.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| virtualizarr/parsers/zarr.py | 99.13% | 1 Missing ⚠️ |

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #822      +/-   ##
==========================================
+ Coverage   87.71%   88.31%   +0.60%     
==========================================
  Files          35       35              
  Lines        1880     1968      +88     
==========================================
+ Hits         1649     1738      +89     
+ Misses        231      230       -1

| Files with missing lines | Coverage Δ | |
| --- | --- | --- |
| virtualizarr/parsers/zarr.py | `99.33% <99.13%> (+2.55%)` | ⬆️ |

neilSchroeder · 2025-11-10T14:52:07Z

@TomNicholas I'm ready for another review whenever you've got time

TomNicholas

This looks great, thank you so much @neilSchroeder !

TomNicholas · 2025-11-10T15:37:56Z

virtualizarr/parsers/zarr.py

+    strategy = get_strategy(zarr_array)
+    chunk_map = await strategy.get_chunk_mapping(zarr_array, path)
+
+    if not chunk_map:


Actually ignore me, I think what you've done here is good.

I'm a little unclear about order of operations and whether or not these scenarios are realistic.

These scenarios are definitely plausible.

Or maybe handled differently?

There might be a way to refactor to have a few fewer levels of functions, but this is good.

neilSchroeder · 2025-11-11T20:23:02Z

@TomNicholas what are the next steps here? Will this be merged whenever someone has time to do the next release? Do we need another reviewer?

neilSchroeder added 8 commits October 31, 2025 13:54

Implement Zarr V2 to V3 metadata conversion with fill value handling

7027e0d

Enhance parse_manifest_index to support V2 and V3 chunk key parsing

0cb9ab3

Refactor parse_manifest_index to improve regex pattern matching and error handling

f43a822

Enhance get_chunk_mapping_prefix to support V2 and V3 chunk path parsing

a5dec74

Enhance build_chunk_manifest to calculate chunk grid shape for inline V2 data

5ebb020

Enhance test_virtual_dataset_zarr to handle dimension name checks for V2 and V3 formats

08f7dd3

Remove redundant check for V2 format in get_metadata function

126db99

cleaning up

bbb5980

neilSchroeder temporarily deployed to test-release October 31, 2025 19:44 — with GitHub Actions Inactive

[pre-commit.ci] auto fixes from pre-commit.com hooks

e4f3019

for more information, see https://pre-commit.ci

pre-commit-ci bot temporarily deployed to test-release October 31, 2025 19:45 Inactive

TomNicholas reviewed Oct 31, 2025

virtualizarr/manifests/store.py

neilSchroeder added 3 commits October 31, 2025 15:38

revert store

8d8386b

linting

ce42f47

merge and lint

a7baf03

neilSchroeder temporarily deployed to test-release October 31, 2025 22:21 — with GitHub Actions Inactive

fixing mypy typing

f27c866

neilSchroeder temporarily deployed to test-release October 31, 2025 22:28 — with GitHub Actions Inactive

removing redundant code, linting

f7c8434

neilSchroeder temporarily deployed to test-release October 31, 2025 23:26 — with GitHub Actions Inactive

neilSchroeder added 2 commits November 3, 2025 09:21

refactor zarr parsing to use strategy pattern for extensibility and maintainability, linted

3d2d705

refactor test, add tests to improve coverage of zarr parsing (97%), linted

aa9bbe0

neilSchroeder temporarily deployed to test-release November 3, 2025 16:25 — with GitHub Actions Inactive

neilSchroeder added 2 commits November 3, 2025 14:35

adding v2 parsing as new feature

e6cabaf

updating ZarrParser documentation

90c621f

neilSchroeder temporarily deployed to test-release November 3, 2025 22:18 — with GitHub Actions Inactive

neilSchroeder marked this pull request as ready for review November 3, 2025 22:22

TomNicholas requested changes Nov 4, 2025

neilSchroeder added 2 commits November 5, 2025 10:39

converting protocol to ABC

96bd1d4

adding tests for sparse files being filled with default fill values

3e39f12

neilSchroeder temporarily deployed to test-release November 6, 2025 19:24 — with GitHub Actions Inactive

fix zeros list

49cb3a4

neilSchroeder temporarily deployed to test-release November 6, 2025 19:24 — with GitHub Actions Inactive

neilSchroeder added 3 commits November 6, 2025 12:29

adding comment about chunk key discovery

a16f595

cleaning up a bit based on comments

8103aab

fixing issue with conflicting test assertions around v2 metadata attributes

a4a271f

neilSchroeder temporarily deployed to test-release November 6, 2025 21:58 — with GitHub Actions Inactive

refactoring common bits of code

3b9ce88

neilSchroeder temporarily deployed to test-release November 6, 2025 22:04 — with GitHub Actions Inactive

neilSchroeder added 2 commits November 6, 2025 15:11

raise error on shard detection for v3

3348604

test that sharded v3 array raises error

8745e9f

neilSchroeder temporarily deployed to test-release November 6, 2025 22:12 — with GitHub Actions Inactive

fixing mypy errors

75f439e

neilSchroeder temporarily deployed to test-release November 6, 2025 22:19 — with GitHub Actions Inactive

neilSchroeder requested a review from TomNicholas November 6, 2025 23:27

TomNicholas approved these changes Nov 10, 2025

TomNicholas merged commit acb0bb6 into zarr-developers:main Nov 11, 2025
13 checks passed

neilSchroeder deleted the parse-zarr-v2 branch November 11, 2025 23:03

This was referenced Nov 11, 2025

Raise informative error on Zarr V2 parsing with Zarr-Python<3.1.3 #829

Merged

Parse zarr v2 #806

Closed
