
Zarr LUTs #787

Closed
jammont wants to merge 74 commits into isofit:dev from jammont:luts/zarr

jammont (Collaborator) commented Oct 22, 2025

Implements the ability to create Zarr stores instead of NetCDF when creating a LUT.

  • The Create class was generalized so that it can be subclassed, introducing:
    • CreateNetCDF: the current implementation
    • CreateZarr: a new Zarr-focused version
      • Leverages xarray heavily, but we could optionally look into using the zarr package directly
      • Added a simple pytest case for it
      • By default, the chunking strategy is only along the wavelength dimension; maybe we want to chunk along points?
  • luts.create is the primary entry point and determines which Create class to return
  • luts.load was updated to detect Zarr stores
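The create/load dispatch described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the write_point method, the suffix-based selection in create, and the is_zarr_store helper are all hypothetical, and the real luts.create and luts.load may pick the format differently (for example from a config option rather than a file extension).

```python
from pathlib import Path


class Create:
    """Hypothetical base class holding the format-independent LUT logic."""

    def __init__(self, file, **kwargs):
        self.file = Path(file)
        self.options = kwargs

    def write_point(self, index, data):
        # Each subclass decides how one simulated point reaches disk.
        raise NotImplementedError


class CreateNetCDF(Create):
    def write_point(self, index, data):
        pass  # NetCDF-specific write would go here


class CreateZarr(Create):
    def write_point(self, index, data):
        pass  # Zarr-specific region write would go here


def create(file, **kwargs):
    """Hypothetical dispatcher: choose a Create subclass from the file suffix."""
    suffix = Path(file).suffix.lower()
    if suffix == ".zarr":
        return CreateZarr(file, **kwargs)
    if suffix in (".nc", ".nc4"):
        return CreateNetCDF(file, **kwargs)
    raise ValueError(f"Unsupported LUT format: {suffix!r}")


def is_zarr_store(path):
    """Guess whether path is a Zarr store by looking for group metadata.

    Zarr v2 groups contain a .zgroup file; Zarr v3 nodes contain zarr.json.
    """
    path = Path(path)
    return path.is_dir() and any(
        (path / name).exists() for name in (".zgroup", "zarr.json")
    )
```

With this shape, callers only ever call create(...) and never name a subclass, so adding another backend means adding one subclass and one branch in the dispatcher.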

Some further testing of the efficiency of the xarray implementation is still needed. At the moment, when flush is called, each variable of each queued point is written to disk one at a time. This may be fine because I target the specific region of the Zarr store for each write, but it needs to be tested on larger LUTs.
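To reason about the chunking question and the cost of per-point region writes, here is a rough back-of-the-envelope helper. It assumes a 2-D variable of shape (points, wavelengths) with a made-up size of 1000 x 2000 and two hypothetical chunk layouts; chunks_touched and elements_rewritten are illustrative helpers, not part of ISOFIT, xarray, or zarr. Because Zarr stores and compresses each chunk as a unit, updating part of a chunk generally means reading, modifying, and rewriting that whole chunk.

```python
import math


def chunks_touched(region, chunks):
    """Count the chunks a rectangular region write overlaps.

    region: one (start, stop) pair per dimension, with stop exclusive.
    chunks: the chunk length along each dimension.
    """
    counts = []
    for (start, stop), size in zip(region, chunks):
        first = start // size
        last = (stop - 1) // size
        counts.append(last - first + 1)
    return math.prod(counts)


def elements_rewritten(region, chunks):
    """Upper bound on elements rewritten, assuming every touched chunk is full-size."""
    return chunks_touched(region, chunks) * math.prod(chunks)


# Hypothetical LUT variable: 1000 points x 2000 wavelengths.
point = 5
region = [(point, point + 1), (0, 2000)]  # one point, every wavelength

all_points_by_wavelength = (1000, 200)  # each chunk: every point, 200 wavelengths
one_point_per_chunk = (1, 2000)         # each chunk: one point, every wavelength

print(chunks_touched(region, all_points_by_wavelength))      # 10
print(elements_rewritten(region, all_points_by_wavelength))  # 2000000
print(chunks_touched(region, one_point_per_chunk))           # 1
print(elements_rewritten(region, one_point_per_chunk))       # 2000
```

Under these assumptions, chunks that each hold a single point let every per-point write replace whole chunks, while chunks shared by many points make each write rewrite data belonging to other points (and concurrent writers touching the same chunk could conflict). Very small chunks carry their own per-chunk overhead, so an intermediate chunk length along points may be the better trade-off.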

jammont added the enhancement (New feature or request) label Oct 22, 2025

jammont (Collaborator, Author) commented Oct 24, 2025

There are presently two versions of the zarr implementation:

  • CreateZarrXarray: uses xarray to create and update the Zarr store, except for the attributes, which are written with the zarr package
  • CreateZarr: uses the zarr package for everything

The first worked well; the second had some issues with fill values. By default, zarr sets an array's fill value to 0, whereas xarray sets it to null. With zarr's default, xarray treated stored zeros as missing data when loading, so our zero values were replaced with NaNs. Simply adding fill_value=None to the z.array call fixed it.
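The fill-value pitfall can be illustrated with a simplified, pure-Python model of the masking xarray applies on read: with its default CF decoding (mask_and_scale=True), values equal to a variable's fill value are turned into NaN. This is not xarray's implementation, and decode_with_fill is a hypothetical helper; it only shows why a fill value of 0 erases legitimate zeros while no fill value leaves them intact.

```python
import math


def decode_with_fill(values, fill_value):
    """Simplified model of CF-style masking on read.

    Any stored value equal to the fill value is treated as missing and
    replaced with NaN. A fill value of None means nothing is masked.
    """
    if fill_value is None:
        return list(values)
    return [math.nan if v == fill_value else v for v in values]


stored = [0.0, 0.25, 0.0, 1.0]  # zeros here are real data, not gaps

print(decode_with_fill(stored, 0.0))   # [nan, 0.25, nan, 1.0]
print(decode_with_fill(stored, None))  # [0.0, 0.25, 0.0, 1.0]
```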

jammont marked this pull request as ready for review November 10, 2025 20:15

jammont (Collaborator, Author) commented Nov 21, 2025

Some performance results:

Top graph is Zarr, bottom is NetCDF.
[Figure: LUT generation, Zarr (top) vs. NetCDF (bottom)]

Larger 5 GB LUT generation:
[Figure: 5 GB LUT generation, Zarr vs. NetCDF]
Zarr takes about 1 minute longer, but memory and CPU usage are otherwise very similar. The additional minute appears to come from the period between the simulations finishing and the LUT being reloaded, which suggests that Zarr is slightly slower than NetCDF to load from disk.
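One way to check whether the extra minute really comes from reading the finished LUT back is to time the load step in isolation for both formats. The helpers below are a hypothetical stdlib-only sketch; the file names in the comments are placeholders, and xr.open_dataset / xr.open_zarr followed by .load() are just one reasonable way to force a full read.

```python
import statistics
import time


def median_time(func, repeats=5):
    """Median wall-clock seconds over `repeats` calls of func()."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


def compare_loaders(loaders, repeats=5):
    """Map each loader name to its median load time in seconds."""
    return {name: median_time(func, repeats) for name, func in loaders.items()}


# Real usage would pass functions that fully read each store, e.g. (paths hypothetical):
#   import xarray as xr
#   loaders = {
#       "netcdf": lambda: xr.open_dataset("lut.nc").load(),
#       "zarr": lambda: xr.open_zarr("lut.zarr").load(),
#   }
# Placeholder workloads keep this sketch runnable without any LUT files.
loaders = {
    "netcdf": lambda: sum(range(100_000)),
    "zarr": lambda: sum(range(100_000)),
}
print(compare_loaders(loaders, repeats=3))
```

Repeated runs may be served from the operating system's file cache, so the first (cold) run and the median can differ noticeably; timing a single cold-cache run per format gives a fairer picture of raw disk-read cost.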

jammont mentioned this pull request May 12, 2026

jammont (Collaborator, Author) commented Jun 8, 2026

Closing, as this has been integrated separately in ISOFIT v4.0.0.

jammont closed this Jun 8, 2026

Labels: enhancement (New feature or request)

1 participant