
500 internal server error when trying to open a dataset composed of Zarr stores #3823

@jacobbieker

Description


Describe the bug

The dataset openclimatefix/mrms returns a 500 internal server error when opened on the website or through code.

The dataset doesn't have a loading script yet; I recently pushed two xarray Zarr stores of data there. The Zarr stores are composed of lots of small files, which I'm guessing is the problem: we have another OCF dataset that also uses xarray and Zarr, but with the Zarr stored on GCP public datasets instead of directly in HF datasets, and that one opens fine.

In general, we were hoping to use HF datasets to release more public geospatial datasets as benchmarks. These are commonly stored as Zarr stores because Zarr compresses well and handles multi-dimensional data and coordinates more easily than other formats. Given this error, should we try a different format?

For context, we are trying to provide complete public model+data reimplementations of some SOTA weather and solar nowcasting models, such as MetNet, MetNet-2, and DGMR, all of which have large, complex datasets.

Steps to reproduce the bug

from datasets import load_dataset

dataset = load_dataset("openclimatefix/mrms")

Expected results

The dataset should download or open.

Actual results

A 500 internal server error

Environment info

  • datasets version: 1.18.3
  • Platform: Linux-5.15.25-1-MANJARO-x86_64-with-glibc2.35
  • Python version: 3.9.10
  • PyArrow version: 7.0.0

Labels: bug