Description
Describe the bug
The dataset openclimatefix/mrms returns a 500 server error when opened on the website or loaded through code.
The dataset doesn't have a loading script yet, and I recently pushed two xarray Zarr stores of data there. The Zarr stores are composed of lots of small files, which I am guessing is the problem: we have another OCF dataset using xarray and Zarr, but with the Zarr stored on GCP public datasets instead of directly in HF datasets, and that one opens fine.
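To check how many small files a store actually contains, something like the following could be run against a local copy. It is only a sketch using the standard library: `summarize_store` and the demo directory layout are made-up names for illustration, not part of datasets, xarray, or Zarr.

```python
import os
import tempfile


def summarize_store(root):
    """Return (file_count, total_bytes) for every file under a directory."""
    file_count = 0
    total_bytes = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            file_count += 1
            total_bytes += os.path.getsize(os.path.join(dirpath, name))
    return file_count, total_bytes


# Demo on a throwaway directory that loosely mimics a chunked store
# (hypothetical layout, not the real openclimatefix/mrms data).
demo_root = tempfile.mkdtemp()
os.makedirs(os.path.join(demo_root, "var"))
for i in range(5):
    with open(os.path.join(demo_root, "var", f"{i}.0"), "wb") as f:
        f.write(b"\x00" * 10)

demo_count, demo_bytes = summarize_store(demo_root)
print(demo_count, demo_bytes)
```

Pointing `summarize_store` at each local Zarr store would give a rough file count to include in a report like this one.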
In general, we were hoping to use HF datasets to release some more public geospatial datasets as benchmarks. These are commonly stored as Zarr stores, as they compress well and handle multi-dimensional data and coordinates fairly easily compared to other formats. Given this error, I'm assuming we should try a different format?
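One option short of switching formats might be to upload each store as a single archive instead of many small files. Below is a rough sketch using only the standard library. `pack_directory` is an illustrative name, and whether the Hub or a Zarr reader handles the resulting archive well is an assumption that has not been tested here (Zarr does ship a ZIP-backed store, which may be able to read it).

```python
import os
import tempfile
import zipfile


def pack_directory(src_dir, zip_path):
    """Write every file under src_dir into one uncompressed ZIP archive.

    zip_path should live outside src_dir so the archive does not
    try to include itself.
    """
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_STORED) as zf:
        for dirpath, _dirnames, filenames in os.walk(src_dir):
            for name in filenames:
                full = os.path.join(dirpath, name)
                # Store member names relative to src_dir, with forward slashes
                arcname = os.path.relpath(full, src_dir).replace(os.sep, "/")
                zf.write(full, arcname)
    return zip_path


# Demo on a throwaway directory (hypothetical layout, not the real data).
src = tempfile.mkdtemp()
os.makedirs(os.path.join(src, "var"))
for i in range(3):
    with open(os.path.join(src, "var", f"{i}.0"), "wb") as f:
        f.write(b"chunk")

out = os.path.join(tempfile.mkdtemp(), "store.zip")
pack_directory(src, out)
with zipfile.ZipFile(out) as zf:
    packed_names = sorted(zf.namelist())
print(packed_names)
```

The archive is written without compression on the assumption that the chunks inside a Zarr store are already compressed.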
For context, we are trying to have complete public model+data reimplementations of some SOTA weather and solar nowcasting models, like MetNet, MetNet-2, DGMR, and others, which all have large, complex datasets.
Steps to reproduce the bug
from datasets import load_dataset
dataset = load_dataset("openclimatefix/mrms")
Expected results
The dataset should download or open.
Actual results
A 500 internal server error
Environment info
- datasets version: 1.18.3
- Platform: Linux-5.15.25-1-MANJARO-x86_64-with-glibc2.35
- Python version: 3.9.10
- PyArrow version: 7.0.0