Adding reader_options kwargs to open_virtual_dataset.#67
Conversation
reader_options kwargs to open_virtual_dataset.reader_options kwargs to open_virtual_dataset.
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #67 +/- ##
===========================================
- Coverage 90.18% 74.31% -15.87%
===========================================
Files 14 14
Lines 998 946 -52
===========================================
- Hits 900 703 -197
- Misses 98 243 +145
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
|
@norlandrhagen I've merged |
…o allows for reading of cloud storage
for more information, see https://pre-commit.ci
for more information, see https://pre-commit.ci
TomNicholas
left a comment
There was a problem hiding this comment.
Thanks @norlandrhagen ! I think this is a nice minimal addition to support reading from s3. I like how the fsspec stuff is generally kept separate too.
virtualizarr/xarray.py
Outdated
| loadable_variables: Optional[Iterable[str]] = None, | ||
| indexes: Optional[Mapping[str, Index]] = None, | ||
| virtual_array_class=ManifestArray, | ||
| reader_options: Optional[dict] = {'storage_options':{'key':'', 'secret':'', 'anon':True}}, |
There was a problem hiding this comment.
When you normally point xr.open_dataset at an S3 url, you don't need to pass reader_options do you? Can we try to follow the signature of xr.open_dataset as closely as possible? (Maybe this already is as close as we can get)
There was a problem hiding this comment.
I could be wrong, but I thought you had to pass in some sort of fsspec/s3fs mapper.
For me this fails:
ds = xr.open_dataset('s3://carbonplan-share/virtualizarr/local.nc')
|
Looks like the docs + CI builds are failing with: @TomNicholas should we pin another version of Xarray? |
pydata/xarray#8872 was merged yesterday so now we should be able to release xarray, remove the xarray pin in virtualizarr, then release the first version of virtualizarr! EDIT: Tracking the xarray release pydata/xarray#9018 |
reader_options kwargs to open_virtual_dataset.reader_options kwargs to open_virtual_dataset.
|
This seems very close now @norlandrhagen ? |
|
CI is passing now @TomNicholas! |
|
Thanks @norlandrhagen ! One final request: Can we add a quick explanatory line to the docs? Something like
This would go on the usage page, either as a quick entry underneath the first |
|
Thank you so much @norlandrhagen ! Will merge this now |
|
Fantastic. Thanks so much. Ill refactor my code in the coming days. |
In a step to start reading remote files https://github.com/TomNicholas/VirtualiZarr/issues/61, this PR adds in
reader_optionsto the open_virtual_dataset function.These
reader_optionsare passed into each Kerchunk file reader (SingleHdf5ToZarr, NetCDF3ToZarr, etc..) inread_kerchunk_references_from_file. Onceopen_virtual_datasetis replaced with the Xarray backend, we could pass them through:ds = xr.open_dataset(fp, engine='virtualizarr', backend_kwargs={'reader_options': {'storage_options': {'anon': True}}}).This approach relies on the user knowing what options are available in each Kerchunk file reader.
This example should work off of this PR pointing to a public s3 bucket.
edit: Tests are passing. Index generation and filetype guessing are now working.