Skip to content

Conversation

@vinisalazar
Copy link
Contributor

Hi,

this is a follow-up to the discussion in #252. I got the test to work with the NERACOOS server. But there's a catch: the OPeNDAP request seems to fail after running the griddap_initialize method. My impression is that happens because that when that method is called, it adds the dataset bounds to the download URL, and the xarray.open_dataset method doesn't handle that well. I will look into that soon.

Summary of changes

  • Refactor test to use NERACOOS WW3_EastCoast_latest dataset
  • Modify fixture 'dataset_opendap'

Please feel free to request any changes.

Thank you,
Vini

  - Refactor test to use NERACOOS WW3_EastCoast_latest dataset
  - Modify fixture 'dataset_opendap'
@abkfenris
Copy link
Contributor

Hmm, it might be easier to figure out what is going on with a failing test.

@vinisalazar
Copy link
Contributor Author

I think the test was working in the end (checks passed for #252), but I thought it would be nice to follow up on the suggestion of using the NERACOOS server instead of the CSWC one? Apologies if I misunderstood that.

@abkfenris
Copy link
Contributor

I mean if a change to using NERACOOS is causing things to fail, then adding failing test.

Then we can debug the test to figure out if the fix is needed in erddapy or with our ERDDAP.

Staying with Coastwatch is likely to get us to have more random failures as Bob is much more aggressive with his blocklist than we are (though sometimes Cloudflare in front of our ERDDAP ends up problematic too).

@vinisalazar
Copy link
Contributor Author

vinisalazar commented Jun 16, 2022

Thanks @abkfenris, that makes sense.

I don't think it was the server change which was causing things to fail, but rather the interaction between the griddap_initalize and to_xarray methods. So I believe the problem is with erddapy, not the ERDDAP server.

In #252, I edited the to_xarray so when the response is "opendap", xarray loads the data lazily from the download URL (instead of loading the response into memory and creating an nc dataset like in the other response types). However, if griddap_initalize is called, the dataset constraints are added to the download URL, and xarray can no longer generate the dataset from it (it requires the download URL without constraints). For example, having this:

def dataset_opendap():
    """Load griddap data with OPeNDAP response for testing."""
    cswc = ERDDAP(server="CSWC", protocol="griddap", response="opendap")
    cswc.dataset_id = "jplAquariusSSS3MonthV5"
    cswc.griddap_initialize()
    yield cswc

Returns this:

E   OSError: [Errno -75] NetCDF: Malformed or unexpected Constraint: b'https://coastwatch.pfeg.noaa.gov/erddap/griddap/jplAquariusSSS3MonthV5.opendap?sss[(2015-03-21T00:00:00Z):1:(2015-03-21T00:00:00Z)][(89.5):1:(-89.5)][(-179.5):1:(179.5)]'

To work around this, when I submitted #252, I created a new test fixture using the server shown in 01a-griddap.ipynb as an example of an opendap response. I initially tried using the dataset_griddap fixture, but ran into the problem I described above, so I created the new fixture that didn't run the griddap_initialize method.

Since the Coastwatch server may present random failures, I thought it would be better to stick with the NERACOOS server, so I refactored the dataset_opendap fixture to use that one instead (the only difference to dataset_griddap is that it doesn't run griddap_initialize).

I hope this makes it clearer! Apologies for the lengthy description.

Thinking about this again, I'm considering removing the dataset_opendap fixture entirely, to modify either the griddap_initialize or the to_xarray methods to avoid this problem, and to add some exception handling to to_xarray to make error messages clearer.

Please let me know what you think is the best course of action to take here!

Best,
Vini

@abkfenris
Copy link
Contributor

Hmm, maybe this is something that it's better to leave a bit awkward for now, and address in the larger refactoring, even though it might be possible to tweak or add empty constraints/variables to .get_download_url() when to_xarray() calls ir right now.

I think when we refactor we can be more opinionated in at the GridDataset level, and maybe one of those things is that we don't keep the protocol at the class level, and only expose it in methods arguments as needed.

So, GridDataset.to_xarray() would always be an opendap query, then we could return an xarray Dataset that is already filtered down by the constraints using .sel(). However a GridDataset.url() would take the protocol to build a response with.

@abkfenris abkfenris merged commit 409562d into ioos:main Jun 22, 2022
@vinisalazar vinisalazar deleted the issue-250 branch July 28, 2022 13:14
@ocefpaf ocefpaf added the GSoC22 label Jul 28, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants