-
Notifications
You must be signed in to change notification settings - Fork 44
Add esmvalcore.dataset module
#1877
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov Report
@@ Coverage Diff @@
## main #1877 +/- ##
==========================================
+ Coverage 91.67% 92.02% +0.35%
==========================================
Files 232 234 +2
Lines 11561 12030 +469
==========================================
+ Hits 10598 11071 +473
+ Misses 963 959 -4
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. |
|
The remaining Codacy issues fall into three categories:
|
schlunma
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this addition Bouwe!! 🚀 I had a look at the code that changes the current behavior of the tool. Looks really good, I only have very small comments.
I will now play around with the notebooks.
Note: I did NOT look at the code of the new features.
Co-authored-by: Manuel Schlund <32543114+schlunma@users.noreply.github.com>
|
I just tested the notebooks. This is really amazing stuff, Bouwe! Especially the usage of wildcards in combination with the A couple of comments:
files = list(tas.files)
for anc in tas.ancillaries:
files.extend(anc.files)
files = [file for file in files if isinstance(file, ESGFFile)]
download(files, CFG['download_dir'])
tas.find_files()
print(tas.files)
for anc in tas.ancillaries:
print(anc.files)
tas = Dataset(
short_name='tas',
mip='Amon',
project='CMIP5',
dataset='GFDL-ESM2G',
ensemble='r1i1p1',
exp='historical',
timerange='1990/2100',
)
tas.augment_facets()
tas.add_ancillary(short_name='areacella', mip='fx', ensemble='r0i0p0')
print(tas.ancillaries)gives [Dataset:
{'dataset': 'GFDL-ESM2G',
'project': 'CMIP5',
'mip': 'fx',
'short_name': 'areacella',
'ensemble': 'r0i0p0',
'exp': 'historical',
'frequency': 'mon',
'institute': ['NOAA-GFDL'],
'long_name': 'Near-Surface Air Temperature',
'modeling_realm': ['atmos'],
'original_short_name': 'tas',
'product': ['output1', 'output2'],
'standard_name': 'air_temperature',
'timerange': '1990/2100',
'units': 'K'}
session: 'session-d9f21999-2ea5-43b7-932d-29a9045fde5a_20230110_130835']Notice the wrong
tas = Dataset(
short_name='tas',
mip='Amon',
project='CMIP5',
dataset='CanESM2',
ensemble='r1i1p1',
exp=['historical', 'rcp85'],
timerange='1850/2100',
)the tool found fx files for the historical and rcp85 simulation. This leads to a concatenation error when calling ERROR:esmvalcore.preprocessor:Failed to run preprocessor function 'concatenate' on the data
|
|
Thanks for reviewing @schlunma!
I would prefer to put that in a separate notebook in another pull request. The content is already in the |
It should be fixed now |
Thanks, I changed it
Good find! I changed the experiment to just |
Indeed, it does not work if you do it in that order. I don't think it can be made to work reliably, so I added some documentation to the docstring and notebook to explain more clearly how the ancillaries are added. Do you think it is better now? |
I added a check if there are any input files so at least there is no weird |
|
Thanks Bouwe for the changes, all the notebooks run fine now!
I am running some recipes at the moment, all looks fine so far 👍 |
I think this should be left to the user of the API. For recipes, there isn't much flexibility for programming things, but for the API there is. You could manually choose which dataset you want to use as an ancillary dataset or if you have many you could compare the facets and choose the best match automatically. E.g. for use from the recipe in #1609 I implemented ESMValCore/esmvalcore/_recipe/datasets.py Lines 255 to 289 in 8cdd9cc
|
Would it be OK with you to postpone implementing such a feature to another pull request? I can see that it would be useful, but it's not trivial to implement. I could compare the bounds of the time coordinate (if there are any and fall back to points otherwise and do something clever to compensate for the fact that time points can be anywhere in the interval) with the requested period in the |
Not too happy about this to be honest, but since this is such an edge case and it will be handled when used within ESMValTool I guess that's fine.
Couldn't we use the same code we use in the tool for this? If the time range of the files doesn't match the requested time range an error like the following is raised: 2023-01-13 08:59:17,614 UTC [3178263] ERROR Missing data for preprocessor test/tas:
- No input data available for years 2101-2150 in files:
/work/bd0854/DATA/ESMValTool2/CMIP5_DKRZ/NOAA-GFDL/GFDL-ESM2G/rcp85/mon/atmos/Amon/r1i1p1/v20120412/tas/tas_Amon_GFDL-ESM2G_rcp85_r1i1p1_208601-209012.nc
/work/bd0854/DATA/ESMValTool2/CMIP5_DKRZ/NOAA-GFDL/GFDL-ESM2G/rcp85/mon/atmos/Amon/r1i1p1/v20120412/tas/tas_Amon_GFDL-ESM2G_rcp85_r1i1p1_209101-209512.nc
/work/bd0854/DATA/ESMValTool2/CMIP5_DKRZ/NOAA-GFDL/GFDL-ESM2G/rcp85/mon/atmos/Amon/r1i1p1/v20120412/tas/tas_Amon_GFDL-ESM2G_rcp85_r1i1p1_209601-210012.ncAgain, not too happy about this, but if it's too much work and you make sure that this is handled when this is used within the tool I am also fine with it. All the recipes I tested ran fine with this change, so I think this is good to go 👍 Thanks again for this great feature, @bouweandela! 🎉 |
|
Awesome, thanks a lot for reviewing @schlunma!
There are two main issues with the current code used for checking when running the recipe:
so I feel that adding this will give users a false sense of security because it only works sometimes.
Thanks, I will make sure the behavior of the tool when running a recipe does not change in #1609 and I created an issue to describe the feature here #1892. |
But since we use the |
valeriupredoi
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I went through the code too, and I think this is super solid! But I have a few minor suggestions; am approving this since it's up to @bouweandela to implement my suggestions, nothing make or break. Note that I've not looked at the iPython notebooks 🍺
|
OK I reckon this can be merged now, cheers muchly for the great work @bouweandela and @schlunma - this is awesome! |
New features
Add the
esmvalcore.dataset.Datasetclass, which can be used to define datasets from Python. It can be used to find and load the data needed for using theesmvalcore.preprocessorfunctions.Add the function
esmvalcore.dataset.datasets_to_recipe, which can be used to create or update recipes withDatasets. This can be a convenient alternative to the recipe filler utility.Adds a new option to config-user.yml called
always_search_esgf. This will cause the tool to always search ESGF for files, even if some files are available locally. This feature can be used when you want to 1) create a recipe with all data available on ESGF or 2) download the latest versions of files from ESGF when there is already an older version available.Move the files that do things related to loading a recipe (
esmvalcore/_recipe.py,esmvalcore/_recipe_checks.py, andesmvalcore/recipe_schema.yml) into a newesmvalcore/_recipedirectory. This makes it possible to add functionality without growing the fileesmvalcore/_recipe.pyeven further. -> Should this be done in a separate pull request?Example Jupyter notebooks that demonstrate how to use the new features are available in the
notebooksdirectory and also shown on readthedocs in a new section Example notebooks.This is part of #1609. The next (and hopefully final) step towards getting #1609 merged will be to use the new
Datasetinesmvalcore/_recipe.pyto read the datasets in the recipe. This pull request introduces some duplicate code because it already transfers some of the functionality fromesmvalcore/_recipe.pyto theesmvalcore.dataset.Dataset, but it cannot be removed yet because it is still used by the old dataset reading code.Link to documentation: https://esmvaltool--1877.org.readthedocs.build/projects/ESMValCore/en/1877/api/esmvalcore.dataset.html
Before you get started
Checklist
It is the responsibility of the author to make sure the pull request is ready to review. The icons indicate whether the item will be subject to the 🛠 Technical or 🧪 Scientific review.
To help with the number pull requests: