-
Notifications
You must be signed in to change notification settings - Fork 149
Feature/get stormtype from ibtracs #646
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Thanks for the PR! I can imagine that this might be useful information for some people.
Here is a list of all non-agency-specific variables that we currently ignore: [
'numobs', 'season', 'number', 'subbasin', 'name', 'source_usa', 'source_jma',
'source_cma', 'source_hko', 'source_new', 'source_reu', 'source_bom', 'source_nad',
'source_wel', 'source_td5', 'source_td6', 'source_ds8', 'source_neu', 'source_mlc',
'iso_time', 'nature', 'lat', 'lon', 'wmo_wind', 'wmo_pres', 'wmo_agency', 'track_type',
'main_track_sid', 'dist2land', 'landfall', 'iflag', 'storm_speed', 'storm_dir'
]When including agency-specific variables, the list is even longer. I tend to prefer supporting a keyword argument for "other variables" to include instead of adding explicit support for the "nature" variable. Maybe we can start by supporting non-agency-specific variables, because the agency-specific need to be treated a bit differently?
In general, I think adding additional variables should be fine. We have to avoid breaking file read/write operations though.
Why would you do that? |
| if hasattr(dfr, 'nature'): | ||
| tr_ds['nature'] = ('time', dfr['nature'].values.astype('<U2')) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it's necessary to support this. The legacy CSV format is a completely undocumented data type. I have never seen such a file apart from our test file. I don't know anyone who is using this functionality.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point! This was just to stop the tests failing :p
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand why the tests would fail without this. Can you elaborate, please?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh lol the tests were failing due to a line I added and later removed. We can definitely drop this.
I'll put together a new pull request shortly.
|
Sorry, I missed your main response last week!
I think this makes sense. I'll update the pull request, but for non-agency-specific variables. A few of the variables you list will be useful in different situations.
To be clear: are you suggesting always reading 'nature' and allowing additional variables, or just allowing additional variables here?
Not sure why I chose this now ... I think it was from an overabundance of caution around landfalls. Let's go for nearest neighbour then. |
I think this remark is mostly about this section of code: climada_python/climada/hazard/tc_tracks.py Lines 1387 to 1389 in cfebf28
It's a bit unclear what to do when we support additional variables of varying dtype. We can't just assume that all "object" variables are actually supposed to be "<U2" like here:
# when writing '<U2' and reading in again, xarray reads as dtype 'object'. undo this:
for varname in track.variables:
if track[varname].dtype == "object":
track[varname] = track[varname].astype('<U2')Maybe we can at least assume that all "object" dtypes are actually strings. Then we could do something like this: # when writing '<U2' and reading in again, xarray reads as dtype 'object'. undo this:
for varname in track.variables:
if track[varname].dtype == "object":
strlen = track[varname].str.len().max().item()
track[varname] = track[varname].astype(f"<U{strlen}") |
|
Assuming a string seems reasonable. Is there a reason we always want to limit its length? What are the memory/efficiency gains there? |
|
The most important aspect for me was that you should be able to cycle, i.e., storing data and reading it again should yield the identical object. I don't know about the memory requirements. |
@ChrisFairless I don't think that's necessary, you can simply rewrite your changes here. But if you still want to, I would wait with my review until this or the new PR are ready. In general, I second @tovogt's suggesion of reading any additional variable through keyword arguments. As a side note, I guess this discussion goes to show that it's helpful to discuss any additions before implementing them, although I very much appreciate the initiative of coming up with suggestions through a PR in the first place 😅 |
In my opinion, it's good to create a draft PR right from the start when proposing a new feature, and have a discussion in the PR. I know that it somehow makes the PR hard to follow because it has so many posts. Because of that, it might make sense to create a new PR at some point. But the big advantage is that you avoid a lot of misunderstandings because you talk directly about concrete code changes. When talking about features at a very high level, I often discuss for a while before noticing that there was actually a misunderstanding about the kind of change we are actually discussing. |
|
Definitely definitely agree here!! But in this case I needed the feature pretty urgently so I coded first, PR'd later 😅 |
|
How to continue here? There seems to be at least one unresolved discussion. @ChrisFairless will you update this PR or should we close it in favor of a new one? |
|
Fine to close this. I want to update things as discussed above, but it'll take some time I don't have right now. |
|
@ChrisFairless That's alright! I'll close this PR and create an issue on the topic, so we don't lose track. Thanks for the initial contribution! |
|
Since nobody seems to have capacity to implement the flexible reading of any variable in IBTrACS, it looks to me like the requested addition to this PR is not very relevant or does not have any specific use case for the moment. |
|
Please address the review comments in the code and then we can proceed. Thanks. |
|
If nobody has time to do this "right", I will take this on immediately. We have discussed everything, the coding should be really easy. I don't see why we would go with the "nature" variable only. I will report back later today. |
Sure, you're most welcome to implement it if you think that's a worthwhile addition. Thanks a lot, very much appreciated! |
|
🙌 |
|
Let's continue here: #728 |
Changes proposed in this PR:
Notes:
equal_timestepI chose a forward fill method, rather than the default nearest neighbour.PR Author Checklist
develop)PR Reviewer Checklist