Feature/get stormtype from ibtracs #646

ChrisFairless · 2023-02-07T14:24:22Z

Changes proposed in this PR:

When creating a trackset from IBTrACS also read the 'Nature' of a storm. The nature gives information about whether a system is a disturbance, tropical storm, post-transition extratropical storm, or subtropical storm (or whether agencies disagree or don't report). This is potentially useful in visualisations and for generating synthetic tracks.

Notes:

Not urgent! We don't need this in time for the next release!
Do we need a discussion? There are lots of variables that we exclude when reading from IBTrACS and we can't include them all. A more flexible alternative would be to add a keyword for 'other columns' you want from the file.
The 'nature' variable is only explicitly created for tracksets from current IBTrACS (not olders csvs), and not for CHAZ, STORM, etc. It looks like there are already other properties that are IBTrACS-only, e.g radius of outermost closed isobar, so hopefully this is ok. I added a check for existence of the variable where relevant.
When interpolating with equal_timestep I chose a forward fill method, rather than the default nearest neighbour.

PR Author Checklist

PR Reviewer Checklist

tovogt · 2023-02-07T15:34:02Z

Thanks for the PR! I can imagine that this might be useful information for some people.

Do we need a discussion? There are lots of variables that we exclude when reading from IBTrACS and we can't include them all. A more flexible alternative would be to add a keyword for 'other columns' you want from the file.

Here is a list of all non-agency-specific variables that we currently ignore:

[
    'numobs', 'season', 'number', 'subbasin', 'name', 'source_usa', 'source_jma',
    'source_cma', 'source_hko', 'source_new', 'source_reu', 'source_bom', 'source_nad',
    'source_wel', 'source_td5', 'source_td6', 'source_ds8', 'source_neu', 'source_mlc',
    'iso_time', 'nature', 'lat', 'lon', 'wmo_wind', 'wmo_pres', 'wmo_agency', 'track_type',
    'main_track_sid', 'dist2land', 'landfall', 'iflag', 'storm_speed', 'storm_dir'
]

When including agency-specific variables, the list is even longer. I tend to prefer supporting a keyword argument for "other variables" to include instead of adding explicit support for the "nature" variable. Maybe we can start by supporting non-agency-specific variables, because the agency-specific need to be treated a bit differently?

The 'nature' variable is only explicitly created for tracksets from current IBTrACS (not olders csvs), and not for CHAZ, STORM, etc. It looks like there are already other properties that are IBTrACS-only, e.g radius of outermost closed isobar, so hopefully this is ok. I added a check for existence of the variable where relevant.

In general, I think adding additional variables should be fine. We have to avoid breaking file read/write operations though.

When interpolating with equal_timestep I chose a forward fill method, rather than the default nearest neighbour.

Why would you do that?

tovogt · 2023-02-07T15:35:40Z

climada/hazard/tc_tracks.py

+    if hasattr(dfr, 'nature'):
+        tr_ds['nature'] = ('time', dfr['nature'].values.astype('<U2'))


I don't think it's necessary to support this. The legacy CSV format is a completely undocumented data type. I have never seen such a file apart from our test file. I don't know anyone who is using this functionality.

Good point! This was just to stop the tests failing :p

I don't understand why the tests would fail without this. Can you elaborate, please?

Oh lol the tests were failing due to a line I added and later removed. We can definitely drop this.
I'll put together a new pull request shortly.

ChrisFairless · 2023-02-13T14:30:23Z

Sorry, I missed your main response last week!

When including agency-specific variables, the list is even longer. I tend to prefer supporting a keyword argument for "other variables" to include instead of adding explicit support for the "nature" variable. Maybe we can start by supporting non-agency-specific variables, because the agency-specific need to be treated a bit differently?

I think this makes sense. I'll update the pull request, but for non-agency-specific variables. A few of the variables you list will be useful in different situations.

In general, I think adding additional variables should be fine. We have to avoid breaking file read/write operations though.

To be clear: are you suggesting always reading 'nature' and allowing additional variables, or just allowing additional variables here?

When interpolating with equal_timestep I chose a forward fill method, rather than the default nearest neighbour.

Not sure why I chose this now ... I think it was from an overabundance of caution around landfalls. Let's go for nearest neighbour then.

tovogt · 2023-02-13T15:17:03Z

In general, I think adding additional variables should be fine. We have to avoid breaking file read/write operations though.

To be clear: are you suggesting always reading 'nature' and allowing additional variables, or just allowing additional variables here?

I think this remark is mostly about this section of code:

climada_python/climada/hazard/tc_tracks.py

Lines 1387 to 1389 in cfebf28

    
           # when writing '<U2' and reading in again, xarray reads as dtype 'object'. undo this: 
        
           for varname in ['basin', 'nature']: 
        
               track[varname] = track[varname].astype('<U2')

It's a bit unclear what to do when we support additional variables of varying dtype. We can't just assume that all "object" variables are actually supposed to be "<U2" like here:

# when writing '<U2' and reading in again, xarray reads as dtype 'object'. undo this:
for varname in track.variables:
    if track[varname].dtype == "object":
        track[varname] = track[varname].astype('<U2')

Maybe we can at least assume that all "object" dtypes are actually strings. Then we could do something like this:

# when writing '<U2' and reading in again, xarray reads as dtype 'object'. undo this:
for varname in track.variables:
    if track[varname].dtype == "object":
        strlen = track[varname].str.len().max().item()
        track[varname] = track[varname].astype(f"<U{strlen}")

ChrisFairless · 2023-02-13T15:56:47Z

Assuming a string seems reasonable. Is there a reason we always want to limit its length? What are the memory/efficiency gains there?

tovogt · 2023-02-13T18:07:02Z

The most important aspect for me was that you should be able to cycle, i.e., storing data and reading it again should yield the identical object. I don't know about the memory requirements.

peanutfun · 2023-02-14T15:43:11Z

I'll put together a new pull request shortly.

@ChrisFairless I don't think that's necessary, you can simply rewrite your changes here. But if you still want to, I would wait with my review until this or the new PR are ready. In general, I second @tovogt's suggesion of reading any additional variable through keyword arguments.

As a side note, I guess this discussion goes to show that it's helpful to discuss any additions before implementing them, although I very much appreciate the initiative of coming up with suggestions through a PR in the first place 😅

tovogt · 2023-02-14T16:01:38Z

As a side note, I guess this discussion goes to show that it's helpful to discuss any additions before implementing them, although I very much appreciate the initiative of coming up with suggestions through a PR in the first place sweat_smile

In my opinion, it's good to create a draft PR right from the start when proposing a new feature, and have a discussion in the PR. I know that it somehow makes the PR hard to follow because it has so many posts. Because of that, it might make sense to create a new PR at some point. But the big advantage is that you avoid a lot of misunderstandings because you talk directly about concrete code changes. When talking about features at a very high level, I often discuss for a while before noticing that there was actually a misunderstanding about the kind of change we are actually discussing.

ChrisFairless · 2023-02-14T16:07:35Z

Definitely definitely agree here!!

But in this case I needed the feature pretty urgently so I coded first, PR'd later 😅

peanutfun · 2023-02-28T09:46:33Z

How to continue here? There seems to be at least one unresolved discussion. @ChrisFairless will you update this PR or should we close it in favor of a new one?

ChrisFairless · 2023-03-13T19:12:51Z

Fine to close this.

I want to update things as discussed above, but it'll take some time I don't have right now.

peanutfun · 2023-03-20T09:59:39Z

@ChrisFairless That's alright! I'll close this PR and create an issue on the topic, so we don't lose track. Thanks for the initial contribution!

bguillod · 2023-05-23T08:39:31Z

Since nobody seems to have capacity to implement the flexible reading of any variable in IBTrACS, it looks to me like the requested addition to this PR is not very relevant or does not have any specific use case for the moment.
On the other hand, we urgently need the storm type information (and only that additional variable) for the new TC synthetic tracks methodology. Therefore I'd like this PR to be reconsidered for merging into the develop branch, to allow us to move ahead and finalize the new synthetic tracks model. Does this make sense?

chahank · 2023-05-23T08:48:08Z

Please address the review comments in the code and then we can proceed. Thanks.

tovogt · 2023-05-23T09:03:15Z

If nobody has time to do this "right", I will take this on immediately. We have discussed everything, the coding should be really easy. I don't see why we would go with the "nature" variable only. I will report back later today.

bguillod · 2023-05-23T09:26:07Z

If nobody has time to do this "right", I will take this on immediately. We have discussed everything, the coding should be really easy. I don't see why we would go with the "nature" variable only. I will report back later today.

Sure, you're most welcome to implement it if you think that's a worthwhile addition. Thanks a lot, very much appreciated!

emanuel-schmid · 2023-05-23T10:22:28Z

🙌

tovogt · 2023-05-23T11:26:24Z

Let's continue here: #728

ChrisFairless added 3 commits February 7, 2023 12:06

Read nature from IBTrACS too

e30d1f5

Fixup

c4ddf94

Update tests for reading TCTrack nature

cfebf28

ChrisFairless requested review from peanutfun and tovogt February 7, 2023 14:29

tovogt reviewed Feb 7, 2023

View reviewed changes

peanutfun mentioned this pull request Mar 20, 2023

Read storm type ("nature") from IBTrACS data #674

Closed

peanutfun closed this Mar 20, 2023

ChrisFairless mentioned this pull request May 2, 2023

Improve random walk synthetic track generation #688

Closed

13 tasks

bguillod reopened this May 23, 2023

tovogt mentioned this pull request May 23, 2023

TCTracks: Option to read additional variables from IBTrACS (e.g. "nature"/stormtype) #728

Merged

13 tasks

tovogt closed this May 23, 2023

emanuel-schmid deleted the feature/get-stormtype-from-ibtracs branch May 25, 2023 08:44

		if hasattr(dfr, 'nature'):
		tr_ds['nature'] = ('time', dfr['nature'].values.astype('<U2'))

Feature/get stormtype from ibtracs #646

Feature/get stormtype from ibtracs #646

Uh oh!

Conversation

ChrisFairless commented Feb 7, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Author Checklist

PR Reviewer Checklist

Uh oh!

tovogt commented Feb 7, 2023

Uh oh!

tovogt Feb 7, 2023

Choose a reason for hiding this comment

Uh oh!

ChrisFairless Feb 8, 2023

Choose a reason for hiding this comment

Uh oh!

tovogt Feb 13, 2023

Choose a reason for hiding this comment

Uh oh!

ChrisFairless Feb 14, 2023

Choose a reason for hiding this comment

Uh oh!

ChrisFairless commented Feb 13, 2023

Uh oh!

tovogt commented Feb 13, 2023

Uh oh!

ChrisFairless commented Feb 13, 2023

Uh oh!

tovogt commented Feb 13, 2023

Uh oh!

peanutfun commented Feb 14, 2023

Uh oh!

tovogt commented Feb 14, 2023

Uh oh!

ChrisFairless commented Feb 14, 2023

Uh oh!

peanutfun commented Feb 28, 2023

Uh oh!

ChrisFairless commented Mar 13, 2023

Uh oh!

peanutfun commented Mar 20, 2023

Uh oh!

bguillod commented May 23, 2023

Uh oh!

chahank commented May 23, 2023

Uh oh!

tovogt commented May 23, 2023

Uh oh!

bguillod commented May 23, 2023

Uh oh!

emanuel-schmid commented May 23, 2023

Uh oh!

tovogt commented May 23, 2023

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

7 participants

ChrisFairless commented Feb 7, 2023 •

edited

Loading