Skip to content

Conversation

@ChrisFairless
Copy link
Collaborator

@ChrisFairless ChrisFairless commented Feb 7, 2023

Changes proposed in this PR:

  • When creating a trackset from IBTrACS also read the 'Nature' of a storm. The nature gives information about whether a system is a disturbance, tropical storm, post-transition extratropical storm, or subtropical storm (or whether agencies disagree or don't report). This is potentially useful in visualisations and for generating synthetic tracks.

Notes:

  • Not urgent! We don't need this in time for the next release!
  • Do we need a discussion? There are lots of variables that we exclude when reading from IBTrACS and we can't include them all. A more flexible alternative would be to add a keyword for 'other columns' you want from the file.
  • The 'nature' variable is only explicitly created for tracksets from current IBTrACS (not olders csvs), and not for CHAZ, STORM, etc. It looks like there are already other properties that are IBTrACS-only, e.g radius of outermost closed isobar, so hopefully this is ok. I added a check for existence of the variable where relevant.
  • When interpolating with equal_timestep I chose a forward fill method, rather than the default nearest neighbour.

PR Author Checklist

PR Reviewer Checklist

@tovogt
Copy link
Collaborator

tovogt commented Feb 7, 2023

Thanks for the PR! I can imagine that this might be useful information for some people.

  • Do we need a discussion? There are lots of variables that we exclude when reading from IBTrACS and we can't include them all. A more flexible alternative would be to add a keyword for 'other columns' you want from the file.

Here is a list of all non-agency-specific variables that we currently ignore:

[
    'numobs', 'season', 'number', 'subbasin', 'name', 'source_usa', 'source_jma',
    'source_cma', 'source_hko', 'source_new', 'source_reu', 'source_bom', 'source_nad',
    'source_wel', 'source_td5', 'source_td6', 'source_ds8', 'source_neu', 'source_mlc',
    'iso_time', 'nature', 'lat', 'lon', 'wmo_wind', 'wmo_pres', 'wmo_agency', 'track_type',
    'main_track_sid', 'dist2land', 'landfall', 'iflag', 'storm_speed', 'storm_dir'
]

When including agency-specific variables, the list is even longer. I tend to prefer supporting a keyword argument for "other variables" to include instead of adding explicit support for the "nature" variable. Maybe we can start by supporting non-agency-specific variables, because the agency-specific need to be treated a bit differently?

  • The 'nature' variable is only explicitly created for tracksets from current IBTrACS (not olders csvs), and not for CHAZ, STORM, etc. It looks like there are already other properties that are IBTrACS-only, e.g radius of outermost closed isobar, so hopefully this is ok. I added a check for existence of the variable where relevant.

In general, I think adding additional variables should be fine. We have to avoid breaking file read/write operations though.

  • When interpolating with equal_timestep I chose a forward fill method, rather than the default nearest neighbour.

Why would you do that?

Comment on lines +1893 to +1894
if hasattr(dfr, 'nature'):
tr_ds['nature'] = ('time', dfr['nature'].values.astype('<U2'))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it's necessary to support this. The legacy CSV format is a completely undocumented data type. I have never seen such a file apart from our test file. I don't know anyone who is using this functionality.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point! This was just to stop the tests failing :p

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand why the tests would fail without this. Can you elaborate, please?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh lol the tests were failing due to a line I added and later removed. We can definitely drop this.
I'll put together a new pull request shortly.

@ChrisFairless
Copy link
Collaborator Author

Sorry, I missed your main response last week!

When including agency-specific variables, the list is even longer. I tend to prefer supporting a keyword argument for "other variables" to include instead of adding explicit support for the "nature" variable. Maybe we can start by supporting non-agency-specific variables, because the agency-specific need to be treated a bit differently?

I think this makes sense. I'll update the pull request, but for non-agency-specific variables. A few of the variables you list will be useful in different situations.

In general, I think adding additional variables should be fine. We have to avoid breaking file read/write operations though.

To be clear: are you suggesting always reading 'nature' and allowing additional variables, or just allowing additional variables here?

  • When interpolating with equal_timestep I chose a forward fill method, rather than the default nearest neighbour.

Not sure why I chose this now ... I think it was from an overabundance of caution around landfalls. Let's go for nearest neighbour then.

@tovogt
Copy link
Collaborator

tovogt commented Feb 13, 2023

In general, I think adding additional variables should be fine. We have to avoid breaking file read/write operations though.

To be clear: are you suggesting always reading 'nature' and allowing additional variables, or just allowing additional variables here?

I think this remark is mostly about this section of code:

# when writing '<U2' and reading in again, xarray reads as dtype 'object'. undo this:
for varname in ['basin', 'nature']:
track[varname] = track[varname].astype('<U2')

It's a bit unclear what to do when we support additional variables of varying dtype. We can't just assume that all "object" variables are actually supposed to be "<U2" like here:

# when writing '<U2' and reading in again, xarray reads as dtype 'object'. undo this:
for varname in track.variables:
    if track[varname].dtype == "object":
        track[varname] = track[varname].astype('<U2')

Maybe we can at least assume that all "object" dtypes are actually strings. Then we could do something like this:

# when writing '<U2' and reading in again, xarray reads as dtype 'object'. undo this:
for varname in track.variables:
    if track[varname].dtype == "object":
        strlen = track[varname].str.len().max().item()
        track[varname] = track[varname].astype(f"<U{strlen}")

@ChrisFairless
Copy link
Collaborator Author

Assuming a string seems reasonable. Is there a reason we always want to limit its length? What are the memory/efficiency gains there?

@tovogt
Copy link
Collaborator

tovogt commented Feb 13, 2023

The most important aspect for me was that you should be able to cycle, i.e., storing data and reading it again should yield the identical object. I don't know about the memory requirements.

@peanutfun
Copy link
Member

I'll put together a new pull request shortly.

@ChrisFairless I don't think that's necessary, you can simply rewrite your changes here. But if you still want to, I would wait with my review until this or the new PR are ready. In general, I second @tovogt's suggesion of reading any additional variable through keyword arguments.

As a side note, I guess this discussion goes to show that it's helpful to discuss any additions before implementing them, although I very much appreciate the initiative of coming up with suggestions through a PR in the first place 😅

@tovogt
Copy link
Collaborator

tovogt commented Feb 14, 2023

As a side note, I guess this discussion goes to show that it's helpful to discuss any additions before implementing them, although I very much appreciate the initiative of coming up with suggestions through a PR in the first place sweat_smile

In my opinion, it's good to create a draft PR right from the start when proposing a new feature, and have a discussion in the PR. I know that it somehow makes the PR hard to follow because it has so many posts. Because of that, it might make sense to create a new PR at some point. But the big advantage is that you avoid a lot of misunderstandings because you talk directly about concrete code changes. When talking about features at a very high level, I often discuss for a while before noticing that there was actually a misunderstanding about the kind of change we are actually discussing.

@ChrisFairless
Copy link
Collaborator Author

Definitely definitely agree here!!

But in this case I needed the feature pretty urgently so I coded first, PR'd later 😅

@peanutfun
Copy link
Member

How to continue here? There seems to be at least one unresolved discussion. @ChrisFairless will you update this PR or should we close it in favor of a new one?

@ChrisFairless
Copy link
Collaborator Author

Fine to close this.

I want to update things as discussed above, but it'll take some time I don't have right now.

@peanutfun
Copy link
Member

@ChrisFairless That's alright! I'll close this PR and create an issue on the topic, so we don't lose track. Thanks for the initial contribution!

@bguillod
Copy link
Collaborator

Since nobody seems to have capacity to implement the flexible reading of any variable in IBTrACS, it looks to me like the requested addition to this PR is not very relevant or does not have any specific use case for the moment.
On the other hand, we urgently need the storm type information (and only that additional variable) for the new TC synthetic tracks methodology. Therefore I'd like this PR to be reconsidered for merging into the develop branch, to allow us to move ahead and finalize the new synthetic tracks model. Does this make sense?

@bguillod bguillod reopened this May 23, 2023
@chahank
Copy link
Member

chahank commented May 23, 2023

Please address the review comments in the code and then we can proceed. Thanks.

@tovogt
Copy link
Collaborator

tovogt commented May 23, 2023

If nobody has time to do this "right", I will take this on immediately. We have discussed everything, the coding should be really easy. I don't see why we would go with the "nature" variable only. I will report back later today.

@bguillod
Copy link
Collaborator

If nobody has time to do this "right", I will take this on immediately. We have discussed everything, the coding should be really easy. I don't see why we would go with the "nature" variable only. I will report back later today.

Sure, you're most welcome to implement it if you think that's a worthwhile addition. Thanks a lot, very much appreciated!

@emanuel-schmid
Copy link
Collaborator

🙌

@tovogt
Copy link
Collaborator

tovogt commented May 23, 2023

Let's continue here: #728

@tovogt tovogt closed this May 23, 2023
@emanuel-schmid emanuel-schmid deleted the feature/get-stormtype-from-ibtracs branch May 25, 2023 08:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants