
Conversation

Collaborator

@NicolasColombi NicolasColombi commented Feb 6, 2025

Changes proposed in this PR:

This PR adds a function to compute track density. Tracks are conceptually lines but are in reality a succession of points. The function ensures an equal time step for all tracks by calling equal_timestep(), then counts the number of points per grid cell at the desired resolution. compute_track_density() produces the data for plot_track_density().
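
As a rough illustration of the binning step, a minimal sketch (not the code in this PR; it assumes tracks is an iterable of track datasets exposing lat and lon coordinate arrays, as in TCTracks.data):

import numpy as np

def track_density_sketch(tracks, res=1.0):
    # Global latitude/longitude bin edges at the desired resolution.
    lat_bins = np.linspace(-90, 90, int(round(180 / res)) + 1)
    lon_bins = np.linspace(-180, 180, int(round(360 / res)) + 1)
    hist = np.zeros((len(lat_bins) - 1, len(lon_bins) - 1))
    for track in tracks:
        # Count how many points of this track fall into each grid cell.
        h, _, _ = np.histogram2d(
            track.lat.values, track.lon.values, bins=[lat_bins, lon_bins]
        )
        hist += h
    return hist, lat_bins, lon_bins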

Functions added:

  • compute_track_density()
  • test_compute_track_density()
  • compute_grid_area()
  • plot_track_density()

[Image: tropical_cyclone_track_density example plot]

This PR fixes #

PR Author Checklist

PR Reviewer Checklist

@NicolasColombi NicolasColombi changed the base branch from main to develop February 6, 2025 10:29
Comment on lines 2875 to 2876
"""Compute absolute and normalized tropical cyclone track density as the number of points per
grid cell.
Member

Interesting addition. I think though that even with the explanation in the pull request, I do not understand what this method does (and what is the use of it). Can you expand on the docstring to make it clearer?

Collaborator Author

I was doing exactly that at this very moment ;)

Collaborator Author

I updated the docstring, which should be clearer now. The role of this function is to create the data for another plotting function (which will follow soon) and should give something like this (first approximation):

[Image: preliminary track density map]

A plot like this allows you to compare TC activity between different climate scenarios, time periods, or models. It's a visualization of the hazard.

Member

What is the exact goal? Because the method now counts the number of track points per pixel. Thus, if a track just stays a long time in the same place, it counts as high density, but it is still just a single track in that pixel. Conversely, if a track is moving fast, it gets low density. I am not sure what this would indicate, to be honest.

Collaborator Author

That is true, but the goal is to display TC activity, and if a track stays longer in a grid cell, that rightfully represents higher TC activity in my opinion. The question then becomes: how do you define TC activity? Is it how many different tracks cross a grid cell, or how many tracks cross a grid cell and how long they stay in it (the current version)?

If you consider two cyclones with different translational speeds but the same angular wind speed, the slower one will have more impact because it stays longer while having the same intensity as the other. So in an impact context, track density (a measure of TC activity) should reflect this.

But I see your point, and it's worth discussing with Kerry and Simona next week. Frankly, both methods would be valid; they just represent something slightly different.

For clarification, the image I sent represents the track density (defined as in this message) for 300 tracks in 2025 with CMIP5 data.

Member

I agree with the analysis. But then I would ask: why not look at wind intensity density? Because if the argument is that a track staying longer in a position creates more impact, then you would also have to consider that higher winds create more impact. Hence, I would say:
1. if you are interested in impacts, you should consider the windfield distributions;
2. if you are interested in track density, then you should consider different track densities.
But maybe I am missing the point and the track-point-density approach is the way to go?

Collaborator Author

Fair, I agree. The point approach was just the practical way I thought of programming it. I need to rethink it a bit then 🤓. I'll change it so that at most one point per track is counted in each grid cell.
And at a later stage, we can think of how to compute wind field densities.

Member

Maybe it can be an argument in the method (if it is easy to implement)?

Collaborator Author

Yes, it's easy.

Comment on lines 2904 to 2905
lat_bins = np.arange(-90, 91, res) # 91 and not 90 for the bins (90 included)
lon_bins = np.arange(-180, 181, res)
Member

That would lead to values larger than 90 if, say, res=0.3. I think this is not what you want. Also, note that it is probably better to use np.linspace than np.arange for non-integer steps (see https://numpy.org/doc/stable/reference/generated/numpy.arange.html).

Collaborator Author

True, thanks for catching that!
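
For reference, the endpoint-safe construction with np.linspace could look like this (a sketch, assuming res divides 180 and 360 evenly):

import numpy as np

res = 0.3  # degrees, example value
# np.linspace hits the endpoints exactly, whereas np.arange with a
# non-integer step can overshoot 90 due to floating-point accumulation.
lat_bins = np.linspace(-90, 90, int(round(180 / res)) + 1)
lon_bins = np.linspace(-180, 180, int(round(360 / res)) + 1)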

Comment on lines 2912 to 2914
hist = hist / hist.sum() if mode == "normalized" else hist

return hist, lat_bins, lon_bins
Member

Would it not be better to work with sparse matrices and rasters? Because in general, the output will be very, very sparse.

Collaborator Author

Good point!

density: bool (optional) default: False
If False it returns the number of samples in each bin. If True, returns the
probability density function at each bin computed as count_bin / tot_count.
filter_tracks: bool (optional) default: True
Collaborator

Perhaps calling this argument something like count_tracks would be more explicit?

Collaborator Author
@NicolasColombi NicolasColombi Feb 7, 2025

I agree, filter_tracks is a bit cryptic...

"""

# ensure equal time step
# tc_track.equal_timestep(time_step_h=time_step)
Collaborator

I think there should be a reasonable check on the track's temporal resolution versus res to make sure all tracks will be counted (see my comment in the other thread).

Collaborator Author
@NicolasColombi NicolasColombi Feb 7, 2025

True, there should be a limit ratio between temporal resolution and grid cell resolution. Considering that the fastest recorded translational speed of a TC is around 110 km/h, a temporal resolution of 1 h and a grid resolution of 1° (~100 km) should be the limit. In this case, I would make time_step a function of res and not the opposite.

If this slows down the code too much (probably not), we could adjust it, given the low average translational speed of ~20 km/h.
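
A sketch of what coupling the time step to the resolution might look like (hypothetical helper, not the PR code; it assumes a maximum translational speed of about 110 km/h, i.e. roughly 1.1°/h near the equator):

MAX_TC_SPEED_DEG_PER_H = 1.1  # ~110 km/h near the equator, rough upper bound

def time_step_for_resolution(res_deg):
    # Largest time step (hours) such that a track moving at the assumed
    # maximum speed cannot jump over a grid cell of size res_deg degrees.
    return res_deg / MAX_TC_SPEED_DEG_PER_H

# e.g. time_step_for_resolution(1.0) -> ~0.9 h, so a 1 h step at 1° is the limit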

track.lat.values, track.lon.values, bins=[lat_bins, lon_bins], density=False
)
hist_new = csr_matrix(hist_new)
hist_new[hist_new > 1] = 1 if filter_tracks else hist_new
Collaborator

why not something like this (seems easier to read):

if filter_tracks:
    np.clip(hist_new, a_min=None, a_max=1, out=hist_new)

Collaborator

You might need to do this before converting to a csr_matrix

Collaborator Author

Yes, I like it. Then I guess it doesn't really make sense to use sparse matrices anymore, since the core function np.histogram2d doesn't exist for sparse arrays, and the later operations are better off in plain numpy.

Collaborator Author

Mmh, actually np.clip creates some issues later on with the testing, so I'll keep it as it is.

Member
@chahank chahank Feb 7, 2025

If you keep it dense, then you should just make it clear in the docstring that this code does not work well at high resolution. It would be good to do some simple testing (res=1, 0.1, 0.01, 0.001) and see how much memory / time it takes.

Collaborator
@bguillod bguillod Feb 7, 2025

Then maybe:

if filter_tracks:
    hist_new = np.minimum(hist_new, 1)

Collaborator

Or use np.where(x>1,1,x)

Collaborator Author
@NicolasColombi NicolasColombi Feb 7, 2025

If you keep it dense, then you should just make it clear in the docstring that this code does not work well at high resolution. It would be good to do some simple testing (res=1, 0.1, 0.01, 0.001) and see how much memory / time it takes.

At the moment, using sparse (and numpy), with 300 tracks at global scale it takes:

1°: 141ms
0.1°: 9.1s
0.01°: 17 minutes
0.001°: ...

If, and only if, it makes sense to use this method at really high resolutions (finer than 0.1°), I will try to optimize it. But a faster method might be worth having for larger track datasets.

Member

I do not know whether higher resolution is needed, but it would be good to mention in the docstring what kind of resolution is reasonable.

Collaborator

From my experience, anything below 0.1 degrees is far too noisy; typically one uses something like 1, 5, or even 10 degrees.

Collaborator
@bguillod bguillod left a comment

I like the overall idea a lot! We actually had implemented something quite similar to validate the new (still to be incorporated) TC stochastic tracks model. However, we had done it a bit differently:

  • First, there was a parameter for the "minimum intensity" threshold to consider (something like min_max_sustained_wind), such that you can plot the density of tracks exceeding a certain wind speed (or Saffir-Simpson category). This enabled us to find out where there were too many or too few tracks in general (when taking, for example, the difference between probabilistic and historical tracks) versus how the more intense events were over-/under-estimated. I'd suggest adding this feature here as well.
  • Ideally there is a valid unit to the "track density". To do so, we counted not the number of points (which is kind of the total duration of tracks being there) but instead the number of tracks (after identifying the points within each grid cell, we counted the unique cyc_id values of those points), which leads to units of "number of tracks per year" if you then scale by the number of years in your dataset. You might want to consider doing the same, or at least enabling the user to choose between both approaches (a sketch of this counting is given after this list).
  • We had implemented that using lines, but it took ages to calculate (well, it was on a 10'000-year event set...), so using points might be good - and I am already relieved to see that the validation of the new synthetic tracks model will be made easier thanks to this method 😄. In fact, there should then be a reasonable default for the temporal resolution of the tracks as a function of the raster cell size (i.e., if the temporal resolution is too low, a track might leave no point in a grid cell because it jumped from the east to the west of the cell and is not counted).
  • Finally, just to chime in on the "track versus windfield" discussion in earlier comments: I think both are very valuable and complementary. One probably wants to validate the tracks first, and validate the wind fields in a second step. This is the approach we had taken back then, and it enabled tuning the tracks algorithm without doing the more expensive windfield computation for each test.
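
A sketch of the unique-track counting mentioned above (illustrative only; it assumes per-track lat/lon arrays, and counts distinct tracks by capping each track's contribution at one per cell rather than by looking up track IDs explicitly):

import numpy as np

def unique_track_count(tracks, lat_bins, lon_bins, n_years=1):
    # Number of distinct tracks per grid cell, not the number of track points.
    count = np.zeros((len(lat_bins) - 1, len(lon_bins) - 1))
    for track in tracks:
        h, _, _ = np.histogram2d(
            track.lat.values, track.lon.values, bins=[lat_bins, lon_bins]
        )
        count += np.minimum(h, 1)  # each track contributes at most 1 per cell
    return count / n_years  # "tracks per year" if n_years spans the dataset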

@NicolasColombi NicolasColombi mentioned this pull request Feb 7, 2025
@NicolasColombi
Collaborator Author

I like the overall idea a lot! We actually had implemented something quite similar to validate the new (still to be incorporated) TC stochastic tracks model. However we had done it a bit differently:

Good to hear! Is your version somewhere on a climada branch?

@bguillod
Collaborator

bguillod commented Feb 8, 2025

Good to hear! Is your version somewhere on a climada branch?

No, it was in our code base at CelsiusPro, so I don't have it anymore... @ChrisFairless probably has access to it, but I think your approach is easier, so I'm not sure you'd gain much (?). There was a class called TcTracksValidation or so, which one could also use to plot the fields and their differences, if I recall correctly.

The main thing missing so far in your code is being able to filter points by minimum wind speed.

@NicolasColombi
Collaborator Author

The main thing missing so far in your code is being able to filter points by minimum wind speed.

I see, I have just added the feature 👍 Now we can select above, below, and in-between wind speeds.
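
For illustration, the wind-speed selection could look roughly like this (a sketch, not the merged code; it assumes each track stores a max_sustained_wind variable, and the wind_min/wind_max names mirror the wind_max parameter quoted later in this thread):

import numpy as np

def select_by_wind(track, wind_min=None, wind_max=None):
    # Keep only the lat/lon of points whose maximum sustained wind falls in
    # the requested range; None leaves the corresponding bound open.
    wind = track.max_sustained_wind.values
    mask = np.ones(wind.shape, dtype=bool)
    if wind_min is not None:
        mask &= wind >= wind_min
    if wind_max is not None:
        mask &= wind <= wind_max
    return track.lat.values[mask], track.lon.values[mask]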

@NicolasColombi NicolasColombi marked this pull request as ready for review February 12, 2025 08:38
@NicolasColombi
Collaborator Author

Before I start to do anything: after a second thought, I think this PR should only be about the track density function. Plotting the density is pretty straightforward, and having a single plotting function just for that might not be ideal, so we might want to remove it. Normalizing by the area does not make sense anymore, as you should normalize the density by time, not space (e.g. number of storms per grid cell per year); even though the function compute_grid_cell_area might be useful for something else, I think it does not belong here. I have written an additional function for myself: compute_windspeed_density. I propose the following: remove the plotting function and the area function, keeping only the track density, and maybe add compute_windspeed_density.
What do you think, @chahank?

@NicolasColombi
Collaborator Author


Any thoughts? @chahank

@chahank
Member

chahank commented Aug 28, 2025


I agree. There is already a grid cell area method in the util.coordinates module anyway. Why would you remove the plotting function now? I think you can keep it.

@NicolasColombi
Collaborator Author

Thanks @chahank for the inputs! 🙌

summary:

  • removed grid area function
  • removed normalization
  • moved plotting function into TCTracks
  • updated tests
  • added test for plotting function (just calling the plotting function, probably too simple for a test)
  • corrected the code with the suggestions you proposed

let me know if something is still missing.

@emanuel-schmid how does that look to you for the release ?

Comment on lines 2278 to 2280
add_features: dict
Dictionary of map features to add. Keys can be 'land', 'coastline', 'borders', and
'lakes'. Values are Booleans indicating whether to include each feature.
Collaborator

Why not simply add flags for the add_features? If this is about pleasing the linter (too many arguments), then flags are the lesser flaw. The problem with too many arguments is usually that they point to a design flaw. Plotting functions are often the exception.

Collaborator Author

Good point 😁
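
To make the trade-off concrete, the two signatures under discussion might look like this (names illustrative, not the final API):

# dict-based, as currently in the PR:
def plot_track_density(hist, add_features=None, **kwargs):
    add_features = add_features or {"land": True, "coastline": True}
    ...

# flag-based, as suggested above:
def plot_track_density(hist, land=True, coastline=True, borders=False,
                       lakes=False, **kwargs):
    ...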


limit_ratio: float = (
1.12 * 1.1
) # record tc speed 112km/h -> 1.12°/h + 10% margin
Collaborator

I guess having this limit hard-coded is fine here. But out of curiosity: why 112 km/h / 1.12 degrees? Is this about the expected value in tropical cyclones?

Collaborator Author
@NicolasColombi NicolasColombi Sep 22, 2025

The fastest recorded TC to my knowledge was traveling at 112 km/h. If the track time step is 1 h, we want the grid cell to be at least 112 km x 112 km, so that the points of a track do not skip a grid cell. This means there is a ratio between maximum speed, time step, and resolution that should not be crossed (I added 10% to the maximum speed as a margin). Here I assume no TC will ever be faster than 112 km/h + 10% margin.

Around the equator, 112 km equates to roughly 1°, so 112 km/h becomes 1°/h.

(I should have written 1° instead of 1.12° in the comment.)

Member

But this requires the equal time step to be set to 1 hour right?

Collaborator Author

Only if the chosen resolution is 1° does the time step need to be 1 h or less; the code checks whether the ratio resolution / time_step is equal to or greater than the limit ratio. If it is smaller, it raises an error, as either the resolution is too fine or the time step too big. I'll try to write it more clearly.
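
A sketch of that check (illustrative; the constant mirrors the snippet quoted above, but this is not the merged code):

limit_ratio = 1.12 * 1.1  # fastest recorded TC translation speed in °/h, +10% margin

def check_res_vs_time_step(res, time_step):
    # A track moving at the assumed maximum speed covers at most
    # limit_ratio * time_step degrees per step; that must not exceed res.
    if res / time_step < limit_ratio:
        raise ValueError(
            "Resolution too fine or time step too large: fast-moving tracks "
            "could skip grid cells. Use a coarser res or a smaller time_step."
        )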


import logging

import cartopy.crs as ccrs
Member

I think all the changes to hazard.plot are not necessary. Are they maybe leftovers from something else?

Collaborator Author
@NicolasColombi NicolasColombi Sep 23, 2025

Yes, I forgot to remove them... (except plot_fraction, which was a merge conflict)

-------
hist_count: np.ndarray
2D matrix containing the absolute count per grid cell of track points or the
normalized number of track points, depending on the norm parameter.
Member

I do not see any norm parameter.

wind_max: float = None,
) -> tuple[np.ndarray, np.ndarray, np.ndarray]:
"""Compute tropical cyclone track density. Before using this function,
apply the same temporal resolution to all tracks by calling :py:meth:`equal_timestep` on
Member

Actually, you require more right? You require that the timestep is equal to one hour?

Collaborator Author

(See the other comment.) You are probably right in pointing out that I should mention the ratio between resolution and time step in the docstring. Otherwise, the only way to find out about it is by getting an error.

Co-authored-by: Chahan M. Kropf <chahan.kropf@usys.ethz.ch>
@chahank
Member

chahank commented Sep 23, 2025

You could add an example to the tutorial (https://github.com/CLIMADA-project/climada_python/blob/main/doc/tutorial/climada_hazard_TropCyclone.ipynb) if you want to be friendly with new users. This is up to you to decide ;).

@NicolasColombi
Collaborator Author

You could add an example to the tutorial (https://github.com/CLIMADA-project/climada_python/blob/main/doc/tutorial/climada_hazard_TropCyclone.ipynb) if you want to be friendly with new users. This is up to you to decide ;).

Yes, I will! I have to review the TC tutorial anyway, so I'll include these new additions 👍

@NicolasColombi
Collaborator Author

@chahank @emanuel-schmid is there anything left to do, or shall we merge?

@chahank
Member

chahank commented Sep 24, 2025

@NicolasColombi There is a merge conflict (should be easy to resolve). Otherwise, please merge!

@NicolasColombi NicolasColombi merged commit ab7556a into develop Sep 24, 2025
19 checks passed
@emanuel-schmid emanuel-schmid deleted the feature/density_tracks branch December 11, 2025 08:43