@ValentinGebhart
Collaborator

@ValentinGebhart ValentinGebhart commented Jul 30, 2024

Changes proposed in this PR:

  • In the plot function geo_im_from_array, NaN values in the data are now plotted in gray. Previously, NaN values were not plotted (i.e., they were transparent), which made them indistinguishable from plot regions without data (no centroids).
  • In the plot function plot_from_gdf, the colorbar is shown on a logarithmic scale if a) the gdf contains return periods or impacts, b) the data contain no zeros, and c) the data values span at least two orders of magnitude.
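
The three conditions for the logarithmic colorbar can be sketched as a small check. This is only an illustration, not the code in this PR: the helper name `use_log_colorbar` and its flag argument are hypothetical, and ignoring NaNs and rejecting negative values are assumptions made for the sketch.

```python
import math


def use_log_colorbar(values, is_return_period_or_impact):
    """Hypothetical helper mirroring conditions a) to c) from the description."""
    # Assumption: NaNs are ignored when inspecting the value range
    finite = [v for v in values if not math.isnan(v)]
    # a) only return periods or impacts qualify
    if not is_return_period_or_impact or not finite:
        return False
    # b) no zeros (assumption: negative values are rejected as well)
    if min(finite) <= 0:
        return False
    # c) the values span at least two orders of magnitude
    return max(finite) / min(finite) >= 100


print(use_log_colorbar([1.0, 250.0], True))  # wide positive range
print(use_log_colorbar([0.0, 250.0], True))  # contains a zero
```

The ratio test `max / min >= 100` is one way to express "two orders of magnitude"; comparing `log10` values would be equivalent.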

PR Author Checklist

PR Reviewer Checklist

@peanutfun
Member

@ValentinGebhart Thank you for that contribution. Can you share an example and compare the resulting plots before and after your changes?

@peanutfun peanutfun self-assigned this Jul 31, 2024
@ValentinGebhart
Collaborator Author

ValentinGebhart commented Jul 31, 2024

> @ValentinGebhart Thank you for that contribution. Can you share an example and compare the resulting plots before and after your changes?

Here is an example that plots the return periods of a hazard object containing some NaNs (a centroid that never experienced the given threshold intensity gets a return period of NaN) and with some centroids removed (bottom-left corner). This is the code:

import numpy as np
from climada.hazard import Hazard
from climada.util import HAZ_DEMO_H5  # CLIMADA's demo hazard file
from climada.util.plot import plot_from_gdf

# Historic tropical cyclones in Florida from 1990 to 2004
haz_tc_fl = Hazard.from_hdf5(HAZ_DEMO_H5)
haz_tc_fl.check()  # Always call check() to see if the hazard has been loaded correctly

# Drop the centroids with i + j <= 10 (one corner of the plot)
centroids_mask = np.array(
    [(i + j > 10) for j in range(50) for i in range(50)]
)
haz_tc_fl.centroids = haz_tc_fl.centroids.select(sel_cen=centroids_mask)
# Trim the intensity matrix to the number of remaining centroids
haz_tc_fl.intensity = haz_tc_fl.intensity[:, -2434:]

# Local return periods for the threshold intensities 30 and 40
return_periods, label, column_label = haz_tc_fl.local_return_period([30, 40])

plot_from_gdf(return_periods, colorbar_name=label, title_subplots=column_label)

Old plots: [image: output]

New plots: [image: new]

Note that if the hazard return periods spanned at least two orders of magnitude (and contained no zeros), the color scale in the new plots would also be logarithmic.

Member

@peanutfun peanutfun left a comment

I like the overall contribution very much, but I have to say I dislike the approach: calling griddata again on basically the same data, casting to bool, plotting with a weird colormap...

I think the same thing can be achieved much more easily with the tools Matplotlib already provides. You can set "bad" and "over/under" colors for a colormap. Choosing the right vmin should then give you the expected outcome with a single call to pcolormesh.

# ...
if "norm" in kwargs:
    min_value = kwargs["norm"].vmin
    vmin = None  # We will pass norm
else:
    min_value = np.nanmin(array_im)
    vmin = kwargs.pop("vmin", min_value)

grid_im = griddata(
    (coord[:, 1], coord[:, 0]),
    array_im,
    (grid_x, grid_y),
    fill_value=min_value-1,  # Values outside the grid
)

# ...
cmap = plt.get_cmap(kwargs.pop("cmap", "viridis"))
cmap.set_bad("gray")  # For NaNs and infs
cmap.set_under("white", alpha=0)  # For values below vmin

axis.pcolormesh(
    grid_x - mid_lon,
    grid_y,
    np.squeeze(grid_im),
    transform=proj,
    cmap=cmap,
    vmin=vmin,
    **kwargs
)
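
To see the idea in isolation, here is a minimal, self-contained sketch on a toy 4 x 4 array (no CLIMADA, no griddata). The array, the sentinel cell, and the variable names are made up for illustration; only `set_bad`, `set_under`, and `pcolormesh` are the Matplotlib calls discussed above.

```python
import matplotlib

matplotlib.use("Agg")  # Non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np

# Toy data: valid values, one NaN, and one cell standing in for "no data"
data = np.arange(1.0, 17.0).reshape(4, 4)
data[1, 2] = np.nan
min_value = np.nanmin(data)
max_value = np.nanmax(data)
data[3, 0] = min_value - 1  # Sentinel below vmin, like fill_value above

# Copy the colormap so the shared "viridis" instance is not modified
cmap = plt.get_cmap("viridis").copy()
cmap.set_bad("gray")  # NaNs are drawn in gray
cmap.set_under("white", alpha=0)  # Values below vmin are transparent

fig, ax = plt.subplots()
mesh = ax.pcolormesh(data, cmap=cmap, vmin=min_value, vmax=max_value)
fig.colorbar(mesh, ax=ax)

# RGBA colors the colormap assigns to each cell after normalization
rgba = cmap(mesh.norm(data))
print(rgba[1, 2])  # NaN cell
print(rgba[3, 0])  # Sentinel cell
```

The copy is a defensive choice for the sketch: if a caller passes its own colormap object, calling `set_bad` on it directly would change that object for the caller as well.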

Comment on lines 927 to 928
gdf = gdf[['geometry', *[col for col in gdf.columns if col != 'geometry']]]
gdf_values = gdf.values[:,1:].T
Member

@peanutfun peanutfun Aug 7, 2024

See https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.drop.html

Suggested change
- gdf = gdf[['geometry', *[col for col in gdf.columns if col != 'geometry']]]
- gdf_values = gdf.values[:,1:].T
+ gdf_values = gdf.drop(columns="geometry")

):
kwargs.update(
{'norm': mpl.colors.LogNorm(
vmin=gdf.values[:,1:].min(), vmax=gdf.values[:,1:].max()
Member

Suggested change
- vmin=gdf.values[:,1:].min(), vmax=gdf.values[:,1:].max()
+ vmin=gdf_values.min(), vmax=gdf_values.max()
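
A small sketch of how the two suggestions might fit together, using a plain pandas DataFrame as a stand-in for the GeoDataFrame (the column names `rp_30` and `rp_40` and the values are invented). One thing to keep in mind is that `DataFrame.min()` reduces per column by default, so the sketch converts to a NumPy array first to obtain scalar limits; using NaN-aware reductions is an assumption, not something stated in the review.

```python
import numpy as np
import pandas as pd

# Stand-in for a GeoDataFrame: one geometry column plus value columns
gdf = pd.DataFrame(
    {
        "geometry": ["POINT (0 0)", "POINT (1 0)", "POINT (0 1)"],
        "rp_30": [12.0, np.nan, 340.0],
        "rp_40": [25.0, 80.0, 1200.0],
    }
)

# Drop the geometry column by name instead of relying on column order
values = gdf.drop(columns="geometry").to_numpy(dtype=float)

# Scalar limits that skip NaNs (assumption: NaNs should not affect the range)
vmin, vmax = np.nanmin(values), np.nanmax(values)
print(values.shape, vmin, vmax)
```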

@ValentinGebhart
Collaborator Author

Thanks for the advice! I agree that the way you describe is easier. I implemented and tested it (example plots from above didn't change), with a small modification for the case of the log colorscale (min_value - 1 did not seem to work, so I used min_value/2).
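
For illustration, the two fill-value choices can be compared in a few lines. The helper `fill_value_below` is hypothetical and not the code in this PR. The comment above does not say why min_value - 1 failed; one plausible reason, assumed here, is that it can be zero or negative, where a logarithm is undefined, whereas halving a positive minimum always stays positive and below the minimum.

```python
import math


def fill_value_below(min_value, log_scale):
    """Hypothetical helper: a value strictly below min_value marking 'no data'."""
    if log_scale:
        # Assumes min_value > 0; the result stays positive, so its log exists
        return min_value / 2
    return min_value - 1


min_value = 0.5
linear_fill = fill_value_below(min_value, log_scale=False)
log_fill = fill_value_below(min_value, log_scale=True)

print(linear_fill)  # Negative here, so a log scale could not represent it
print(math.log10(log_fill) < math.log10(min_value))  # Still below the minimum
```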

Member

@peanutfun peanutfun left a comment

Okay, this is looking much better now, thanks for the update! I have a few nitpicky suggestions still 🙈 We can merge once these are resolved!

@peanutfun peanutfun merged commit 2736e62 into develop Aug 9, 2024
@ValentinGebhart ValentinGebhart deleted the feature/plot_nan_and_log_scale branch August 9, 2024 13:17
@NicolasColombi NicolasColombi mentioned this pull request Apr 1, 2025