Skip to content

Conversation

@emanuel-schmid
Copy link
Collaborator

@emanuel-schmid emanuel-schmid commented Jul 23, 2025

Changes proposed in this PR:

This PR is a pragmatic workaround of the problem described in #1055

PR Author Checklist

PR Reviewer Checklist

Copy link
Member

@chahank chahank left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good changes, although some tests are failing now.

As a note, centroids are currently never supposed to be anything else than points, as the impact calculations cannot deal with other geometries in the centroids at the moment. I think it does not hurt to have the code ready.

Copy link
Member

@peanutfun peanutfun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for tackling this issue! I have some concerns regarding the readability and performance of the code, especially because the entire data needs to be copied. I also think the tests need to be adapted a bit. See my comments.

store.close()
xycols = []
wkbcols = []
store = pd.HDFStore(file_name, mode=mode, complevel=9)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For zlib, it seems like looks like high compression levels only slightly reduce the file size while costing much performance. A lower value seems more advisable to me. See https://www.pytables.org/usersguide/optimization.html#compression-issues

Suggested change
store = pd.HDFStore(file_name, mode=mode, complevel=9)
store = pd.HDFStore(file_name, mode=mode, complevel=3)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I leave it as it is: in an arbitrary test, the cpu decrease was 0.4 seconds, about 15%, the size increase 2M, about 10%. From a $ point of view complevel 9 seems justified.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

15% decrease vs. 10% increase seems like an argument for a lower complevel, from my point of view. But I guess it's not that relevant. In case we run into some issues, we might consider making this a method kwarg in the future.

Comment on lines +914 to +915
for col in pandas_df.columns:
if str(pandas_df[col].dtype) == "geometry":
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make clear that you do not want to iterate over all columns:

Suggested change
for col in pandas_df.columns:
if str(pandas_df[col].dtype) == "geometry":
for col in filter(lambda x: str(x.dtype) == "geometry", pandas_df.columns):

(Suggestion won't work because the following code needs to be indented less)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

elegant suggestion - but I leave it as it is. it's more "climada style" like that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then please add a comment, what you call "climada style" confused me quite a bit 😕

Suggested change
for col in pandas_df.columns:
if str(pandas_df[col].dtype) == "geometry":
# Iterate over geometry columns (only)
for col in pandas_df.columns:
if str(pandas_df[col].dtype) == "geometry":

crs = metadata.get("crs")
gdf = gpd.GeoDataFrame(store["centroids"], crs=crs)
gdf = gpd.GeoDataFrame(store["centroids"])
for xycol in metadata.get("xy_columns", []):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please also add a test with multiple xy_columns/wkb_columns to be stored and read.

)
centroids_w.write_hdf5(tmpfile)
centroids_r = Centroids.from_hdf5(tmpfile)
self.assertTrue(centroids_w == centroids_r)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
self.assertTrue(centroids_w == centroids_r)
self.assertEqual(centroids_w, centroids_r)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(this was actually done in purpose - the idea was to make sure the overridden equality operator does what it ought to do, regardless of what exactly happens inside assertEqual, about which I have no clue)

@emanuel-schmid emanuel-schmid merged commit 1ebd005 into develop Aug 27, 2025
19 checks passed
@emanuel-schmid emanuel-schmid deleted the feature/pickle_free_centroids_store branch August 27, 2025 13:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants