Labels: accepting pull request, performance
Description
Since the refactoring of Centroids in Climada 5.0, their coordinates are stored in `GeoDataFrame.geometry` and no longer as the numpy arrays `lat`/`lon`.
This has the following consequences:
- the files produced by `write_hdf5` are much bigger than they used to be. Here is an example for Centroids on a grid of 5'760'000 points:

  | geometry saved as | uncompressed | compressed |
  |---|---|---|
  | `shapely.Point` (current) | 210M | 167M |
  | WKB byte array (planned) | 177M | 134M |
  | x and y (no geometry) | 134M | 15M |

- when reading HDF5 files that contain pickled `Point` objects, the risk of exceeding memory limits is high. With a memory limit of 4 GB, I have not been able to read them without the kernel being killed.
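Why the x/y layout compresses so much better can be illustrated with a small experiment. The sketch below is not CLIMADA code and does not reproduce the current format (pickled `shapely.Point` objects). It builds a hypothetical 300 x 200 regular grid, encodes the points once as per-point WKB blobs and once as two float64 columns, and compresses both with `zlib`, which is used here only as a stand-in for the HDF5 deflate filter. It assumes shapely >= 2.0 for the vectorized `shapely.points` and `shapely.to_wkb`. On a regular grid, the x column repeats with the row length and the y column consists of long constant runs, while WKB interleaves a 5-byte header, x and y for every single point.

```python
import zlib

import numpy as np
import shapely  # requires shapely >= 2.0 for the vectorized functions below

# Hypothetical 300 x 200 regular grid; the issue's example has 5'760'000 points.
lon = np.linspace(-10.0, 10.0, 300)
lat = np.linspace(40.0, 50.0, 200)
lon_grid, lat_grid = np.meshgrid(lon, lat)
x, y = lon_grid.ravel(), lat_grid.ravel()
n_points = x.size

# Layout A: one WKB blob per point (21 bytes for a 2D point), concatenated.
wkb_raw = b"".join(shapely.to_wkb(shapely.points(x, y)))

# Layout B: two plain float64 columns (16 bytes per point), stored one after the other.
xy_raw = x.tobytes() + y.tobytes()

# zlib (deflate) stands in for the HDF5 compression filter in this sketch.
sizes = {}
for name, raw in [("wkb", wkb_raw), ("x/y", xy_raw)]:
    sizes[name] = (len(raw), len(zlib.compress(raw, 6)))
    print(f"{name:>4}: {sizes[name][0]:>9} bytes raw, {sizes[name][1]:>9} bytes compressed")
```

The absolute numbers depend on the grid and the compression level; the point of the sketch is the relative gap between the two compressed sizes.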
The plan has already been to store the `Centroids.gdf` geometries in WKB format, like those in `Exposures.gdf`.
This would alleviate the problem somewhat: lower memory requirements and files about 20% smaller.
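A minimal sketch of the WKB round trip at the GeoDataFrame level, assuming a geopandas version that provides `GeoSeries.to_wkb` and `GeoSeries.from_wkb`. The helper names `geometry_to_wkb_frame` and `wkb_frame_to_geometry` and the column name `geometry_wkb` are hypothetical, not existing CLIMADA functions, and how the resulting bytes column is then written to HDF5 is left out.

```python
import geopandas as gpd
import pandas as pd
from shapely.geometry import Point


def geometry_to_wkb_frame(gdf: gpd.GeoDataFrame) -> pd.DataFrame:
    """Hypothetical helper: replace the geometry column by a column of WKB bytes."""
    df = pd.DataFrame(gdf.drop(columns=gdf.geometry.name))
    df["geometry_wkb"] = gdf.geometry.to_wkb()  # one bytes object per row
    return df


def wkb_frame_to_geometry(df: pd.DataFrame, crs) -> gpd.GeoDataFrame:
    """Hypothetical helper: inverse of geometry_to_wkb_frame."""
    geometry = gpd.GeoSeries.from_wkb(df["geometry_wkb"].to_numpy(), index=df.index, crs=crs)
    return gpd.GeoDataFrame(df.drop(columns="geometry_wkb"), geometry=geometry)


# Usage on a tiny example (values are made up).
gdf = gpd.GeoDataFrame(
    {"value": [1.0, 2.0]},
    geometry=[Point(8.5, 47.4), Point(8.6, 47.5)],
    crs="EPSG:4326",
)
df = geometry_to_wkb_frame(gdf)
restored = wkb_frame_to_geometry(df, crs=gdf.crs)
```

Unlike the x/y layout, this works for any geometry type, which is why it is the safer general-purpose option.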
However, if we converted the geometry column to x/y columns before storing, and back after reading, the files would be about 90% smaller, and reading and writing would be faster.
This only works if the Centroids are really points and not another geometry type.
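The x/y conversion, including the check that every geometry is a point, could be sketched as follows. This assumes geopandas and a GeoDataFrame shaped like `Centroids.gdf`; the helper names `points_to_xy_frame` and `xy_frame_to_points` are hypothetical, and writing the resulting plain DataFrame to HDF5 is left out. Rejecting non-point, empty, or missing geometries with an error, rather than silently dropping them, keeps the conversion lossless.

```python
import geopandas as gpd
import pandas as pd
from shapely.geometry import LineString, Point


def points_to_xy_frame(gdf: gpd.GeoDataFrame) -> pd.DataFrame:
    """Hypothetical helper: store point geometries as plain float columns x and y."""
    # x/y storage is lossless only for non-empty Points; reject everything else.
    if not (gdf.geom_type == "Point").all() or gdf.geometry.is_empty.any():
        raise ValueError("x/y storage requires all geometries to be non-empty Points")
    df = pd.DataFrame(gdf.drop(columns=gdf.geometry.name))
    df["x"] = gdf.geometry.x.to_numpy()
    df["y"] = gdf.geometry.y.to_numpy()
    return df


def xy_frame_to_points(df: pd.DataFrame, crs) -> gpd.GeoDataFrame:
    """Hypothetical helper: rebuild the Point geometry column from x and y."""
    geometry = gpd.points_from_xy(df["x"], df["y"], crs=crs)
    return gpd.GeoDataFrame(df.drop(columns=["x", "y"]), geometry=geometry)


# Usage on a tiny example (values are made up).
gdf = gpd.GeoDataFrame(
    {"value": [1.0, 2.0]},
    geometry=[Point(8.5, 47.4), Point(8.6, 47.5)],
    crs="EPSG:4326",
)
df = points_to_xy_frame(gdf)
restored = xy_frame_to_points(df, crs=gdf.crs)

# A non-point geometry is rejected instead of being silently converted.
try:
    points_to_xy_frame(gpd.GeoDataFrame(geometry=[LineString([(0, 0), (1, 1)])]))
except ValueError as err:
    print(err)
```

A writer could use such a check to pick the x/y layout when all Centroids are points and fall back to WKB otherwise.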