HDF5 file IO for TCTracks #349
chahank
left a comment
Excellent addition!
Note for future: such a method would also be very useful for other climada objects, in particular the Impact and CostBenefit objects.
Thanks for the quick feedback! I think I addressed all of your comments in my latest commit.
I have to mark this as WIP because I found that there are alternative implementations (data formats) and I'm not sure which one to choose:
When gzipping the files in the format (2), the size problem is more than solved. However, it's not possible to read and write directly to/from gzipped hdf5 files when using
@chahank I'm sorry, but I have to ask again for your review. I didn't change the API or the tests, but I changed the file format and the implementation. The produced HDF5-file is now completely NetCDF4-compliant and can be read with external NetCDF-tools like However, users that have problems with disk space can still manually gzip their files which will usually save 90% of the disk space. [*] For comparison, when storing the 3000 tracks in separate NetCDF files using the
Thanks @tovogt. I went through the code and made a few remarks. I think a method that makes the files easier to use and more compatible, at the cost of memory, is the right choice. And, as you said, users for whom this is critical can apply extra compression.
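For reference, the manual gzip step discussed above could be sketched roughly as follows, using only the Python standard library. The helper names `gzip_file` and `gunzip_file` are made up for illustration and are not part of CLIMADA. Since gzipped HDF5 files cannot be read directly, the sketch assumes the file is decompressed again before it is loaded.

```python
import gzip
import shutil
from pathlib import Path


def gzip_file(src, remove_original=False):
    """Compress `src` to `<src>.gz` and return the path of the new file."""
    src = Path(src)
    dst = src.with_name(src.name + ".gz")
    # Stream the data in chunks so large files never sit fully in memory.
    with open(src, "rb") as f_in, gzip.open(dst, "wb") as f_out:
        shutil.copyfileobj(f_in, f_out)
    if remove_original:
        src.unlink()
    return dst


def gunzip_file(src):
    """Decompress `<name>.gz` to `<name>` and return the path of the new file."""
    src = Path(src)
    if src.suffix != ".gz":
        raise ValueError(f"expected a .gz file, got {src}")
    dst = src.with_suffix("")  # drops only the trailing ".gz"
    with gzip.open(src, "rb") as f_in, open(dst, "wb") as f_out:
        shutil.copyfileobj(f_in, f_out)
    return dst
```

The actual saving depends on the data; the helpers only move bytes and make no assumption about the file's internal format.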
Looks good to go to me, thanks for the updates!
For Hazard and Exposure objects, CLIMADA comes with handy methods to read and write to/from HDF5 files. For `TCTracks` objects, users have to either use pickle (which has its own issues) or the `write_netcdf` functionality. The problem with the NetCDF IO is that all tracks are stored in separate files. On many setups, especially with network file systems, the access times for many small files can be extremely bad.
That's why this PR adds IO methods `write_hdf` and `from_hdf` for `TCTracks`, in line with the corresponding functionality for Hazard and Exposure objects. All tracks are stored in a single HDF5 file.
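A minimal sketch of how the two methods might be used together. The method names are taken from the description above; the call signatures (a single file name argument, with `from_hdf` as a classmethod) are an assumption, and the helper `roundtrip_hdf` is hypothetical. The names in the merged code may differ.

```python
def roundtrip_hdf(tracks, file_name):
    """Write `tracks` to one HDF5 file and load it back from that file.

    Assumption: `tracks` offers `write_hdf(file_name)` and its class offers
    a classmethod `from_hdf(file_name)`, as described in this PR.
    """
    tracks.write_hdf(file_name)
    return type(tracks).from_hdf(file_name)


# Hypothetical usage with CLIMADA installed:
#   from climada.hazard import TCTracks
#   tracks = ...  # any TCTracks instance
#   tracks_reloaded = roundtrip_hdf(tracks, "tracks.h5")
```

The helper only relies on the two method names, so it stays valid if the file format behind them changes.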