Skip to content

Conversation

@spjuhel
Copy link
Collaborator

@spjuhel spjuhel commented Sep 26, 2024

Changes proposed in this PR:

  • Update all writers in Impact and Hazard to fail if event_name attribute contains elements that are not strings.
  • In Impact readers (from_excel and from_csv), check if event_name elements are all string, and warn user and convert to string otherwise, as reading relies on pandas here, checking and casting also use pandas. (note, that changes to from_hdf5 were done in Explicitly convert event_name to strings in Impact.from_hdf5 #894)
  • Add a _check_and_cast_attr method in HazardIO with the purpose of checking the type of both containers and elements of hazard attributes read by the different reader, and casting when required (warning the user in this case).
  • Use this method in from_raster, from_xarray_raster, from_excel and from_csv to effectively enforce correct types.
  • Update the test to be consistent with the changes

The idea behind implementing _check_and_cast_attr is to factorize the checking/casting and allow for possible additional test on the type of container/elements of other attributes in the future.

Currently, it only checks that (when read, i.e., before being set as an attribute of the object):

  • event_name is a list of str
  • event_id is a ndarray (not element type check)

This should be considered as a "medium term" temporary solution, before handling attributes such as event_id event_name in a DataFrame, which would greatly simplify reading, writing, type checking, etc.

This PR fixes #910 #950

PR Author Checklist

PR Reviewer Checklist

The rationale is to factorize the checking of hazard attribute in readers, for
possible future checks than event_name == str
`assert_almost_equals` tries to convert to float which failed for some reason.
In any case we want the event_names to be completely equal not almost.
@spjuhel spjuhel changed the base branch from main to develop September 26, 2024 09:07
@spjuhel spjuhel self-assigned this Sep 26, 2024
@spjuhel
Copy link
Collaborator Author

spjuhel commented Sep 26, 2024

Note that I still need to check/decide if additional tests that make sure a warning is raised, or writing fails in the corresponding case is required.

Any discussion on this will be appreciated :)

@chahank
Copy link
Member

chahank commented Dec 10, 2024

@spjuhel Any plan with this PR? Is this ready for review? In this case, please assign a reviewer. Thanks!

@spjuhel
Copy link
Collaborator Author

spjuhel commented Dec 12, 2024

Just asked @emanuel-schmid to review as it is slightly technical.

I did not add any tests, still unsure whether this would be useful...

Copy link
Collaborator

@emanuel-schmid emanuel-schmid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😎 II like it. It's clean, neat and solves the problem in the right way.

Comment on lines +1007 to +1008
## This should probably be defined as a CONSTANT?
attrs_to_check = {"event_name": (list, str), "event_id": (np.ndarray, None)}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 agreed. CONSTANT dict seems sensible.


# Perform type checking and casting of elements
if isinstance(attr_value, (list, np.ndarray)):
if not all(isinstance(val, expected_dtype) for val in attr_value):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 good choice. even though in case of attr_value being a nd.array, np.issubtye(attr_value, expected_type) would be infinitely faster but make the code more complex with nothing to be gained but a couple of seconds if the array is huge.

@spjuhel spjuhel merged commit 06ae9e4 into develop Mar 3, 2025
15 checks passed
spjuhel added a commit that referenced this pull request Mar 3, 2025
@emanuel-schmid emanuel-schmid deleted the feature/event_name_handling branch March 19, 2025 10:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Ensure proper handling of event_name attribute in Hazard and Impact

4 participants