-
Notifications
You must be signed in to change notification settings - Fork 149
Checks in readers/writers to enforce correct types for event_name and event_id container and elements
#951
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
The rationale is to factorize the checking of hazard attribute in readers, for possible future checks than event_name == str
`assert_almost_equals` tries to convert to float which failed for some reason. In any case we want the event_names to be completely equal not almost.
|
Note that I still need to check/decide if additional tests that make sure a warning is raised, or writing fails in the corresponding case is required. Any discussion on this will be appreciated :) |
|
@spjuhel Any plan with this PR? Is this ready for review? In this case, please assign a reviewer. Thanks! |
|
Just asked @emanuel-schmid to review as it is slightly technical. I did not add any tests, still unsure whether this would be useful... |
emanuel-schmid
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
😎 II like it. It's clean, neat and solves the problem in the right way.
| ## This should probably be defined as a CONSTANT? | ||
| attrs_to_check = {"event_name": (list, str), "event_id": (np.ndarray, None)} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 agreed. CONSTANT dict seems sensible.
|
|
||
| # Perform type checking and casting of elements | ||
| if isinstance(attr_value, (list, np.ndarray)): | ||
| if not all(isinstance(val, expected_dtype) for val in attr_value): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 good choice. even though in case of attr_value being a nd.array, np.issubtye(attr_value, expected_type) would be infinitely faster but make the code more complex with nothing to be gained but a couple of seconds if the array is huge.
Changes proposed in this PR:
ImpactandHazardto fail ifevent_nameattribute contains elements that are not strings.Impactreaders (from_excelandfrom_csv), check ifevent_nameelements are all string, and warn user and convert to string otherwise, as reading relies on pandas here, checking and casting also use pandas. (note, that changes tofrom_hdf5were done in Explicitly convertevent_nameto strings inImpact.from_hdf5#894)_check_and_cast_attrmethod inHazardIOwith the purpose of checking the type of both containers and elements of hazard attributes read by the different reader, and casting when required (warning the user in this case).from_raster,from_xarray_raster,from_excelandfrom_csvto effectively enforce correct types.The idea behind implementing
_check_and_cast_attris to factorize the checking/casting and allow for possible additional test on the type of container/elements of other attributes in the future.Currently, it only checks that (when read, i.e., before being set as an attribute of the object):
event_nameis a list of strevent_idis a ndarray (not element type check)This should be considered as a "medium term" temporary solution, before handling attributes such as
event_idevent_namein a DataFrame, which would greatly simplify reading, writing, type checking, etc.This PR fixes #910 #950
PR Author Checklist
develop)PR Reviewer Checklist