Skip to content

Conversation

@peanutfun
Copy link
Member

@peanutfun peanutfun commented Jun 17, 2024

Changes proposed in this PR:

  • Call str on values if decoding strings from event_name fails, issue a warning in this case.
  • Throw error during writing if event_name is not strings.
  • Fix an issue where only the first value of event_name was checked.

This PR fixes #807

PR Author Checklist

PR Reviewer Checklist

@peanutfun peanutfun changed the base branch from main to develop June 17, 2024 08:32
@peanutfun peanutfun requested a review from sarah-hlsn June 17, 2024 08:32
@emanuel-schmid
Copy link
Collaborator

why don't we just force event names to be strings? If asstr() fails there must be other methods, e.g.,

event_name = [str(x) for x in file["event_name"][:]]

@peanutfun
Copy link
Member Author

@emanuel-schmid There was no decision on what to do in #807, so I came up with the least intrusive solution (from my perspective).

I am fine with your proposal, but it would imply that an Impact object with non-string event_name attribute is not cleanly "cycled": After writing to H5 and reading it again, the event_name will be a list of strings.

@emanuel-schmid
Copy link
Collaborator

Yeah, right. Sorry for coming late to the discussion - but...
Imho 1) a name ought to be a string 2) ambiguous data types are a constant source of sorrow.
Sooner or later someone will run into an issue because one of their hazards has event_name = 1, 2, 3 and another one '1', '2', '3'
If it's only about clean cycles (for not so clean data): I estimate the cost of losing "cyclability" lower than having to deal with this for ever.

@peanutfun
Copy link
Member Author

peanutfun commented Jun 17, 2024

@emanuel-schmid Thanks a lot for that perspective! This is exactly the kind of perspective I was hoping for in the discussion of #807. I would be fine with going the "string-only" route. However, there are some inconsistencies we then have to fix. In the tests for Impact, there are two instances where Impacts are instantiated with non-string event names:

@peanutfun
Copy link
Member Author

@chahank What do you think?

@chahank
Copy link
Member

chahank commented Jun 18, 2024

Thanks for the discussion @peanutfun and @emanuel-schmid : I here would also support your suggestion of the string only route. I think this makes things much clearer.

@emanuel-schmid
Copy link
Collaborator

@peanutfun Yes, Sorry again. 😳
thanks for pointing out the inconsistent tests. With regard to fixing #807 they have most likely no impact,
but they are sending contradictory signals and we should eventually put every instance of an event_name within quotes.
However - I think we can follow this up by fixing these when we come across them.

@peanutfun
Copy link
Member Author

@emanuel-schmid @chahank When we go string-only, we still have to clarify: Do we (try to) convert any event name data to strings when writing, when reading, or both times?

@emanuel-schmid
Copy link
Collaborator

emanuel-schmid commented Jun 18, 2024

Suggestion: When writing we simply assume they're strings, when reading we convert if necessary.
(When writing it's kind of too late to do anything about it.)

@peanutfun
Copy link
Member Author

(When writing it's kind of too late to do anything about it.)

Why? We can also call str on the elements before/while writing.

@emanuel-schmid
Copy link
Collaborator

Sure we can. But what would be the point? By then we supposedly have already been working daredevil on an "irregular object". We should have converted the non string event names to strings long ago. If we didn't, the conversion to strings just for the writing cannot save us as it doesn't make a difference anymore. Or does it?

@peanutfun
Copy link
Member Author

peanutfun commented Jun 21, 2024

I think it's mainly a question of conventions. Currently, Impact works fine with non-string event names. As a user, I would find it confusing if I can store such an object without issues and then, when reading it again, I might receive a warning or even an error that a string conversion is taking place or does not work. I think this should pop up as issue already during writing. It can simply be a warning like: "You are writing non-string event names, this is dangerous!".

@emanuel-schmid
Copy link
Collaborator

emanuel-schmid commented Jun 28, 2024

Hold on - when writing, the proper thing to do is fail if it is not a string, isn't it?

Because if we call str on the elements before writing the outcome may be something like 'x.y.Z object at 0x7f457d8faeb0>' which is likely not the desired one.

@peanutfun
Copy link
Member Author

@emanuel-schmid Great catch! Then we

  1. Throw an error when writing non-strings.
  2. Read strings or try to convert (in the latter case, issue a warning)

@peanutfun peanutfun changed the title Support other data types than strings for event_name in Impact.from_hdf5 Explicitly convert event_name to strings in Impact.from_hdf5 Jul 2, 2024
Copy link
Member

@chahank chahank left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good to go.

Maybe we need an Issue to make sure that the event name is always a string in all hazard methods?

@peanutfun
Copy link
Member Author

Maybe we need an Issue to make sure that the event name is always a string in all hazard methods?

We would even need it for Impact, because this PR only fixes the HDF5 read/write methods. Elsewhere, the tests explicitly check if mixed-type event names work, see #894 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Impact read hdf5 fails if hazard event name is not a string

4 participants