Description
Currently, two TaskState objects with the same key will hash and compare as equal. However, there should only ever be one TaskState object per logical task. Therefore, if there are two TaskState objects with the same key, they must refer to logically different tasks—such as the same key being re-submitted—and should not be equal, nor hash the same. They refer to different things.
Using the key as the hash causes errors like #7504.
This is a similar theme to:
- Issues with tasks completing on workers after being released and re-submitted #7356
- Worker addresses are treated as unique identifiers, but may not be #6392
@crusaderky already fixed the equivalent problem on the worker side in #6593.
As was done there, I think we should simply remove __hash__ and __eq__ from TaskState on the scheduler. Then we automatically get the behavior we want, where only TaskState objects with the same identity are equal:
> User-defined classes have __eq__() and __hash__() methods by default; with them, all objects compare unequal (except with themselves) and x.__hash__() returns an appropriate value such that x == y implies both that x is y and hash(x) == hash(y).
https://docs.python.org/3/reference/datamodel.html#object.__hash__
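To illustrate the difference, here is a minimal sketch with two hypothetical, simplified classes (not the actual scheduler class): one with a key-based __hash__/__eq__ like today's TaskState, and one that omits them and so falls back to Python's identity-based default.

```python
class KeyedTaskState:
    """Current behavior: hash and equality are based on the key."""

    def __init__(self, key):
        self.key = key

    def __hash__(self):
        return hash(self.key)

    def __eq__(self, other):
        return isinstance(other, KeyedTaskState) and self.key == other.key


class IdentityTaskState:
    """Proposed behavior: no custom __hash__/__eq__, so Python uses
    identity -- all objects compare unequal except with themselves."""

    def __init__(self, key):
        self.key = key


# Two distinct objects for the same (e.g. re-submitted) key:
a, b = KeyedTaskState("x"), KeyedTaskState("x")
assert a == b and hash(a) == hash(b)   # wrongly collapse in sets/dicts
assert len({a, b}) == 1

c, d = IdentityTaskState("x"), IdentityTaskState("x")
assert c != d and c == c               # distinct objects stay distinct
assert len({c, d}) == 2
```

With the identity default, a stale TaskState left over from a released task can never collide with the new TaskState created on re-submission.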
See prior discussion in #6593 (comment), #6585 (comment).
Note that I could see an argument for instead including the recently-added run_id #7463 in the hash, in order to disambiguate between reruns of the same key. That would probably also fix things, for now, but I don't see the advantage of it. Since there should never be multiple TaskStates per task in the first place, what's the need for a custom __hash__ or __eq__ method? The default identity-based method is the simplest and most correct.
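For comparison, the run_id alternative would look roughly like this hypothetical sketch (the real TaskState and run_id details may differ). It fixes collisions between re-runs of the same key, but still allows two distinct objects for the same run to compare equal, which the identity-based default rules out entirely.

```python
class RunIdTaskState:
    """Alternative considered: include run_id in hash/eq to
    disambiguate re-runs of the same key (hypothetical sketch)."""

    def __init__(self, key, run_id):
        self.key = key
        self.run_id = run_id

    def __hash__(self):
        return hash((self.key, self.run_id))

    def __eq__(self, other):
        return (
            isinstance(other, RunIdTaskState)
            and (self.key, self.run_id) == (other.key, other.run_id)
        )


# Re-runs of the same key no longer collide...
assert RunIdTaskState("x", 0) != RunIdTaskState("x", 1)
# ...but two distinct objects for the same run still compare equal,
# which should never happen if there is one TaskState per task:
assert RunIdTaskState("x", 0) == RunIdTaskState("x", 0)
```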