Skip to content

Scheduler TaskState objects should be unique, not hashed by key #7510

@gjoseph92

Description

@gjoseph92

Currently, two TaskState objects with the same key will hash and compare as equal. However, there should only ever be one TaskState object per logical task. Therefore, if there are two TaskState objects with the same key, they must refer to logically different tasks—such as the same key being re-submitted—and should not be equal, nor hash the same. They refer to different things.

Using the key as the hash causes errors like #7504.

This is a similar theme to:

@crusaderky already fixed the equivalent problem on the worker side in #6593.

Like there, I think we should simply remove __hash__ and __eq__ from TaskState on the scheduler. Then we'll automatically get what we want, where only TaskStates with the same id are equal:

User-defined classes have __eq__() and __hash__() methods by default; with them, all objects compare unequal (except with themselves) and x.__hash__() returns an appropriate value such that x == y implies both that x is y and hash(x) == hash(y).
https://docs.python.org/3/reference/datamodel.html#object.__hash__

See prior discussion in #6593 (comment), #6585 (comment).

Note that I could see an argument for instead including the recently-added run_id #7463 in the hash, in order to disambiguate between reruns of the same key. That would probably also fix things, for now, but I don't see the advantage of it. Since there should never be multiple TaskStates per task in the first place, what's the need for a custom __hash__ or __eq__ method? The default identity-based method is the simplest and most correct.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions