Description
Currently, two TaskState objects with the same key will hash and compare as equal. However, there should only ever be one TaskState object per logical task. Therefore, if there are two TaskState objects with the same key, they must refer to logically different tasks—such as the same key being re-submitted—and should not be equal, nor hash the same. They refer to different things.
Using the key as the hash causes errors like #7504.
This is a similar theme to:
- Issues with tasks completing on workers after being released and re-submitted #7356
- Worker addresses are treated as unique identifiers, but may not be #6392
@crusaderky already fixed the equivalent problem on the worker side in #6593.
As was done there, I think we should simply remove __hash__ and __eq__ from TaskState on the scheduler. Then we automatically get the behavior we want, where only TaskState objects with the same identity are equal:
> User-defined classes have __eq__() and __hash__() methods by default; with them, all objects compare unequal (except with themselves) and x.__hash__() returns an appropriate value such that x == y implies both that x is y and hash(x) == hash(y).
https://docs.python.org/3/reference/datamodel.html#object.__hash__
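To illustrate the difference, here is a minimal sketch with two hypothetical, simplified classes (not the actual scheduler class): one with a key-based __hash__/__eq__ like today's TaskState, and one that omits them and so falls back to Python's identity-based default.

```python
class KeyedTaskState:
    """Current behavior: hash and equality are based on the key."""

    def __init__(self, key):
        self.key = key

    def __hash__(self):
        return hash(self.key)

    def __eq__(self, other):
        return isinstance(other, KeyedTaskState) and self.key == other.key


class IdentityTaskState:
    """Proposed behavior: no custom __hash__/__eq__, so Python uses
    identity -- all objects compare unequal except with themselves."""

    def __init__(self, key):
        self.key = key


# Two distinct objects for the same (e.g. re-submitted) key:
a, b = KeyedTaskState("x"), KeyedTaskState("x")
assert a == b and hash(a) == hash(b)   # wrongly collapse in sets/dicts
assert len({a, b}) == 1

c, d = IdentityTaskState("x"), IdentityTaskState("x")
assert c != d and c == c               # distinct objects stay distinct
assert len({c, d}) == 2
```

With the identity default, a stale TaskState left over from a released task can never collide with the new TaskState created on re-submission.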
See prior discussion in #6593 (comment), #6585 (comment).
Note that I could see an argument for instead including the recently-added run_id #7463 in the hash, in order to disambiguate between reruns of the same key. That would probably also fix things, for now, but I don't see the advantage of it. Since there should never be multiple TaskStates per task in the first place, what's the need for a custom __hash__ or __eq__ method? The default identity-based method is the simplest and most correct.
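For comparison, the run_id alternative would look roughly like this hypothetical sketch (the real TaskState and run_id details may differ). It fixes collisions between re-runs of the same key, but still allows two distinct objects for the same run to compare equal, which the identity-based default rules out entirely.

```python
class RunIdTaskState:
    """Alternative considered: include run_id in hash/eq to
    disambiguate re-runs of the same key (hypothetical sketch)."""

    def __init__(self, key, run_id):
        self.key = key
        self.run_id = run_id

    def __hash__(self):
        return hash((self.key, self.run_id))

    def __eq__(self, other):
        return (
            isinstance(other, RunIdTaskState)
            and (self.key, self.run_id) == (other.key, other.run_id)
        )


# Re-runs of the same key no longer collide...
assert RunIdTaskState("x", 0) != RunIdTaskState("x", 1)
# ...but two distinct objects for the same run still compare equal,
# which should never happen if there is one TaskState per task:
assert RunIdTaskState("x", 0) == RunIdTaskState("x", 0)
```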