Conversation
jrbourbeau
left a comment
There was a problem hiding this comment.
Thanks for your work on this @fjetter
This includes a breaking change for the WorkerPlugin since the transition functions will no longer receive strings. We can make this backwards compatible but eventually I'd like to transition this to the enums as well.
Thanks for flagging this. I'd like to add backwards compatibility / deprecation code similar to what was done in #3875
cc @crusaderky as this seems like the kind of this you might enjoy reviewing
distributed/worker.py
Outdated
| return self.name | ||
|
|
||
|
|
||
| WTSName = WorkerTaskStateName |
There was a problem hiding this comment.
Does removing Name from WorkerTaskStateName sufficiently reduce the need for this alias? I find using the full name much more readable
distributed/worker.py
Outdated
| no_value = "--no-value-sentinel--" | ||
|
|
||
|
|
||
| class WorkerTaskStateName(Enum): |
There was a problem hiding this comment.
This is shorter while, I think, conveying the same information
| class WorkerTaskStateName(Enum): | |
| class WorkerTaskState(Enum): |
There was a problem hiding this comment.
Since this same enum is going to be soon used in scheduler.py too, may I suggest moving it to another module (core.py?) and rename it to e.g. TaskStateStatus (imported as TSS everywhere)?
There was a problem hiding this comment.
WorkerTaskStatus
well, it's not a status but I don't have a strong opinion there
Since this same enum is going to be soon used
I'm not entirely conviced that this should be shared among scheduler and worker. What I like about using two distinct enums is that the readability improves if there are scheduler and worker states around. The set of allowed states is also different.
| v_args = v[1:] | ||
| v_state = v[0] |
There was a problem hiding this comment.
Not a big deal, but I'm curious why this change was needed
There was a problem hiding this comment.
looks cosmetic. I don't mind either way.
There was a problem hiding this comment.
mypy complains otherwise. below with type annotations
v : WTSName | tuple = recs.get(ts, (finish, *args))
v_state: WTSName
v_args: tuple
if isinstance(v, tuple):
# old; This will raise since the expression *v_args is of type list
v_state: WTSName, *v_args: list = v
# new
v_state: WTSName = v[0]
v_state: tuple = v[1:] # indexing a tuple this way yields a tuple
else:
v_state: WTSName, v_args: tuple = v, ()
distributed/stealing.py
Outdated
| "waiting", | ||
| WTSName.ready, | ||
| WTSName.constrained, | ||
| WTSName.waiting, |
There was a problem hiding this comment.
If they're enums now, you could perhaps use a Flag enum to encapsulate multiple states?
There was a problem hiding this comment.
I'm not sure about the benefit. Tasks themselves are not allowed to have multiple states. Therefore, there is the possibility to have a properly typed value assigned which is semantically nonsense / impossible. That's unlikely but doesn't encourage me in using it.
Therefore the only valid application will be when the rules engine verifies if a state is in a given set of states.
For this, it's basically just another syntax. Below the examples played out. set-like __contains__ still works so this will only be affecting the definition of the subsets.
In [1]: from enum import Enum, Flag, auto
In [2]: class TSSFlag(Flag):
...: fetch = auto()
...: flight = auto()
...: released = auto()
...: executing = auto()
...:
In [3]: class TSS(Enum):
...: fetch = auto()
...: flight = auto()
...: released = auto()
...: executing = auto()
...:
In [4]: to_be_gathered = {TSS.fetch, TSS.flight}
In [6]: to_be_gathered_flag = TSSFlag.fetch | TSSFlag.flight
In [7]: TSS.released in to_be_gathered
Out[7]: False
In [8]: TSS.flight in to_be_gathered
Out[8]: True
In [9]: TSSFlag.released in to_be_gathered_flag
Out[9]: False
In [10]: TSSFlag.flight in to_be_gathered_flag
Out[10]: True
In [11]: bool(TSSFlag.released & to_be_gathered_flag)
Out[11]: False
In [12]: bool(TSSFlag.flight & to_be_gathered_flag)
Out[12]: TrueI get that bitwise comparison is faster than hashing but this speed difference is entirely negligible for the worker and I'm not entirely sure if it is relevant for the scheduler.
However, I don't see any significant downsides. If anybody has a reason to do so, I'm open to it but I consider the set-only, plain enum thing to be more "pythonic"
There was a problem hiding this comment.
I guess it's mostly stylistic. I think checking membership of a set of values is the canonical use case for a flag enum over a regular enum. IIUC Flag has been around since py36 so I think it would be fairly "pythonic" to use it where you don't have to support older Pythons.
Even though it may be negligible now, if there's no downside to doing so I'd always be inclined to go with the more performant solution as it may save you in future. However, your type-checker won't save you from setting a TSSFlag attribute to a combined value (e.g. TSSFlag.fetch | TSSFlag.flight) so that's a fairly big downside 🤔
@jrbourbeau @fjetter Just a random thought: python 3.10 adds StrEnums which might be backwards compatible with this. You can get StrEnums in python 3.6+ with: https://github.com/irgeek/StrEnum I believe (enums sounds good to me either way). |
|
|
||
|
|
||
| async def wait_for_cancelled(key, dask_worker): | ||
| async def wait_for_cancelled(key: str, dask_worker: Worker): |
There was a problem hiding this comment.
| async def wait_for_cancelled(key: str, dask_worker: Worker): | |
| async def wait_for_cancelled(key: str, dask_worker: Worker) -> None: |
|
Blocked by and incorporates #5426 |
It's not the same thing. It's not about setting the proper value on the state attribute. If users modify anything on that attribute, they are entirely on their own since I couldn't think of many things which are more low level and internal than this attribute. I'm rather talking about the distributed/distributed/worker.py Line 2292 in 3afc670 Our WorkerPlugin interface currently defines strings. It's very easy for us to simply continue passing string values in there but it is impossible to infer whether the users migrated to anything new or not. We can either raise a deprecation/future warning every time (too much spam; I'm strongly against this) or not warn at all. We might also simply decide to just keep using strings for anything user facing. A similar problem exists for the transition log. Right now, I'm logging the enums but we could instead log the names which might offer a better UX (although I'm not sure if anybody other than myself is actually using the transition log :) ) |
I'm not aware that py3.10 is introducing a new enum type. Looking at https://docs.python.org/3/whatsnew/3.10.html enums are only mentioned in an example regarding the new AFAIK, mixed type enums can be easily created by subclassing, e.g. In [15]: class TSSStr(str, Enum):
...: fetch = "fetch"
...: flight = "flight"
...: released = "released"
...: executing = "executing"
...:
In [16]: TSSStr.fetch == "fetch"
Out[16]: True |
3f0600d to
2dadf51
Compare
2dadf51 to
bf894cc
Compare
| >>> client.register_worker_plugin(plugin) # doctest: +SKIP | ||
| """ | ||
|
|
||
| enum_task_state_names = False |
There was a problem hiding this comment.
I added a backwards compatibility toggle for the plugin. There is now a class attribute controlling this behaviour and it default to the old behaviour. the old behaviour will then issue a deprecation warning instructing the user how to migrate.
|
Summarizing
|
| ready = "ready" | ||
| constrained = "constrained" | ||
| executing = "executing" | ||
| long_running = "long-running" |
There was a problem hiding this comment.
This is going to cause headaches with name != value. Can we use _ in both cases?
There was a problem hiding this comment.
(may need to change on the scheduler side too)
There was a problem hiding this comment.
I kept it this way for compat purposes. The value is backwards compatible
| if hasattr(plugin, method_name): | ||
| if method_name == "release_key": | ||
| if ( | ||
| not getattr(plugin, "enum_task_state_names", False) |
There was a problem hiding this comment.
| not getattr(plugin, "enum_task_state_names", False) | |
| not plugin.enum_task_state_names |
I think getattr is unnecessary - aren't all customer plugins supposed to subclass WorkerPlugin?
There was a problem hiding this comment.
Historically I don't think we've enforced this. That's why, for example, we have hasattr checks like this
distributed/distributed/worker.py
Lines 1539 to 1543 in 1a2313c
Enforcing plugins to subclass WorkerPlugin doesn't sound like a bad idea, but would merit a deprecation cycle
bf894cc to
be5c9cb
Compare
|
|
Is there anything outstanding here? |
Nothing outstanding other than resolving the merge conflicts. So far, this wasn't a priority of mine |
This sets up an enum to be used as the worker task state name. The name of this enum is up for debate, I couldn't find a shorter one since, eventually, we'll have the same for the scheudler and both enums will occasionally be active in the same module. This is, in fact, one reason why this is useful.
This includes a breaking change for the WorkerPlugin since the transition functions will no longer receive strings. We can make this backwards compatible but eventually I'd like to transition this to the enums as well.
Builds on #5426