Apparently, in a quiet cluster we end up sending new tasks to the same worker repeatedly. So the following test would probably fail:
```python
@gen_cluster(client=True)
async def test_round_robin(c, s, a, b):
    await c.submit(inc, 1)
    await c.submit(inc, 2)
    await c.submit(inc, 3)
    assert a.transition_log and b.transition_log
```

This outcome is determined here:
distributed/distributed/scheduler.py
Lines 2122 to 2136 in bf9ddab

```python
if ts._dependencies or valid_workers is not None:
    ws = decide_worker(
        ts,
        self._workers_dv.values(),
        valid_workers,
        partial(self.worker_objective, ts),
    )
else:
    worker_pool = self._idle or self._workers
    worker_pool_dv = cast(dict, worker_pool)
    n_workers: Py_ssize_t = len(worker_pool_dv)
    if n_workers < 20:  # smart but linear in small case
        ws = min(worker_pool.values(), key=operator.attrgetter("occupancy"))
    else:  # dumb but fast in large case
        ws = worker_pool.values()[self._n_tasks % n_workers]
```
It looks like when there are no dependencies and we have only a few workers, we currently choose the worker with minimum occupancy. In a quiet cluster all workers have zero occupancy, so every call is a tie, and Python's `min` breaks ties by returning the first minimal element it encounters: the same worker wins every time.
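A minimal sketch of that tie-breaking behavior, using a hypothetical stand-in class for `WorkerState` (only the `occupancy` attribute matters here):

```python
import operator

class Worker:
    """Hypothetical stand-in for distributed's WorkerState."""
    def __init__(self, name, occupancy=0.0):
        self.name = name
        self.occupancy = occupancy

# A quiet cluster: every worker has zero occupancy.
workers = {"a": Worker("a"), "b": Worker("b")}

# min() is documented to return the first minimal element on ties,
# so with equal occupancies the same worker is chosen every time.
chosen = [
    min(workers.values(), key=operator.attrgetter("occupancy")).name
    for _ in range(3)
]
print(chosen)  # ['a', 'a', 'a']
```

This matches the reported symptom: as long as no occupancy is updated between decisions, new tasks pile onto one worker.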
In the case where occupancy is zero, we might do something like a round-robin among the worker pool (as is done just below for the case of more than 20 workers), for as long as the workers have zero occupancy.
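One way the proposal could look, sketched with the same hypothetical `Worker` stand-in (the function name and the fall-through to min-occupancy are assumptions, not the actual patch):

```python
import operator

class Worker:
    """Hypothetical stand-in for distributed's WorkerState."""
    def __init__(self, name, occupancy=0.0):
        self.name = name
        self.occupancy = occupancy

def decide_worker_round_robin(workers, n_tasks):
    # While the cluster is quiet (all occupancies zero), cycle through
    # the pool using the task counter, as the large-cluster branch does.
    if all(w.occupancy == 0 for w in workers):
        return workers[n_tasks % len(workers)]
    # Otherwise keep the existing min-occupancy heuristic.
    return min(workers, key=operator.attrgetter("occupancy"))

pool = [Worker("a"), Worker("b"), Worker("c")]
picked = [decide_worker_round_robin(pool, i).name for i in range(4)]
print(picked)  # ['a', 'b', 'c', 'a']
```

This spreads tasks across all workers in a quiet cluster while preserving the occupancy-based choice once work starts accumulating.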
Reported in conversation by @crusaderky