Skip to content

Repeatedly use the same worker on first task #4637

@mrocklin

Description

@mrocklin

Apparently in a quiet cluster we end up sending new tasks to the same worker repeatedly. So probably the following test would fail:

@gen_cluster(client=True)
def test_round_robin(c, s, a, b):
    await c.submit(inc, 1)
    await c.submit(inc, 2)
    await c.submit(inc, 3)
    assert a.transition_log and b.transition_log

This outcome is determined here:

if ts._dependencies or valid_workers is not None:
ws = decide_worker(
ts,
self._workers_dv.values(),
valid_workers,
partial(self.worker_objective, ts),
)
else:
worker_pool = self._idle or self._workers
worker_pool_dv = cast(dict, worker_pool)
n_workers: Py_ssize_t = len(worker_pool_dv)
if n_workers < 20: # smart but linear in small case
ws = min(worker_pool.values(), key=operator.attrgetter("occupancy"))
else: # dumb but fast in large case
ws = worker_pool.values()[self._n_tasks % n_workers]

It looks like when there are no dependencies and we have only a few workers we currently choose the worker with minimum occupancy. In a quiet cluster all workers have zero occupancy, so probably we're getting whatever Python uses to break a tie in this setting.

In the case where the occupancy is zero we might do something like a round-robin (this is done just below in the case where we have greater than 20 workers) among the worker pool for as long as that worker has zero occupancy.

Reported in conversation by @crusaderky

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions