
worker-saturation impacts balancing in work-stealing #7085

@hendrikmakait

Description


When `worker-saturation` is not `inf`, workers are only classified as idle if they are not full:

```python
# ws: the WorkerState, occ: its occupancy, p: number of tasks processing on it
if (
    self.is_unoccupied(ws, occ, p)
    if math.isinf(self.WORKER_SATURATION)
    else not _worker_full(ws, self.WORKER_SATURATION)
):
    ...
```

This behavior is desired for withholding root tasks (it was introduced in #6614), but work-stealing also relies on this idle classification to identify thieves. Restricting "idle" to workers that are not full according to `worker-saturation` delays balancing decisions until workers are almost out of work. It also reduces our ability to interleave computation of the remaining tasks with gathering the dependencies of stolen ones.
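To make the effect concrete, here is a minimal sketch of the two idle rules under simplifying assumptions. `Worker`, `is_unoccupied`, `worker_full` and `idle_workers` are hypothetical stand-ins, not the scheduler's code. The occupancy rule is assumed to treat a worker as idle when it has fewer tasks than threads, or less than half the cluster-average per-thread occupancy. A worker is assumed to be full once it has at least `ceil(saturation * nthreads)` tasks processing. The snapshot mirrors the reproducer below: two single-threaded workers, one with 12 tasks and one with 1, each task expected to take 1 second.

```python
import math
from dataclasses import dataclass


@dataclass
class Worker:
    """Hypothetical, simplified stand-in for the scheduler's WorkerState."""

    name: str
    nthreads: int
    durations: list[float]  # expected durations of tasks processing here

    @property
    def nprocessing(self) -> int:
        return len(self.durations)

    @property
    def occupancy(self) -> float:
        return sum(self.durations)


def is_unoccupied(ws: Worker, total_occupancy: float, total_nthreads: int) -> bool:
    # Assumed rule: fewer tasks than threads, or less than half of the
    # cluster-average per-thread occupancy (scaled by this worker's threads).
    average = total_occupancy / total_nthreads
    return ws.nprocessing < ws.nthreads or ws.occupancy < ws.nthreads * average / 2


def worker_full(ws: Worker, saturation: float) -> bool:
    # Assumed rule: full once ceil(saturation * nthreads) tasks are processing.
    if math.isinf(saturation):
        return False
    return ws.nprocessing >= max(math.ceil(saturation * ws.nthreads), 1)


def idle_workers(workers: list[Worker], saturation: float) -> list[str]:
    total_occupancy = sum(w.occupancy for w in workers)
    total_nthreads = sum(w.nthreads for w in workers)
    idle = []
    for ws in workers:
        # Same shape as the quoted branch: a finite saturation replaces the
        # occupancy-based check with the "not full" check.
        if (
            is_unoccupied(ws, total_occupancy, total_nthreads)
            if math.isinf(saturation)
            else not worker_full(ws, saturation)
        ):
            idle.append(ws.name)
    return idle


# Snapshot shaped like the reproducer input [[0] * 12, [0]].
workers = [Worker("a", 1, [1.0] * 12), Worker("b", 1, [1.0])]
print(idle_workers(workers, float("inf")))  # ['b']
print(idle_workers(workers, 1.0))  # []
```

Under these assumptions, worker `b` is a candidate thief with `worker-saturation` set to `inf`. With 1.0 it only counts as idle once nothing is left processing on it, so stealing cannot start while it still has work to overlap with fetching dependencies.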

Reproducer
Add the following test case to `distributed/tests/test_steal.py`:

```python
import pytest

from distributed.utils_test import gen_cluster

# assert_balanced is the existing helper defined in test_steal.py.


@pytest.mark.parametrize("queue", [True, False])
@pytest.mark.parametrize("recompute_saturation", [True, False])
@pytest.mark.parametrize(
    "inp,expected",
    [
        (
            [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0]],
            [[0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0]],
        ),  # balance many tasks
    ],
)
def test_balance_interacts_with_worker_saturation(
    inp, expected, queue, recompute_saturation
):
    async def test_balance_(*args, **kwargs):
        await assert_balanced(inp, expected, recompute_saturation, *args, **kwargs)

    config = {
        # Every task key starts with its duration prefix ("0"), so each takes ~1s
        "distributed.scheduler.default-task-durations": {str(i): 1 for i in range(10)},
        # queue=True enables root-task queuing (finite worker-saturation)
        "distributed.scheduler.worker-saturation": 1.0 if queue else float("inf"),
    }
    gen_cluster(client=True, nthreads=[("", 1)] * len(inp), config=config)(
        test_balance_
    )()
```
Both failing parametrizations have `queue=True` (`worker-saturation` = 1.0), and in both no task is stolen:

```
FAILED distributed/tests/test_steal.py::test_balance_interacts_with_worker_saturation[inp0-expected0-True-True] - Exception: Expected: [[0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0]]; got: [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0]]
FAILED distributed/tests/test_steal.py::test_balance_interacts_with_worker_saturation[inp0-expected0-False-True] - Exception: Expected: [[0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0]]; got: [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0]]
```

cc @gjoseph92
