
Computation deadlocks due to worker rapidly running out of memory instead of spilling #6110

@fjetter

Description

The script below pretty reliably triggers deadlocks. I'm sure it can be reduced further, but I haven't had time to do so yet.

import coiled.v2
from distributed import Client
cluster = coiled.v2.Cluster(
    n_workers=20
)
client = Client(cluster)

from dask.datasets import timeseries
ddf = timeseries(
    "2020",
    "2025",
    partition_freq='2w',
)
ddf2 = timeseries(
    "2020",
    "2023",
    partition_freq='2w',
)
def slowident(df):
    import random
    import time
    time.sleep(random.randint(1, 5))
    return df
while True:
    client.restart()
    demo1 = ddf.map_partitions(slowident)
    (demo1.x + demo1.y).mean().compute()

    demo2 = ddf.merge(ddf2)
    demo2 = demo2.map_partitions(slowident)
    (demo2.x + demo2.y).mean().compute()

We could confirm that version 2022.1.1 is not affected, but it appears that all follow-up versions might be (we haven't tested all of them; 2022.4.0 is definitely affected).
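For context, when and how a worker spills to disk instead of running out of memory is controlled by the `distributed.worker.memory.*` thresholds, expressed as fractions of the worker's memory limit. The sketch below shows how these could be set explicitly via `dask.config`; the specific values are illustrative assumptions, not settings taken from the cluster above.

```python
import dask

# Illustrative spill/pause/terminate thresholds (fractions of the worker
# memory limit). These exact numbers are assumptions for demonstration.
dask.config.set({
    "distributed.worker.memory.target": 0.60,     # start spilling data to disk
    "distributed.worker.memory.spill": 0.70,      # spill based on process memory
    "distributed.worker.memory.pause": 0.80,      # pause execution of new tasks
    "distributed.worker.memory.terminate": 0.95,  # nanny terminates the worker
})
```

If a worker's process memory climbs past these thresholds without spilling or pausing, that points at the memory-management path rather than the scheduler.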

Labels: deadlock (The cluster appears to not make any progress)
