Persist-call ignores assigned worker resources #2716

@Spiegel0

When .persist(resources={...}) is called on a Dask DataFrame, the specified resources are ignored during scheduling. In particular, none of the tasks shown on the corresponding dashboard pages list the resources. In my setup (Python 3.6, distributed 1.28.1), the following example triggers the issue:

import pandas as pd

import dask.dataframe as dd
from dask.distributed import Client


def test_resources():
    client = Client(n_workers=1, resources={"MyRes": 1000})

    test_input_frame = dd.from_pandas(pd.DataFrame({"A": [1, 2], "B": [3, 4]}), npartitions=1)
    test_output_frame = test_input_frame.apply(lambda row: row.sum(), axis=1)

    output = test_output_frame.persist(resources={"MyRes": 1})
    print(output)

    # The task pages of the dashboard do not display the resource dependencies.
    input("Press Enter to exit...")
    client.close()


if __name__ == '__main__':
    test_resources()

Stepping through the Dask distributed code in a debugger, I found that in client.Client._graph_to_futures(...), self._expand_resources(...) returns a dict whose keys are the tuples identifying the tasks. When I manually convert the keys to their string representation, i.e. add:

resources = {tokey(k): v for k, v in resources.items()}

after the call to _expand_resources(...), the task dashboard displays the associated resources. However, I don't know whether converting the keys has any other side effects.
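To illustrate why the key type might matter, here is a minimal, self-contained sketch that does not use distributed at all. It assumes that the scheduler looks tasks up by the string form of their key; the helper to_string_key is a hypothetical stand-in for tokey, and the key ("apply-1234", 0) is made up for the example.

```python
def to_string_key(key):
    """Hypothetical stand-in for tokey: keep str/bytes, stringify anything else."""
    if isinstance(key, (str, bytes)):
        return key
    return str(key)


# Made-up result of _expand_resources(...): the keys are task-key tuples.
expanded = {("apply-1234", 0): {"MyRes": 1}}

# Assumption: the scheduler refers to the same task by its string form.
scheduler_key = to_string_key(("apply-1234", 0))

# A lookup by string key misses the tuple-keyed entry.
found_before = scheduler_key in expanded

# Converting the keys, as in the one-line patch above, makes the lookup succeed.
converted = {to_string_key(k): v for k, v in expanded.items()}
found_after = scheduler_key in converted

print(found_before, found_after)
```

If that assumption holds, the tuple-keyed restrictions would never match any task on the scheduler side, which would be consistent with the resources being silently dropped.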
