Description
When .persist(resources={...}) is called on a Dask DataFrame, the specified resources are ignored for scheduling. In particular, none of the tasks on the corresponding dashboard pages list the resources. In my setup, the following example triggers the issue (using Python 3.6 and distributed 1.28.1):
import pandas as pd
import dask.dataframe as dd
from dask.distributed import Client


def test_resources():
    client = Client(n_workers=1, resources={"MyRes": 1000})
    test_input_frame = dd.from_pandas(pd.DataFrame({"A": [1, 2], "B": [3, 4]}), npartitions=1)
    test_output_frame = test_input_frame.apply(lambda row: row.sum(), axis=1)
    output = test_output_frame.persist(resources={"MyRes": 1})
    print(output)
    # The task pages of the dashboard do not display the resource dependencies.
    input("Press any key to exit...")
    client.close()


if __name__ == '__main__':
    test_resources()

I quickly ran the debugger on the Dask distributed code and discovered that in client.Client._graph_to_futures(...), self._expand_resources(...) returns a dict whose keys are tuples identifying the tasks. When I manually convert the keys to their string representation, i.e. call:

resources = {tokey(k): v for k, v in resources.items()}

after _expand_resources(...), the task dashboard will display the associated resources. (But I don't know whether converting the keys will have any other side effects.)
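
To illustrate why the key type could matter, here is a minimal pure-Python sketch that does not import distributed. It assumes that the scheduler side looks tasks up by their string name, which is my reading of the debugger session rather than something I have confirmed. The `to_key` helper is a hypothetical stand-in for `tokey`, and the task name `apply-abc123` is made up.

```python
def to_key(key):
    """Hypothetical stand-in for tokey: keep str/bytes, stringify anything else."""
    if isinstance(key, (str, bytes)):
        return key
    return str(key)


# Resource restrictions keyed by task tuples, as returned in the report above.
expanded = {("apply-abc123", 0): {"MyRes": 1}}

# Assumption: the scheduler identifies the same task by its string form.
task_name = to_key(("apply-abc123", 0))

# Looking up the string name in the tuple-keyed dict finds nothing.
missed = expanded.get(task_name)

# After converting the keys, the same lookup succeeds.
converted = {to_key(k): v for k, v in expanded.items()}
found = converted.get(task_name)

print(missed, found)
```

If that assumption holds, a tuple-keyed dict would silently fail to match any task, which would be consistent with the resources being dropped rather than raising an error.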