[BUG] Failing to find dependencies in tuple of futures #4177

@rjzamora

What happened: After #4139, a number of tests in the custreamz CI are failing. The failures appear to be related to the way streamz/custreamz uses futures. More specifically, the task graph contains tuples of Future objects, and the new get_all_dependencies method fails to find the dependencies inside them.

What you expected to happen:
I would expect distributed to handle a tuple of futures as an argument.

Minimal Complete Verifiable Example:

For example, the following seems to work with 3e5b506, but fails on master:

import pandas as pd
from distributed import LocalCluster, Client

def _create(size):
    return pd.DataFrame({"a": range(size)})

client = Client(LocalCluster(n_workers=1))

x = client.submit(_create, 5)
df = client.submit(pd.concat, (x,))  # Note the tuple argument
df.result()

Traceback:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-1-0bc1dd2ca416> in <module>
      9 x = client.submit(_create, 5)
     10 df = client.submit(pd.concat, (x,))
---> 11 df.result()

~/workspace/cudf-0.16/distributed/distributed/client.py in result(self, timeout)
    224         if self.status == "error":
    225             typ, exc, tb = result
--> 226             raise exc.with_traceback(tb)
    227         elif self.status == "cancelled":
    228             raise result

/datasets/rzamora/miniconda3/envs/cudf_16/lib/python3.7/site-packages/pandas/core/reshape/concat.py in concat()
    282         verify_integrity=verify_integrity,
    283         copy=copy,
--> 284         sort=sort,
    285     )
    286 

/datasets/rzamora/miniconda3/envs/cudf_16/lib/python3.7/site-packages/pandas/core/reshape/concat.py in __init__()
    357                     "only Series and DataFrame objs are valid"
    358                 )
--> 359                 raise TypeError(msg)
    360 
    361             # consolidate

TypeError: cannot concatenate object of type '<class 'str'>'; only Series and DataFrame objs are valid

Note that the code works fine when a list of futures is used in place of a tuple: df = client.submit(pd.concat, [x])
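
To illustrate the expected behavior, here is a minimal sketch of a dependency search that descends into tuples the same way it descends into lists. It is not distributed's implementation: FakeFuture and find_futures are hypothetical names used only for this example, and the key string is made up. The 'str' in the error above suggests that pd.concat received the future's key instead of its result, which is what I would expect if the tuple is never searched.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FakeFuture:
    """Hypothetical stand-in for a distributed Future; it only carries a key."""
    key: str


def find_futures(obj):
    """Return the keys of every FakeFuture nested anywhere inside obj."""
    if isinstance(obj, FakeFuture):
        return {obj.key}
    if isinstance(obj, dict):
        # Only the values of a dict are searched in this sketch.
        obj = obj.values()
    elif not isinstance(obj, (list, tuple, set, frozenset)):
        # Scalars, strings, and other opaque objects hold no futures.
        return set()
    keys = set()
    for item in obj:
        keys |= find_futures(item)
    return keys


x = FakeFuture("_create-123")  # made-up key for illustration

# A tuple and a list of futures should yield the same dependencies.
assert find_futures((x,)) == find_futures([x]) == {"_create-123"}
```

With a search like this, the tuple and list forms of the call would be treated identically.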

cc @madsbk
