-
-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Closed
Description
Run on dask+distributed main
# Correct behavior w/ local scheduler
ddf = dd.from_pandas(pd.DataFrame({"a": range(12)}), npartitions=2)
print(ddf.map_partitions(lambda x, partition_info=None: partition_info).compute(scheduler="threads"))
0 {'number': 0, 'division': 0}
1 {'number': 1, 'division': 6}
dtype: object
# Incorrect behavior w/ distributed
import distributed; client = distributed.Client()
ddf = dd.from_pandas(pd.DataFrame({"a": range(12)}), npartitions=2)
print(ddf.map_partitions(lambda x, partition_info=None: partition_info).compute())
0 {'number': 0, 'division': 0}
1 {'number': 0, 'division': 0}
dtype: object
I also checked and test_map_partitions_partition_info from #6776 fails when using the distributed scheduler.
Dug around a little bit but nothing obviously wrong jumped out at me. cc @jsignell @jrbourbeau @kumarprabhu1988 from that PR.
Reactions are currently unavailable
Metadata
Metadata
Assignees
Labels
No labels