Hi,
Following why align_partitions() use force=True? on discourse:
When passing severals dask dataframes to map_partitions(), it looks like the underling align_partitions() force repartitioning every dd even if only a single one needs it.
This generates non-necessary graph nodes that slow down packing/sending/unpacking/schedule stages.
dds = [dask.datasets.timeseries() for i in range(5)]
# Only the latest dd partitionning will differ
dds[-1] = dds[-1].repartition(npartitions=1)
def func(*args):
return args[0]
dd = dask.dataframe.map_partitions(func, *dds)
dd.dask
