Closed
Description
What happened: Trying to set divisions on a dask dataframe created from delayed objects results in rows being erased from the dataframe.
What you expected to happen: Setting the divisions should keep every row in the dataframe.
Minimal Complete Verifiable Example:
import dask
import pandas as pd
import dask.dataframe as dd

def get_df(bid):
    # Three rows that all share the same index value `bid`.
    df = pd.DataFrame({"id": [bid, bid, bid], "value": [bid + 0.1, bid + 0.2, bid + 0.3]})
    df = df.set_index('id')
    return df

# One partition per delayed call, five in total.
ddf = dd.from_delayed([dask.delayed(get_df)(i) for i in range(5)])
ddf.divisions = [0, 9, 15]
ddf.compute()
The code outputs this:
Somehow, the rest of the rows are lost. The output should have been like this:

Anything else we need to know?:
The two partitions of ddf contain the following:


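A possible explanation, offered as a guess rather than a confirmed diagnosis: a Dask DataFrame with n partitions carries n + 1 division values, so the three values in [0, 9, 15] describe two partitions, while the five delayed calls produce five. The sketch below is plain Python and does not call dask. `divisions_from_bounds` is a hypothetical helper, not a dask function. It shows what a division tuple consistent with the example's five partitions would look like, assuming the partitions are sorted by index and do not overlap.

```python
def divisions_from_bounds(bounds):
    """Build a divisions tuple from per-partition (min, max) index bounds.

    Hypothetical helper, not part of dask. Assumes the partitions are
    already sorted by index and do not overlap.
    """
    if not bounds:
        raise ValueError("need at least one partition")
    for (_, prev_max), (next_min, _) in zip(bounds, bounds[1:]):
        if next_min < prev_max:
            raise ValueError("partitions overlap or are unsorted")
    # One lower bound per partition, plus the upper bound of the last one.
    return tuple(lo for lo, _ in bounds) + (bounds[-1][1],)


# In the example above, get_df(i) returns a frame whose index is all i,
# so partition i has bounds (i, i) for i in range(5).
bounds = [(i, i) for i in range(5)]
divisions = divisions_from_bounds(bounds)

# Five partitions need six division values; [0, 9, 15] has only three.
assert len(divisions) == len(bounds) + 1
print(divisions)  # (0, 1, 2, 3, 4, 4)
```

If this reading is right, passing a tuple like this through the `divisions` argument of `dd.from_delayed` would be one way to declare the divisions without reassigning `ddf.divisions` afterwards.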
Environment: MAC
- Dask version: 2022.2.1
- Python version: 3.9.7
- Operating System: macOS Catalina
- Install method (conda, pip, source): conda
Cluster Dump State:
