Setting divisions on dask dataframe collapses rows.  #8802

@rajeee

What happened: Trying to set divisions on a dask dataframe created from delayed objects results in rows being erased from the dataframe.

What you expected to happen: Setting the divisions should work without dropping any rows.

Minimal Complete Verifiable Example:

import pandas as pd
import dask  # needed for dask.delayed below
import dask.dataframe as dd

def get_df(bid):
    df = pd.DataFrame({"id": [bid, bid, bid], "value": [bid+0.1, bid+0.2, bid+0.3]})
    df = df.set_index('id')
    return df

ddf = dd.from_delayed([dask.delayed(get_df)(i) for i in range(5)])
ddf.divisions=[0, 9, 15]
ddf.compute()

The code outputs this:

image

Somehow, the rest of the rows are lost. The output should have looked like this:
image

Anything else we need to know?:
The two partitions of ddf contain the following:
image
image
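
One thing worth checking here, offered as a sketch rather than a confirmed diagnosis: Dask describes divisions as a sequence of npartitions + 1 index boundaries, where partition i holds index values in [divisions[i], divisions[i+1]) and the last partition also includes its upper bound. The example above builds 5 partitions (one per delayed call) but assigns only 3 division values, which describes 2 partitions. The helper below is hypothetical (check_divisions is not a Dask function) and uses plain Python lists in place of real partitions, purely to illustrate that rule.

```python
def check_divisions(partition_indexes, divisions):
    """Return a list of problems; an empty list means the divisions are consistent.

    partition_indexes: one list of index values per partition (stand-in for real partitions).
    divisions: proposed boundaries, expected to have len(partition_indexes) + 1 entries.
    """
    problems = []
    npartitions = len(partition_indexes)

    # Rule 1: there must be exactly one more boundary than there are partitions.
    if len(divisions) != npartitions + 1:
        problems.append(
            f"expected {npartitions + 1} division values for "
            f"{npartitions} partitions, got {len(divisions)}"
        )
        return problems

    # Rule 2: each partition's index values must fall inside its own interval.
    for i, index in enumerate(partition_indexes):
        lo, hi = divisions[i], divisions[i + 1]
        is_last = i == npartitions - 1
        for value in index:
            inside = lo <= value <= hi if is_last else lo <= value < hi
            if not inside:
                problems.append(f"partition {i}: index value {value} outside [{lo}, {hi}]")
                break
    return problems


# Index values produced by get_df(i) for i in range(5): three rows per partition.
indexes = [[i, i, i] for i in range(5)]

mismatched = check_divisions(indexes, [0, 9, 15])          # the divisions from the example
consistent = check_divisions(indexes, [0, 1, 2, 3, 4, 4])  # one boundary per partition, plus the end
print(mismatched)
print(consistent)
```

If this is what is happening, the two partitions shown above would be the only ones the 3-value divisions can describe, and the remaining partitions would be skipped rather than merged.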

Environment: MAC

  • Dask version: 2022.2.1
  • Python version: 3.9.7
  • Operating System: macOS Catalina
  • Install method (conda, pip, source): conda
Cluster Dump State:
