Dask Dataframe groupby-fillna for other value types #8922

@pavithraes

#8869 implements groupby-fillna for Dask DataFrame for value=scalar.

We can add functionality for value = dict, pandas Series, and pandas DataFrame similar to pandas.
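For the value=dict case with scalar entries, the fill values do not depend on the group, so the expected result can be sketched without a groupby at all. The snippet below is only an illustration in plain pandas, not the proposed Dask implementation; `fillna_by_column` is a hypothetical helper name, and dropping the grouping column is an assumption made to mirror the output shape of groupby-fillna.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": [1, 1, 2, 2],
    "B": [3, 4, 3, 4],
    "C": [np.nan, 3, np.nan, np.nan],
    "D": [4, np.nan, 5, np.nan],
    "E": [6, np.nan, 7, np.nan],
})

d = {"C": 1, "D": 2, "E": 3}


def fillna_by_column(frame, by, value):
    """Hypothetical helper: per-column constant fill, grouping column dropped.

    Assumes `value` maps column names to scalars, so the fill is the same
    in every group and a plain DataFrame.fillna is sufficient.
    """
    filled = frame.fillna(value=value)
    # Drop the grouping column to mirror the columns groupby-fillna returns.
    return filled.drop(columns=by)


result = fillna_by_column(df, "A", d)
print(result)
```

This keeps the original flat index, which matches the value=scalar behavior shown in the reproducer. Series and DataFrame fill values would need index alignment and are not covered by this sketch.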

This was considered out-of-scope for #8869 because pandas=1.3.5 creates a multi-index for value=dict, which isn't consistent with the behavior for value=scalar.

Reproducer:

import numpy as np
import pandas as pd
import dask.dataframe as dd

df = pd.DataFrame({
    "A": [1, 1, 2, 2],
    "B": [3, 4, 3, 4],
    "C": [np.nan, 3, np.nan, np.nan],
    "D": [4, np.nan, 5, np.nan],
    "E": [6, np.nan, 7, np.nan],
})

d = {"C": 1, "D": 2, "E": 3}

df.groupby("A").fillna(d)
# Output:
# 
#      B    C    D    E
# A                    
# 1 0  3  1.0  4.0  6.0
#   1  4  3.0  2.0  3.0
# 2 2  3  1.0  5.0  7.0
#   3  4  1.0  2.0  3.0

df.groupby("A").fillna(0)
# Output:
#
#    B    C    D    E
# 0  3  0.0  4.0  6.0
# 1  4  3.0  0.0  0.0
# 2  3  0.0  5.0  7.0
# 3  4  0.0  0.0  0.0

But note that for pandas=1.4.2, value=dict and value=scalar produce consistent outputs.
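If tests need to compare results across pandas versions, one option is to normalize the multi-index form back to the flat one before comparing. This is a minimal sketch under that assumption; `flatten_group_index` is a hypothetical helper, and the frame below is built by hand to imitate the multi-index output shown above rather than produced by groupby-fillna.

```python
import pandas as pd


def flatten_group_index(out):
    """Hypothetical helper: drop the outer group level if one is present."""
    if out.index.nlevels > 1:
        # Level 0 holds the group key; the inner level is the original index.
        return out.droplevel(0)
    return out


# Hand-built stand-in for the multi-index result of value=dict on pandas=1.3.5.
multi = pd.DataFrame(
    {"B": [3, 4, 3, 4], "C": [1.0, 3.0, 1.0, 1.0]},
    index=pd.MultiIndex.from_arrays([[1, 1, 2, 2], [0, 1, 2, 3]], names=["A", None]),
)

flat = flatten_group_index(multi)
print(flat)
```

A frame that already has a flat index passes through unchanged, so the same helper can be applied to both outputs.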

Labels: dataframe, feature, needs attention