Open
Labels
dataframe, feature, needs attention
Description
#8869 implements groupby-fillna for Dask DataFrame for value=scalar.
We can add support for value=dict, pandas Series, and pandas DataFrame, matching pandas.
This was considered out of scope for #8869 because pandas 1.3.5 creates a MultiIndex for value=dict, which isn't consistent with its behavior for value=scalar.
Reproducer:
import numpy as np
import pandas as pd
import dask.dataframe as dd
df = pd.DataFrame({
    "A": [1, 1, 2, 2],
    "B": [3, 4, 3, 4],
    "C": [np.nan, 3, np.nan, np.nan],
    "D": [4, np.nan, 5, np.nan],
    "E": [6, np.nan, 7, np.nan],
})
d = {"C": 1, "D": 2, "E": 3}
df.groupby("A").fillna(d)
# Output:
#
#      B    C    D    E
# A
# 1 0  3  1.0  4.0  6.0
#   1  4  3.0  2.0  3.0
# 2 2  3  1.0  5.0  7.0
#   3  4  1.0  2.0  3.0
df.groupby("A").fillna(0)
# Output:
#
#    B    C    D    E
# 0  3  0.0  4.0  6.0
# 1  4  3.0  0.0  0.0
# 2  3  0.0  5.0  7.0
# 3  4  0.0  0.0  0.0
But note that for pandas 1.4.2, value=dict and value=scalar produce consistent outputs.
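One possible starting point for the value=dict case, as a rough sketch rather than a proposed implementation: filling each column with a constant does not depend on group membership, so the result can be computed row-wise after dropping the grouping column. The helper name `groupby_fillna_dict` is hypothetical and is not part of pandas or Dask. The sketch uses plain pandas only and assumes no `method` or `limit` argument is involved.

```python
import numpy as np
import pandas as pd


def groupby_fillna_dict(pdf, by, value):
    """Hypothetical helper: emulate groupby(by).fillna(value) for a dict value.

    Assumption: a constant per-column fill is independent of the groups,
    so dropping the grouping column and filling row-wise gives the same values.
    """
    return pdf.drop(columns=by).fillna(value)


df = pd.DataFrame({
    "A": [1, 1, 2, 2],
    "B": [3, 4, 3, 4],
    "C": [np.nan, 3, np.nan, np.nan],
    "D": [4, np.nan, 5, np.nan],
    "E": [6, np.nan, 7, np.nan],
})
d = {"C": 1, "D": 2, "E": 3}

# Keeps the original flat index, like the value=scalar output above.
result = groupby_fillna_dict(df, "A", d)
print(result)
```

Because the fill is row-wise, the same function could in principle be applied to each partition of a Dask DataFrame independently. Series and DataFrame values would need extra care around index alignment, which this sketch does not address.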