Closed
Labels
dataframe, enhancement (Improve existing functionality or make things work better)
Description
What happened:
Attempted to call a.fillna(b) on a Dask DataFrame a, passing another Dask DataFrame b as the fill value. This raised the error:
ValueError: invalid fill value with a <class 'numpy.ndarray'>
What you expected to happen:
A new Dask DataFrame is returned that keeps the non-NaN values of a and replaces its NaN values with the corresponding values from b.
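For comparison, pandas accepts a DataFrame as the fill value, so the expected result can be sketched with plain pandas frames holding the same data as the example. The names pa, pb and expected are illustrative only and are not part of the report.

```python
import numpy as np
import pandas as pd

# Plain pandas frames with the same data as the Dask example.
pa = pd.DataFrame({"A": [1, 2, np.nan, 4, 5], "B": 1})
pb = pd.DataFrame({"A": [1, 2, 3, 4, 5], "B": 2})

# pandas aligns pb to pa on index and columns and fills only the NaN cells.
expected = pa.fillna(pb)
print(expected)
```

Only the missing entry in column A is taken from pb; column B of pa has no NaN values, so it is left unchanged.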
Minimal Complete Verifiable Example:
import numpy as np
import pandas as pd
import dask.dataframe as dd
a=dd.from_pandas(pd.DataFrame({"A":[1,2,np.nan,4,5],"B":1}),npartitions=2)
b=dd.from_pandas(pd.DataFrame({"A":[1,2,3,4,5],"B":2}),npartitions=2)
a.fillna(b)
Anything else we need to know?:
Environment:
- Dask version: '2022.02.1'
- Python version: 3.8.10
- Operating System: Windows 10
- Install method (conda, pip, source): pip
A workaround is to instead use
a.fillna(b.compute())
but this seems sub-optimal for large datasets.