fillna with dask DataFrame raises ValueError #8823

@tangoed2whiskey

What happened:
Calling a.fillna(b) on a Dask DataFrame a, with another Dask DataFrame b as the fill value, raised:
ValueError: invalid fill value with a <class 'numpy.ndarray'>

What you expected to happen:
A new Dask DataFrame is returned that keeps the non-NaN values of a and replaces each NaN with the corresponding value from b.
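
For comparison, this is roughly what pandas does when fillna is given another DataFrame: values are aligned on index and column labels. A small sketch, where pa, pb and expected are names chosen only for this illustration:

```python
import numpy as np
import pandas as pd

# Plain pandas frames with the same shape of data as in this report.
pa = pd.DataFrame({"A": [1, 2, np.nan, 4, 5], "B": 1})
pb = pd.DataFrame({"A": [1, 2, 3, 4, 5], "B": 2})

# pandas aligns pb to pa on index and columns, then fills only the NaN cells.
expected = pa.fillna(pb)
print(expected)
```

Only the missing cell in column A is taken from pb; column B of pa has no NaN values, so it is left unchanged.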

Minimal Complete Verifiable Example:

import numpy as np
import pandas as pd
import dask.dataframe as dd

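# a has one missing value in column "A"; b has the value that should fill it
a = dd.from_pandas(pd.DataFrame({"A": [1, 2, np.nan, 4, 5], "B": 1}), npartitions=2)
b = dd.from_pandas(pd.DataFrame({"A": [1, 2, 3, 4, 5], "B": 2}), npartitions=2)
a.fillna(b)  # raises ValueError: invalid fill value with a <class 'numpy.ndarray'>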

Anything else we need to know?:

Environment:

  • Dask version: '2022.02.1'
  • Python version: 3.8.10
  • Operating System: Windows 10
  • Install method (conda, pip, source): pip

A workaround is to instead use

a.fillna(b.compute())

but this seems sub-optimal for large datasets, since b.compute() loads all of b into memory as a pandas DataFrame.

Labels: dataframe, enhancement (Improve existing functionality or make things work better)
