Closed
Labels
Python (Affects Python cuDF API), bug (Something isn't working), dask (Dask issue)
Description
When dask_cudf is imported and a user calls apply on a pandas-backed Dask DataFrame with explicit metadata (the meta= keyword), dask-cudf alters the metadata creation step so that the metadata is built with cudf. The result is a dask_cudf collection, so the user is unexpectedly operating on the GPU, which can cause confusing downstream errors. If metadata is not explicitly supplied, Dask continues to use pandas as expected. None of this happens if dask_cudf is not imported.
import pandas as pd
import dask.dataframe as dd
import dask_cudf
df = pd.DataFrame({'a': [3,4], 'b': [1, 2]})
ddf = dd.from_pandas(df, npartitions=1)
emb = ddf['a'].apply(pd.Series, meta={'c0': 'int64', 'c1': 'int64'})
print(type(emb._meta))
print(type(emb))
<class 'cudf.core.dataframe.DataFrame'>
<class 'dask_cudf.core.DataFrame'>

The same call without meta stays on pandas:

import pandas as pd
import dask.dataframe as dd
import dask_cudf
df = pd.DataFrame({'a': [3,4], 'b': [1, 2]})
ddf = dd.from_pandas(df, npartitions=1)
emb = ddf['a'].apply(pd.Series)
print(type(emb._meta))
print(type(emb))
<class 'pandas.core.frame.DataFrame'>
<class 'dask.dataframe.core.DataFrame'>
/raid/nicholasb/miniconda3/envs/rapids-gpubdb-20210413/lib/python3.8/site-packages/dask/dataframe/core.py:3519: UserWarning:
You did not provide metadata, so Dask is running your function on a small dataset to guess output types. It is possible that Dask will guess incorrectly.
To provide an explicit output types or to silence this message, please provide the `meta=` keyword, as described in the map or apply function that you are using.
  Before: .apply(func)
  After:  .apply(func, meta={0: 'int64'})
  warnings.warn(meta_warning(meta))

conda list | grep "rapids\|dask\|pandas\|arrow\|numpy\|scipy"
# packages in environment at /raid/nicholasb/miniconda3/envs/rapids-gpubdb-20210413:
arrow-cpp 1.0.1 py38h9018dff_36_cuda conda-forge
arrow-cpp-proc 3.0.0 cuda conda-forge
cudf 0.20.0a210413 cuda_11.2_py38_gd6479a20d8_137 rapidsai-nightly
cuml 0.20.0a210413 cuda11.2_py38_g5f61a3519_74 rapidsai-nightly
dask 2021.4.0 pyhd8ed1ab_0 conda-forge
dask-core 2021.4.0 pyhd8ed1ab_0 conda-forge
dask-cuda 0.20.0a210413 py38_9 rapidsai-nightly
dask-cudf 0.20.0a210413 py38_gd6479a20d8_137 rapidsai-nightly
libcudf 0.20.0a210413 cuda11.2_gd6479a20d8_137 rapidsai-nightly
libcuml 0.20.0a210413 cuda11.2_g5f61a3519_74 rapidsai-nightly
libcumlprims 0.20.0a210408 cuda11.2_g7f19636_2 rapidsai-nightly
librmm 0.20.0a210413 cuda11.2_g80bfeb2_13 rapidsai-nightly
numpy 1.20.2 py38h9894fe3_0 conda-forge
pandas 1.2.4 py38h1abd341_0 conda-forge
pyarrow 1.0.1 py38hb53058b_36_cuda conda-forge
rmm 0.20.0a210413 cuda_11.2_py38_g80bfeb2_13 rapidsai-nightly
scipy 1.6.2 py38h7b17777_0 conda-forge
ucx 1.9.0+gcd9efd3 cuda11.2_0 rapidsai-nightly
ucx-proc 1.0.0 gpu rapidsai-nightly
ucx-py 0.20.0a210413 py38_gcd9efd3_2 rapidsai-nightly
cc @viclafargue
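
Until this is resolved, a small guard can make the backend switch fail loudly instead of surfacing as a confusing error later. The sketch below is not part of dask or dask_cudf; `backend_of` and `assert_backend` are hypothetical helper names, and the check relies only on the module path of an object's type, which is the same information the `print(type(...))` calls above show.

```python
def backend_of(obj):
    """Return the top-level package that defines obj's type, e.g. 'pandas' or 'cudf'."""
    return type(obj).__module__.split(".", 1)[0]


def assert_backend(obj, expected):
    """Return obj unchanged, or raise TypeError if its type lives outside `expected`."""
    actual = backend_of(obj)
    if actual != expected:
        cls = type(obj)
        raise TypeError(
            f"expected a {expected}-backed object, "
            f"got {cls.__module__}.{cls.__qualname__}"
        )
    return obj


# Hypothetical usage with the reproducer above (emb._meta is a private attribute):
#   assert_backend(emb._meta, "pandas")
# With the behavior reported here, this would raise for the meta= case because
# type(emb._meta) is cudf.core.dataframe.DataFrame.
```

Checking the module path rather than using isinstance keeps the guard free of a cudf import, so it can run in CPU-only environments as well.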