Closed
What happened:
Loading a dataframe seemingly returned a tuple rather than a dask.dataframe, and the following exception was thrown:
AttributeError: 'tuple' object has no attribute 'sample'
What you expected to happen:
I expected the code below to return a pandas.DataFrame with the correlations that I'm looking for!
Minimal Complete Verifiable Example:
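import os
import dask.dataframe as daskdf
from dask.distributed import Client
client = Client(memory_limit='4GB', processes=False)
# input_file_path (defined elsewhere) is the directory holding the input CSV files
raw_df = daskdf.read_csv(os.path.join(input_file_path, '*.csv'))
# Sample 1% of the rows and drop the columns not needed for the correlation
df = raw_df.sample(frac=0.01).drop(['gaugeid', 'time', 'input', 'labels'], axis=1)
correlations = df.corr().compute()
Anything else we need to know?: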
The example runs fine on my local machine (Windows 10, Dask 2021.1.1, Python 3.8.5); it only fails when run in containerised compute provided by Azure.
The full traceback is here:
Traceback (most recent call last):
  File "correlation_analysis.py", line 43, in <module>
    correlations = df.corr().compute()
  File "/azureml-envs/azureml_datastore/lib/python3.8/site-packages/dask/base.py", line 285, in compute
    (result,) = compute(self, traverse=False, **kwargs)
  File "/azureml-envs/azureml_datastore/lib/python3.8/site-packages/dask/base.py", line 567, in compute
    results = schedule(dsk, keys, **kwargs)
  File "/azureml-envs/azureml_datastore/lib/python3.8/site-packages/distributed/client.py", line 2673, in get
    results = self.gather(packed, asynchronous=asynchronous, direct=direct)
  File "/azureml-envs/azureml_datastore/lib/python3.8/site-packages/distributed/client.py", line 1982, in gather
    return self.sync(
  File "/azureml-envs/azureml_datastore/lib/python3.8/site-packages/distributed/client.py", line 853, in sync
    return sync(
  File "/azureml-envs/azureml_datastore/lib/python3.8/site-packages/distributed/utils.py", line 354, in sync
    raise exc.with_traceback(tb)
  File "/azureml-envs/azureml_datastore/lib/python3.8/site-packages/distributed/utils.py", line 337, in f
    result[0] = yield future
  File "/azureml-envs/azureml_datastore/lib/python3.8/site-packages/tornado/gen.py", line 762, in run
    value = future.result()
  File "/azureml-envs/azureml_datastore/lib/python3.8/site-packages/distributed/client.py", line 1847, in _gather
    raise exception.with_traceback(traceback)
  File "/azureml-envs/azureml_datastore/lib/python3.8/site-packages/dask/dataframe/methods.py", line 352, in sample
    return df.sample(random_state=rs, frac=frac, replace=replace) if len(df) > 0 else df
AttributeError: 'tuple' object has no attribute 'sample'
Environment:
- Dask version: 2021.6.0
- Python version: 3.8.1
- Operating System: Linux
- Install method (conda, pip, source): conda
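Since the local machine and the Azure environment report different Dask and Python versions, it may help to print the installed versions in both places and compare them. The snippet below is only a sketch: the helper name `collect_versions` and the package list are illustrative assumptions, not part of the original report, and it relies solely on the standard library's `importlib.metadata`.

```python
import platform
from importlib.metadata import PackageNotFoundError, version


def collect_versions(packages):
    """Return a dict mapping each package name to its installed version.

    Packages that are not installed map to None. The helper name and
    the package list used below are illustrative assumptions.
    """
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    # Run this in both environments and compare the output line by line.
    print("python", platform.python_version())
    for name, ver in collect_versions(["dask", "distributed", "pandas"]).items():
        print(name, ver)
```

Running the script locally and inside the container gives two listings that can be compared directly. When a `Client` is already connected, `client.get_versions(check=True)` from distributed should offer a comparable check across the scheduler, workers and client.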