Closed
Labels
dataframe
enhancement: Improve existing functionality or make things work better
good first issue: Clearly described and easy to accomplish. Good for beginners to the project.
Description
I was trying to debug a DataFrameGroupBy operation and naively tried to .compute() it. This raised an AttributeError (wrapping a KeyError) saying that the column "compute" was not found.
import dask
df = dask.datasets.timeseries()
df.groupby("name").compute()
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
...
-> 2003 raise AttributeError(e) from e
AttributeError: 'Column not found: compute'
In hindsight, this is clearly not the right thing to do. On the other hand, as a naive pandas-familiar user, I was trying to "compute" the pandas GroupBy object so that I could inspect it. I think it would be nice to have a more useful error in this case. However, since dask.dataframe.DataFrameGroupBy re-raises the KeyError as an AttributeError, I'm not sure what the appropriate resolution should be 😄
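One possible direction, sketched on a toy class rather than on dask itself: keep the KeyError-to-AttributeError re-raise, but special-case the name "compute" so the message points the user at aggregating first. Everything here (the ToyGroupBy class, its column list, and the wording of the messages) is hypothetical and only illustrates the pattern; it is not dask's implementation or a proposed patch.

```python
class ToyGroupBy:
    """Hypothetical stand-in for a lazy groupby object (not dask's class)."""

    def __init__(self, columns):
        self._columns = list(columns)

    def __getitem__(self, key):
        # Mimic the pandas behaviour from the traceback: unknown keys raise KeyError.
        if key not in self._columns:
            raise KeyError(f"Column not found: {key}")
        return f"<selection of column {key!r}>"

    def __getattr__(self, key):
        # __getattr__ only runs when normal attribute lookup has already failed.
        if key.startswith("_"):
            # Avoid recursing through self._columns for private names.
            raise AttributeError(key)
        try:
            return self[key]
        except KeyError as e:
            if key == "compute":
                # Assumed wording; the point is to explain what to do instead.
                msg = (
                    "groupby objects cannot be computed directly; "
                    "apply an aggregation first (e.g. .sum() or .mean()) "
                    "and call .compute() on the result"
                )
            else:
                msg = f"{type(self).__name__} has no attribute or column {key!r}"
            # Keep the original KeyError attached as the cause.
            raise AttributeError(msg) from e


gb = ToyGroupBy(["id", "name", "x", "y"])  # toy column names
try:
    gb.compute()
except AttributeError as err:
    message = str(err)
    cause = err.__cause__
```

Raising AttributeError (rather than letting the KeyError escape) keeps hasattr() and getattr() with a default working as usual, while the chained cause still shows where the lookup failed.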
Full traceback 👇
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
File ~/GitHub/dask/dask/dataframe/groupby.py:2001, in DataFrameGroupBy.__getattr__(self, key)
2000 try:
-> 2001 return self[key]
2002 except KeyError as e:
File ~/GitHub/dask/dask/dataframe/groupby.py:1987, in DataFrameGroupBy.__getitem__(self, key)
1986 # error is raised from pandas
-> 1987 g._meta = g._meta[key]
1988 return g
File ~/mambaforge/envs/test-environment/lib/python3.9/site-packages/pandas/core/groupby/generic.py:1538, in DataFrameGroupBy.__getitem__(self, key)
1532 warnings.warn(
1533 "Indexing with multiple keys (implicitly converted to a tuple "
1534 "of keys) will be deprecated, use a list instead.",
1535 FutureWarning,
1536 stacklevel=2,
1537 )
-> 1538 return super().__getitem__(key)
File ~/mambaforge/envs/test-environment/lib/python3.9/site-packages/pandas/core/base.py:232, in SelectionMixin.__getitem__(self, key)
231 if key not in self.obj:
--> 232 raise KeyError(f"Column not found: {key}")
233 subset = self.obj[key]
KeyError: 'Column not found: compute'
The above exception was the direct cause of the following exception:
AttributeError Traceback (most recent call last)
Input In [3], in <module>
----> 1 df.groupby("name").compute()
File ~/GitHub/dask/dask/dataframe/groupby.py:2003, in DataFrameGroupBy.__getattr__(self, key)
2001 return self[key]
2002 except KeyError as e:
-> 2003 raise AttributeError(e) from e
AttributeError: 'Column not found: compute'