
Nicer handling of incorrect usage of DataFrameGroupBy #8695

@bryanwweber


I was trying to debug a DataFrameGroupBy operation and naively tried to call .compute() on it. This raised an AttributeError, chained from a KeyError, reporting that the column compute was not found.

import dask

df = dask.datasets.timeseries()
df.groupby("name").compute()
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
...
-> 2003     raise AttributeError(e) from e

AttributeError: 'Column not found: compute'

In hindsight, this is clearly not the right thing to do. On the other hand, as a naive Pandas-familiar user, I was trying to "compute" the Pandas GroupBy object so that I could inspect it. I think it would be nice to have a more useful error in this case. However, since dask.dataframe.DataFrameGroupBy re-raises the KeyError as an AttributeError, I'm not sure what the appropriate resolution should be 😄
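One possible shape for a friendlier error, sketched with a toy class rather than dask's real code: `ToyGroupBy`, its column list, and the wording of the message are all hypothetical and only mirror the `__getattr__` → `__getitem__` pattern visible in the traceback. The idea is to keep raising AttributeError (so `hasattr` and friends still behave) but special-case the `compute` name with a hint.

```python
class ToyGroupBy:
    """Hypothetical stand-in for a groupby object; NOT dask's actual class."""

    def __init__(self, columns):
        self._columns = list(columns)

    def __getitem__(self, key):
        # Mimic the pandas behaviour from the traceback:
        # unknown column names raise KeyError.
        if key not in self._columns:
            raise KeyError(f"Column not found: {key}")
        return f"<selection of column {key!r}>"

    def __getattr__(self, key):
        # Only reached when normal attribute lookup fails,
        # so real methods and attributes are unaffected.
        try:
            return self[key]
        except KeyError as e:
            if key == "compute":
                # Illustrative wording only; the real message is up for discussion.
                raise AttributeError(
                    "This groupby object has no 'compute' method. "
                    "Apply an aggregation first, e.g. .size() or .mean(), "
                    "and then call .compute() on the result."
                ) from e
            # Every other missing name keeps the existing behaviour.
            raise AttributeError(e) from e


g = ToyGroupBy(["name", "x", "y"])
try:
    g.compute()
except AttributeError as err:
    print(err)
```

Because the exception type stays AttributeError and the original KeyError is still attached as the cause, this approach would not change what callers can catch, only what they read.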

Full traceback 👇
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
File ~/GitHub/dask/dask/dataframe/groupby.py:2001, in DataFrameGroupBy.__getattr__(self, key)
   2000 try:
-> 2001     return self[key]
   2002 except KeyError as e:

File ~/GitHub/dask/dask/dataframe/groupby.py:1987, in DataFrameGroupBy.__getitem__(self, key)
   1986 # error is raised from pandas
-> 1987 g._meta = g._meta[key]
   1988 return g

File ~/mambaforge/envs/test-environment/lib/python3.9/site-packages/pandas/core/groupby/generic.py:1538, in DataFrameGroupBy.__getitem__(self, key)
   1532     warnings.warn(
   1533         "Indexing with multiple keys (implicitly converted to a tuple "
   1534         "of keys) will be deprecated, use a list instead.",
   1535         FutureWarning,
   1536         stacklevel=2,
   1537     )
-> 1538 return super().__getitem__(key)

File ~/mambaforge/envs/test-environment/lib/python3.9/site-packages/pandas/core/base.py:232, in SelectionMixin.__getitem__(self, key)
    231 if key not in self.obj:
--> 232     raise KeyError(f"Column not found: {key}")
    233 subset = self.obj[key]

KeyError: 'Column not found: compute'

The above exception was the direct cause of the following exception:

AttributeError                            Traceback (most recent call last)
Input In [3], in <module>
----> 1 df.groupby("name").compute()

File ~/GitHub/dask/dask/dataframe/groupby.py:2003, in DataFrameGroupBy.__getattr__(self, key)
   2001     return self[key]
   2002 except KeyError as e:
-> 2003     raise AttributeError(e) from e

AttributeError: 'Column not found: compute'
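For readers wondering why the final message carries its own quotes and why both exceptions show up, here is a minimal standalone sketch of the `raise AttributeError(e) from e` pattern in the traceback. The helper names `lookup` and `getattr_like` are made up for illustration and do not exist in dask or pandas.

```python
def lookup(key):
    # Stand-in for the pandas column lookup that fails.
    raise KeyError(f"Column not found: {key}")


def getattr_like(key):
    # Same re-raise pattern as in the traceback above.
    try:
        return lookup(key)
    except KeyError as e:
        raise AttributeError(e) from e


try:
    getattr_like("compute")
except AttributeError as err:
    caught = err

# str() of a KeyError is the repr of its argument, hence the extra quotes.
print(str(caught))                      # 'Column not found: compute'
# "from e" stores the original exception, which produces the
# "direct cause of the following exception" section in the traceback.
print(type(caught.__cause__).__name__)  # KeyError
```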

Labels: dataframe; enhancement (Improve existing functionality or make things work better); good first issue (Clearly described and easy to accomplish. Good for beginners to the project.)
