DataStore: LazyGroupBy is not iterable; gb['col'] iteration drops grouping #581

@wudidapaopao

Problem

DataStore.groupby(...) returns a LazyGroupBy, but two standard pandas iteration patterns do not work on it:

import pandas as pd
from datastore import DataStore

pdf = pd.DataFrame({
    'date': ['2026-05-22', '2026-05-22', '2026-05-23'],
    'code': ['000001', '000002', '000001'],
    'price': [10.0, 20.0, 10.5],
})
ds = DataStore(pdf)

# 1) DataFrameGroupBy iteration
for (date, code), group in ds.groupby(['date', 'code']):
    ...
# TypeError: Expected str or list, got int
#   (misleading - __iter__ is missing, Python falls back to __getitem__(0))

# 2) SeriesGroupBy iteration
for k, s in ds.groupby('date')['price']:
    print(k, s)
# Yields raw scalar values, not (key, sub_series) pairs - silently wrong

The pandas equivalents work and are very common (per-group loops, plotting, custom logic). The error in case 1 is especially misleading: it reports a wrong argument type when the real problem is a missing feature.
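For anyone wondering where the "got int" comes from, here is a minimal sketch of the fallback in plain Python. `FakeLazyGroupBy` and `first_iteration_error` are hypothetical stand-ins written for this illustration, not DataStore code; the only assumption is that the real `__getitem__` rejects non-str/list keys in a similar way.

```python
class FakeLazyGroupBy:
    """Hypothetical stand-in: defines __getitem__ but no __iter__."""

    def __getitem__(self, key):
        # Assumed to resemble the column-selection check in the real class.
        if not isinstance(key, (str, list)):
            raise TypeError(f"Expected str or list, got {type(key).__name__}")
        return key


def first_iteration_error(obj):
    """Return the TypeError message raised when a for-loop iterates obj."""
    try:
        for _ in obj:
            pass
    except TypeError as exc:
        return str(exc)
    return None


# Without __iter__, Python uses the legacy sequence protocol and calls
# __getitem__(0), __getitem__(1), ... until IndexError is raised.
message = first_iteration_error(FakeLazyGroupBy())
print(message)
```

Defining `__iter__` (or raising a clear "not iterable" error from it) would stop the loop from ever reaching `__getitem__` with an integer.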

Expected

LazyGroupBy should mirror pandas DataFrameGroupBy semantics, and the ColumnExpr returned by gb['col'] should mirror SeriesGroupBy: iteration, get_group(), .groups, .indices, len(), and the in operator.
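As a reference point, this is roughly what plain pandas does for most of the items listed above, using the same frame as in the repro. It is a pandas-only sketch (no DataStore involved) and it does not cover the `in` operator; the variable names are just for illustration.

```python
import pandas as pd

pdf = pd.DataFrame({
    'date': ['2026-05-22', '2026-05-22', '2026-05-23'],
    'code': ['000001', '000002', '000001'],
    'price': [10.0, 20.0, 10.5],
})

# DataFrameGroupBy iteration: yields (key_tuple, sub_frame) pairs.
gb = pdf.groupby(['date', 'code'])
frame_keys = [key for key, group in gb]

# SeriesGroupBy iteration: yields (key, sub_series) pairs.
sgb = pdf.groupby('date')['price']
series_items = {key: sub.tolist() for key, sub in sgb}

# len() is the number of groups; get_group() returns one sub-frame.
n_groups = len(gb)
one_group = gb.get_group(('2026-05-22', '000002'))

# .groups maps key -> index labels, .indices maps key -> integer positions.
labels = list(gb.groups[('2026-05-22', '000001')])
positions = list(gb.indices[('2026-05-23', '000001')])

print(frame_keys)
print(series_items)
```

A mirror test for LazyGroupBy could run the same calls against both objects and compare the results.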

Fix

A PR with the implementation and 44 mirror-pattern tests is incoming.
