Priority Level
Low
Task Summary
Two related cleanup gaps in the async scheduler's row-group lifecycle, found during #429 review. Neither is a correctness bug - both are bounded and low-risk - but they're structural patterns that will compound as the scheduler evolves.
Technical Details & Implementation Plan
1. RowGroupBufferManager - add a "discard without write" path
checkpoint_row_group() is the only method that frees _buffers[rg_id], _dropped[rg_id], and _row_group_sizes[rg_id], coupling memory cleanup with parquet I/O. Any failure path that skips the write (e.g. on_before_checkpoint raising) leaks the buffer until _build_async returns.
Fix: add free_row_group(rg_id), which performs only the del/pop cleanup and no parquet I/O. checkpoint_row_group calls it after writing; the dropped branch in _checkpoint_completed_row_groups calls it directly.
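A minimal sketch of the split, not the actual implementation. Only the attribute names _buffers, _dropped, _row_group_sizes and the method names checkpoint_row_group / free_row_group come from this issue. The constructor, write_fn (a stand-in for the parquet write), add_row_group, drop_row, and the container types are assumptions made for illustration.

```python
from typing import Any, Callable


class RowGroupBufferManager:
    """Illustrative stand-in for the real class; see the assumptions noted above."""

    def __init__(self, write_fn: Callable[[int, list[Any]], None]) -> None:
        self._buffers: dict[int, list[Any]] = {}
        self._dropped: dict[int, set[int]] = {}
        self._row_group_sizes: dict[int, int] = {}
        self._write_fn = write_fn  # hypothetical placeholder for the parquet write

    def add_row_group(self, rg_id: int, rows: list[Any]) -> None:
        # Hypothetical helper: register a row group's buffered rows.
        self._buffers[rg_id] = list(rows)
        self._dropped[rg_id] = set()
        self._row_group_sizes[rg_id] = len(rows)

    def drop_row(self, rg_id: int, row_index: int) -> None:
        # Hypothetical helper: mark one row as dropped.
        self._dropped[rg_id].add(row_index)

    def free_row_group(self, rg_id: int) -> None:
        """Release all per-row-group state without doing any I/O."""
        # pop() with a default keeps this safe to call more than once.
        self._buffers.pop(rg_id, None)
        self._dropped.pop(rg_id, None)
        self._row_group_sizes.pop(rg_id, None)

    def checkpoint_row_group(self, rg_id: int) -> None:
        """Write the surviving rows, then free the buffers."""
        dropped = self._dropped[rg_id]
        rows = [r for i, r in enumerate(self._buffers[rg_id]) if i not in dropped]
        self._write_fn(rg_id, rows)
        self.free_row_group(rg_id)
```

With this shape, a caller that skips the write (for example, the dropped branch, or a handler for on_before_checkpoint raising) can call free_row_group(rg_id) directly instead of holding the buffer until _build_async returns. Whether a failed write should also free the buffer is left open here.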
2. Fold _seeds_dispatched_rgs / _pre_batch_done_rgs into _RowGroupState
These per-RG lifecycle markers are standalone sets that require explicit .discard() on completion - the same class of bug that already hit _admitted_rg_ids. Moving them into _RowGroupState as fields means del self._rg_states[rg_id] handles all cleanup automatically.
Dependencies
Follows #429