Generalize post-timeout compaction completion reconciliation across session state #45505

@jackal092927

Description

Summary

OpenClaw now has a narrow fix for one specific state mismatch: compaction can succeed after the runner stops waiting, and sessions.json.compactionCount needs to be reconciled afterward.

That solves a real bug, but it likely points to a broader design gap:

  • compaction completion can become known after run-finalization logic has already moved on
  • multiple user-visible surfaces depend on run-finalization snapshots instead of post-compaction truth
  • timeout handling and state convergence are still only loosely coupled

This follow-up issue is about making that completion/reconciliation model more general.

Motivation

Current state propagation is fragmented:

  • transcript/session history persistence has one source of truth
  • session-store counters and cached token snapshots are updated elsewhere
  • channel/session recovery logic has its own timeout and abort semantics

That means timeout-late-success cases can leave different parts of the system disagreeing about what actually happened.

What This Is Not

This is not a duplicate of narrower or adjacent issues such as:

  • timeout/channel races
  • session freeze / lane stall bugs
  • /compact UX failures
  • subagent-specific stats aggregation bugs

Those are related, but this follow-up is specifically about a more unified model for post-timeout compaction completion semantics and state convergence.

Proposed Direction

Consider introducing an explicit compaction-completion reconciliation path: the single authoritative place where state converges after compaction finishes, even if the enclosing run has already timed out or stopped waiting.

Potential scope:

  • session-store counters (compactionCount)
  • token/context snapshots used by /status and UI surfaces
  • optional internal lifecycle/completion event for downstream consumers
  • best-effort state repair after timeout-late-success outcomes
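To make the scope above more concrete, here is a minimal sketch of what an internal completion event and its best-effort fan-out could look like. Every name here (`CompactionCompletedEvent`, `CompactionReconciler`, `lateAfterTimeout`, and so on) is hypothetical and does not come from the OpenClaw codebase; it only illustrates the idea of one event that downstream consumers (session-store counters, /status snapshots) subscribe to.

```typescript
// Hypothetical shapes for illustration only; not taken from the OpenClaw source.
interface CompactionCompletedEvent {
  sessionId: string;
  compactionCount: number;   // count observed once this compaction finished
  tokensAfter?: number;      // context size after compaction, if known
  completedAtMs: number;
  lateAfterTimeout: boolean; // true when the run had already stopped waiting
}

type CompactionListener = (event: CompactionCompletedEvent) => void;

class CompactionReconciler {
  private listeners: CompactionListener[] = [];

  // Register a consumer; the returned function unsubscribes it.
  subscribe(listener: CompactionListener): () => void {
    this.listeners.push(listener);
    return () => {
      this.listeners = this.listeners.filter((l) => l !== listener);
    };
  }

  // Best-effort delivery: a failing consumer does not block the others.
  // Returns the number of consumers that threw.
  complete(event: CompactionCompletedEvent): number {
    let failures = 0;
    for (const listener of [...this.listeners]) {
      try {
        listener(event);
      } catch {
        failures += 1;
      }
    }
    return failures;
  }
}
```

In this sketch the runner would call `complete(...)` whenever compaction finishes, regardless of whether the enclosing run is still waiting, so late successes reach the same consumers as on-time ones.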

Desirable Properties

Any generalized solution should preserve:

  • current timeout bounds
  • non-blocking conversation flow
  • low resource usage
  • monotonic/idempotent updates

Possible Shape

A possible design direction:

  1. Make compaction success/completion a first-class reconciliation event.
  2. Route all post-compaction state convergence through that event.
  3. Use monotonic updates (max(...) / compare-and-set style semantics) so late or duplicated signals do not corrupt counters.
  4. Keep transcript scans off the hot path except as targeted fallback/recovery.
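Step 3 can be sketched as a pure merge function. This is an illustration under assumptions, not the existing implementation: `SessionEntry`, `CompactionSignal`, `snapshotAtMs`, and `reconcileEntry` are hypothetical names, and the real session-store schema may differ. The counter uses `Math.max` so it never regresses, and the token snapshot is replaced only when the signal is strictly newer than the stored one.

```typescript
// Hypothetical session-store entry; field names are assumptions for illustration.
interface SessionEntry {
  compactionCount: number;
  totalTokens?: number;
  snapshotAtMs?: number; // when the stored token snapshot was taken
}

// Hypothetical completion signal emitted when a compaction finishes.
interface CompactionSignal {
  compactionCount: number;
  totalTokens?: number;
  completedAtMs: number;
}

// Returns a new entry; applying the same signal twice yields the same result,
// and a stale signal can never lower the counter or overwrite a newer snapshot.
function reconcileEntry(entry: SessionEntry, signal: CompactionSignal): SessionEntry {
  const next: SessionEntry = { ...entry };

  // Monotonic counter: take the larger of the stored and reported values.
  next.compactionCount = Math.max(entry.compactionCount, signal.compactionCount);

  // Compare-and-set style snapshot update: only accept strictly newer data.
  const isNewer = signal.completedAtMs > (entry.snapshotAtMs ?? -Infinity);
  if (isNewer && signal.totalTokens !== undefined) {
    next.totalTokens = signal.totalTokens;
    next.snapshotAtMs = signal.completedAtMs;
  }
  return next;
}
```

Because the function is pure and idempotent, it could be called from both the normal completion path and a late post-timeout path without either needing to know whether the other already ran.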

Why A Separate Follow-Up

The current merged fix should stay narrow and low-risk.

A broader design would touch:

  • runner completion semantics
  • session-store persistence strategy
  • /status and related state readers
  • potentially channel/session recovery behavior

That deserves separate discussion and review.

Labels

  • P2: Normal backlog priority with limited blast radius.
  • clawsweeper:needs-maintainer-review: ClawSweeper marked this issue as needing maintainer review before automation.
  • clawsweeper:needs-product-decision: ClawSweeper marked this issue as needing a product or behavior decision.
  • clawsweeper:no-new-fix-pr: ClawSweeper does not recommend queueing a new automated fix PR for this issue.
  • clawsweeper:source-repro: ClawSweeper found a high-confidence source-level issue reproduction.
  • impact:session-state: Session, memory, transcript, context, or agent state can drift or corrupt.
  • issue-rating: 🦞 diamond lobster: Very strong issue quality with high-confidence source-level or clear reproduction.