bug: RewindBoth silently fails after auto-compact — code rolled back but conversation is not #3598

@expfukck

Problem

After modifying many files in a conversation, "Rewind Both" (conversation + code) silently fails: code gets rolled back but conversation does NOT. The user sees a success message "rewound conversation to turn N" but the conversation is unchanged.

Root Cause: 4 compounding bugs

Bug 1 (Primary): Auto-compact invalidates cpBound but doesn't clear it

File: internal/control/controller.go, lines 931–936
File: internal/agent/compact.go, line 73 (maybeCompact)
File: internal/agent/agent.go, line 530 (auto-compact call site)

The cpBound map records len(Session.Messages) at the start of each turn; that value is the truncation index used for conversation rewind. When many files are modified, large tool outputs fill the context window and trigger auto-compact at the 80% threshold. session.Replace() then shrinks the message log (e.g. from 150 messages to 20), but cpBound keeps the stale pre-compact indices.

| Compact path | Clears cpBound? | Location |
| --- | --- | --- |
| summarizeAt() (SummarizeFrom/UpTo) | ✅ Yes | controller.go:1308 |
| Controller.Compact() (manual /compact) | No | controller.go:931–936 |
| Agent.maybeCompact() (auto-compact in loop) | No | agent.go:530 |

Failure sequence:

  1. Turn 5 records cpBound[5] = 150 (150 messages at that point)
  2. Auto-compact fires, session.Replace() shrinks messages to 20
  3. User clicks "Rewind Both" on turn 5
  4. hasBound = true (stale entry exists), boundary = 150
  5. Code rollback succeeds
  6. boundary (150) <= len(s.Messages) (20) is false → truncation silently skipped
  7. Success message "rewound conversation to turn 5" is still emitted (line 1047)
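The failure sequence above can be reproduced in isolation. This is a minimal sketch, not the project's code: truncateAt is a hypothetical helper that mirrors the guarded truncation, and plain slices and maps stand in for Session.Messages and cpBound.

```go
package main

import "fmt"

// truncateAt is a hypothetical helper mirroring the guarded truncation:
// it only truncates when boundary fits inside the message log, and it
// reports whether truncation actually happened.
func truncateAt(messages []string, boundary int) ([]string, bool) {
	if boundary <= len(messages) {
		return messages[:boundary], true
	}
	return messages, false
}

func main() {
	// Stand-ins for Session.Messages and cpBound (assumed shapes).
	messages := make([]string, 150)
	cpBound := map[int]int{5: len(messages)} // turn 5 recorded at 150 messages

	// Auto-compact replaces the log with a much shorter one,
	// but nothing clears cpBound.
	messages = make([]string, 20)

	boundary, hasBound := cpBound[5]
	messages, truncated := truncateAt(messages, boundary)

	// hasBound is true and boundary is 150, yet nothing was truncated.
	fmt.Println(hasBound, boundary, len(messages), truncated)
}
```

Returning the boolean is what lets a caller tell a real rewind apart from a skipped one, which is the information the current success message ignores.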

Bug 2: Non-atomic RewindBoth — code rolled back even when conversation fails

File: internal/control/controller.go, lines 1020–1031

if scope == RewindCode || scope == RewindBoth {
    written, deleted, err := c.cp.RestoreCode(turn)  // ← executes FIRST
    // ...
}
if scope == RewindConversation || scope == RewindBoth {
    if !hasBound {
        return c.rewindFail(...)  // ← TOO LATE: code already rolled back!
    }
}

Code restore runs unconditionally before the conversation boundary check. If conversation rewind fails, the workspace is left inconsistent: files rolled back, conversation unchanged.

Bug 3: RestoreCode is non-atomic — partial rollback on error

File: internal/checkpoint/checkpoint.go, lines 254–285

RestoreCode processes files sequentially and continues on error. If file #50 of 100 fails, files 1–49 are already restored. The caller treats any error as total failure and returns without touching conversation — leaving a partially rolled-back workspace.

Bug 4: Silent no-op when boundary exceeds message count

File: internal/control/controller.go, lines 1033–1048

if boundary <= len(s.Messages) {
    s.Messages = s.Messages[:boundary]
    // ...
}
// ← success message emitted UNCONDITIONALLY:
c.sink.Emit(event.Event{..., Text: fmt.Sprintf("rewound conversation to turn %d", turn)})

When boundary > len(s.Messages) (the exact post-compact scenario), truncation is silently skipped but the success message is emitted. The user is misled into thinking the conversation was rewound.

Reproduction

  1. Start a conversation and modify 10+ files across several turns
  2. Continue the conversation until auto-compact triggers (context window fills up)
  3. Try "Rewind Both" on an early turn
  4. Observe: files are rolled back, conversation is NOT, success message is shown

Suggested Fixes

Bug 1 (quick fix): Clear cpBound after compact, same as summarizeAt does:

// In Controller.Compact():
func (c *Controller) Compact(ctx context.Context, instructions string) error {
    if c.executor == nil {
        return nil
    }
    err := c.executor.CompactNow(ctx, instructions)
    c.mu.Lock()
    c.cpBound = map[int]int{}  // ← add this
    c.mu.Unlock()
    return err
}

For auto-compact, add a post-compact callback from Agent to Controller so the controller knows when auto-compact fires.
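One possible shape for that callback, as a rough sketch: the OnCompact field, invalidateBounds, and boundCount are hypothetical names that do not come from the codebase, and the Agent and Controller types here are stripped-down stand-ins.

```go
package main

import (
	"fmt"
	"sync"
)

// Agent is a stand-in for the agent loop. OnCompact is an assumed hook,
// not an existing field in the project.
type Agent struct {
	OnCompact func()
}

// maybeCompact simulates the auto-compact decision: when compaction
// fires, it notifies the registered listener, if any.
func (a *Agent) maybeCompact(fired bool) {
	if fired && a.OnCompact != nil {
		a.OnCompact()
	}
}

// Controller is a stand-in holding the turn -> boundary map.
type Controller struct {
	mu      sync.Mutex
	cpBound map[int]int
}

// invalidateBounds drops every recorded boundary, because indices into
// the pre-compact message log no longer point at the right messages.
func (c *Controller) invalidateBounds() {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.cpBound = map[int]int{}
}

// boundCount reports how many boundaries are currently recorded.
func (c *Controller) boundCount() int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return len(c.cpBound)
}

func main() {
	c := &Controller{cpBound: map[int]int{5: 150}}
	a := &Agent{OnCompact: c.invalidateBounds}

	a.maybeCompact(true)
	fmt.Println(c.boundCount()) // no boundaries remain after compaction
}
```

With the map cleared, hasBound becomes false for older turns, so a later rewind fails explicitly instead of using an index that no longer applies.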

Bug 2: For RewindBoth, check hasBound && boundary <= len(s.Messages) BEFORE executing code restore.

Bug 3: Make RestoreCode collect all errors and attempt all files, then return a combined error with the list of failures.

Bug 4: Emit a warning instead of success when boundary > len(s.Messages):

if boundary <= len(s.Messages) {
    s.Messages = s.Messages[:boundary]
    // ... success path ...
} else {
    c.sink.Emit(event.Event{Kind: event.Notice, Level: event.LevelWarn,
        Text: fmt.Sprintf("conversation boundary for turn %d is stale (compact may have run); conversation not truncated", turn)})
}

Labels: agent (Core agent loop: internal/agent, internal/control), data-loss (Data loss: sessions, config, history)
