fix(workflows): interactive loop resume uses fresh session on first iteration#1923
Conversation
When resuming a paused interactive loop gate (e.g. archon-piv-loop), iteration 2+ always failed with error_during_execution because needsFreshSession was (loop.fresh_context || i === 1) — but on resume the loop begins at startIteration >= 2, so the condition was never true and the executor blindly passed the stored gate sessionId to the SDK. After hours of human review wait that session is typically expired, which is the failure the issue reporter observed. Changes: - packages/workflows/src/dag-executor.ts: force a fresh session on the first iteration of every interactive loop resume by extending needsFreshSession with (isLoopResume && i === startIteration). Subsequent iterations in the same resume continue to thread currentSessionId from the fresh first iteration, preserving multi-turn continuity. User feedback is already carried via \$LOOP_USER_INPUT so session continuity is not required. - packages/workflows/src/dag-executor.test.ts: update the existing "interactive loop resumes from stored iteration" test to expect an undefined sessionArg (the test was enforcing the buggy behavior). - packages/workflows/src/dag-executor.test.ts: add a regression test that simulates a resumed interactive loop with a stale stored sessionId and asserts the SDK is invoked with undefined and no failure events are emitted. Coordinates with #1291 (fail loudly on SDK isError) and #1294 (orchestrator-side stale-session recovery) without reintroducing any retry loop — we simply avoid passing the stale id. Fixes #1208
|
Caution Review failedThe pull request is closed. ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (3)
📝 WalkthroughWalkthroughFix for interactive loop resume crashes by forcing fresh AI session on resumed iteration instead of reusing stale session stored at gate. Executor logic updated, existing test assertion corrected to expect fresh session, new regression test added, and documentation clarified. ChangesInteractive Loop Resume Session Handling
Estimated code review effort🎯 2 (Simple) | ⏱️ ~12 minutes Possibly related PRs
Poem
✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
🔍 Comprehensive PR ReviewPR: #1923 — fix(workflows): interactive loop resume uses fresh session on first iteration SummaryThis is a surgical, correct fix for issue #1208. The one-line addition to Verdict: ✅
🟢 Low Issues (All Optional)LOW-1:
|
| Title | Priority |
|---|---|
Document interactive loop resume session freshness in loop-nodes.md |
P2 |
| Add multi-iteration resume session-threading test | P3 |
Next Steps
- Optional: Apply LOW-4 comment broadening (trivial one-liner — no block on merge)
- Optional: Apply LOW-5 docs update (one sentence in
loop-nodes.md) - Nothing blocks merge — mark ready for review when comfortable
Reviewed by Archon prp-core:prp-review-agents workflow
- Broaden comment in dag-executor.ts: remove false implication that the stale-session guard is Claude-only; fix applies to any provider with sessionResume support - Update loop-nodes.md: document that the first iteration after resuming from an interactive loop approval gate is also always fresh, so users relying on fresh_context:false across a gate boundary are not surprised
⚡ Self-Fix Report (Aggressive)Status: COMPLETE Fixes Applied (2 total)
View all fixes
Tests Added(none) Skipped (3)All three were explicitly recommended as "leave as-is" or "skip" by the review agents:
Suggested Follow-up Issues(none) Validation✅ Type check | ✅ Lint | ✅ Tests (283 passed) Self-fix by Archon · aggressive mode · fixes pushed to |
Summary
interactive: true) crash witherror_during_executionon every iteration after the first approval gate. Iteration 1 runs fine; resuming at iteration 2+ always fails.archon-piv-loop) from completing after the first human gate — the only workaround wasfresh_context: trueon every node.dag-executor.ts— added(isLoopResume && i === startIteration)to theneedsFreshSessioncondition so the first iteration of every interactive resume starts with a fresh SDK session instead of reusing a potentially stale one. Updated one test expectation and added a regression test.fresh_context: trueloops, non-interactive loops,persist_sessionloops, and PR fix(workflows): fail loudly on SDK isError results in DAG and loop nodes #1291's fail-loud throw are all untouched.UX Journey
Before
After
Architecture Diagram
Before
After
Connection inventory:
dag-executor.ts:executeLoopaiClient.sendQueryresumeSessionIdis nowundefinedon first resumed iterationdag-executor.ts:executeLooploopGateMeta.sessionIdresumeSessionId) on first resumeLabel Snapshot
risk: lowsize: XSworkflowsworkflows:executorChange Metadata
bugworkflowsLinked Issue
isError— what made this crash visible)Validation Evidence (required)
Directly relevant tests:
interactive loop resumes from stored iteration with user input— updated expectation (undefinedinstead of'loop-session-1') ✅interactive loop resume does not crash with error_during_execution on iteration 2— new regression test ✅loop iteration fails loudly when SDK returns error_during_execution— unchanged fail-loud path ✅Security Impact (required)
Compatibility / Migration
Human Verification (required)
resumeSessionId = undefined(fresh session).currentSessionIdfrom the previous iteration, preserving multi-turn continuity within a resume.isErrorpath (PR fix(workflows): fail loudly on SDK isError results in DAG and loop nodes #1291) still triggers correctly for non-stale-session failures.Side Effects / Blast Radius (required)
isLoopResume && i === startIteration).fresh_context: trueloops, non-interactive loops,persist_sessionloops, or any non-loop DAG nodes.$LOOP_USER_INPUTso session continuity is not required for context.Rollback Plan (required)
git revert c06173bderror_during_executionin workflow run events for interactive loop nodes on iteration 2+.Risks and Mitigations
$LOOP_USER_INPUTcarries the user's approval comment into the prompt; the previous iteration's output is available via$LOOP_PREV_OUTPUTif configured. Session continuity for multi-turn refinement is restored from iteration 2 onward (currentSessionId is updated after the fresh first iteration).Summary by CodeRabbit
Bug Fixes
Documentation
Tests