fix: interactive loop resume crashes with error_during_execution (stale session) #1208

@PhucMPham

Description

Bug

Interactive loop nodes (e.g., archon-piv-loop) crash with error_during_execution every time a user approves a paused gate (iteration 2+). Iteration 1 works fine.

Root Cause

In packages/workflows/src/dag-executor.ts around line 1795:

const needsFreshSession = loop.fresh_context || i === 1;
const resumeSessionId = needsFreshSession ? undefined : currentSessionId;

When resuming from a paused interactive loop gate:

  • startIteration = 2 (from loopGateMeta.iteration + 1)
  • i === 2, not 1, so needsFreshSession = false
  • currentSessionId is set from loopGateMeta.sessionId (the session from iteration 1)
  • The Claude SDK tries to resume a session that has been idle for minutes or hours while waiting for the human to respond
  • That session is expired/invalid → error_during_execution
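A minimal, self-contained TypeScript sketch of the path described above. `LoopConfig`, `LoopGateMeta`, `pickResumeSessionId`, and `resumeFromGate` are hypothetical stand-ins, not the real dag-executor.ts internals. Only the condition itself and the `loopGateMeta.iteration` / `loopGateMeta.sessionId` fields are taken from this issue.

```typescript
// Hypothetical shapes: only fresh_context, iteration and sessionId come from
// the issue; the real dag-executor.ts types may look different.
interface LoopConfig {
  fresh_context?: boolean;
}

interface LoopGateMeta {
  iteration: number; // iteration that paused at the gate
  sessionId: string; // Claude session captured during that iteration
}

// Mirrors the current rule: only iteration 1 (or fresh_context) starts a new session.
function pickResumeSessionId(
  loop: LoopConfig,
  i: number,
  currentSessionId: string | undefined,
): string | undefined {
  const needsFreshSession = loop.fresh_context || i === 1;
  return needsFreshSession ? undefined : currentSessionId;
}

// Simulates the first iteration that runs after a user approves a paused gate.
function resumeFromGate(loop: LoopConfig, meta: LoopGateMeta): string | undefined {
  // e.g. 1 + 1 = 2 when resuming after iteration 1, so i === 1 never matches
  const startIteration = meta.iteration + 1;
  const currentSessionId = meta.sessionId; // session left idle during the human wait
  return pickResumeSessionId(loop, startIteration, currentSessionId);
}

const stale = resumeFromGate({}, { iteration: 1, sessionId: "session-from-iteration-1" });
console.log(stale); // logs session-from-iteration-1: the idle session is handed back to the SDK
```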

Reproduction

  1. Start archon-piv-loop workflow via Slack or Web UI
  2. Explore node runs iteration 1 successfully, asks questions, pauses at gate
  3. User approves with /workflow approve <run-id> <feedback>
  4. Iteration 2 starts, tries to resume stale session, crashes in ~5 seconds

Logs

{"level":20,"module":"provider.claude","sessionId":"e9688e1e-...","msg":"resuming_session"}
{"level":50,"module":"provider.claude","sessionId":"5864b49f-...","errorSubtype":"error_during_execution","msg":"claude.result_is_error"}

The workflow log shows iteration 1 takes ~6.7 minutes (normal), but iterations 2 and 3 crash in ~5-7 seconds each.

Proposed Fix

When resuming an interactive loop, start a fresh Claude SDK session for the first resumed iteration, since the previous session may have expired while waiting for the human:

const needsFreshSession = loop.fresh_context || i === 1 || (isLoopResume && i === startIteration);

The user's input is already passed via $LOOP_USER_INPUT in the prompt, so session continuity isn't needed.
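A sketch of the proposed condition in isolation, assuming `isLoopResume` is true only when execution restarts from a stored loop gate and `startIteration` is the first iteration run after that restart. The function name and parameter list are hypothetical; the real executor reads these values from its own state.

```typescript
// Hypothetical shape; only fresh_context comes from the issue.
interface LoopConfig {
  fresh_context?: boolean;
}

// Proposed rule: also start fresh on the first iteration executed after a gate resume.
// isLoopResume and startIteration are passed in as plain parameters here.
function pickResumeSessionId(
  loop: LoopConfig,
  i: number,
  startIteration: number,
  isLoopResume: boolean,
  currentSessionId: string | undefined,
): string | undefined {
  const needsFreshSession =
    loop.fresh_context || i === 1 || (isLoopResume && i === startIteration);
  return needsFreshSession ? undefined : currentSessionId;
}

// Approving iteration 1's gate: startIteration = 2, so iteration 2 starts fresh.
const afterFirstApproval = pickResumeSessionId({}, 2, 2, true, "stale-session-1");
// Approving iteration 2's gate: startIteration = 3, so iteration 3 starts fresh too.
const afterSecondApproval = pickResumeSessionId({}, 3, 3, true, "stale-session-2");
// A run that was never paused keeps resuming its live session as before.
const uninterrupted = pickResumeSessionId({}, 2, 1, false, "live-session");

console.log(afterFirstApproval, afterSecondApproval, uninterrupted);
// logs: undefined undefined live-session
```

Keying the check on `i === startIteration` means only the iteration that directly follows a human wait drops the old session, while the feedback still reaches the model through $LOOP_USER_INPUT.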

A test update is also needed in dag-executor.test.ts: the test "interactive loop resumes from stored iteration with user input" currently expects the session to be resumed; it should instead expect undefined (a fresh session).
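The test change described above could be checked along these lines. This is a framework-agnostic sketch, not the existing test: `RecordingProvider`, `resumeInteractiveLoop`, and the prompt template are hypothetical, and it assumes $LOOP_USER_INPUT is substituted by plain string replacement. Besides the undefined resume id, it also checks that the feedback lands in the prompt, since that is what now carries the context across the human wait.

```typescript
// Hypothetical stand-ins: the real test in dag-executor.test.ts drives the executor
// through its own mocks, which are not reproduced here.
interface ProviderCall {
  prompt: string;
  resumeSessionId: string | undefined;
}

// Fake provider that records the session each call was asked to resume.
class RecordingProvider {
  readonly calls: ProviderCall[] = [];

  run(prompt: string, resumeSessionId?: string): string {
    this.calls.push({ prompt, resumeSessionId });
    return `new-session-${this.calls.length}`; // pretend the SDK opened a new session
  }
}

// Resumes an interactive loop from a stored gate with the proposed condition applied
// (fresh_context omitted for brevity).
function resumeInteractiveLoop(
  provider: RecordingProvider,
  promptTemplate: string,
  gate: { iteration: number; sessionId: string },
  userInput: string,
): void {
  const startIteration = gate.iteration + 1;
  const isLoopResume = true;
  const i = startIteration;
  const needsFreshSession = i === 1 || (isLoopResume && i === startIteration);
  const resumeSessionId = needsFreshSession ? undefined : gate.sessionId;
  const prompt = promptTemplate.replace("$LOOP_USER_INPUT", userInput);
  provider.run(prompt, resumeSessionId);
}

const provider = new RecordingProvider();
resumeInteractiveLoop(
  provider,
  "Continue with: $LOOP_USER_INPUT",
  { iteration: 1, sessionId: "stale-session" },
  "ship it",
);

const firstCall = provider.calls[0];
if (!firstCall) throw new Error("provider was never called");
// Updated expectation: undefined (fresh session) instead of "stale-session".
console.log(firstCall.resumeSessionId); // logs undefined
// The feedback still reaches the model through the prompt.
console.log(firstCall.prompt); // logs Continue with: ship it
```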

Environment

  • Archon running in Docker on VPS
  • Auth via CLAUDE_CODE_OAUTH_TOKEN
  • Slack adapter (batch streaming mode)

Metadata

Assignees

No one assigned

Labels

  • P1 (High priority - Address soon, next in queue)
  • area: workflows (Workflow engine)
  • bug (Something is broken)
  • effort/low (Single file or function, one responsibility, isolated change)

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
