fix(workflows): make resume explicit via prepareResumedRun / hydrateResumableRun (closes #1392)#1646
Conversation
…esumableRun (closes #1392) Previously `executeWorkflow` called `findResumableRun(workflow.name, cwd)` unconditionally and silently auto-resumed the most recent failed run for the same `(workflow_name, cwd)` pair. The `--resume` CLI flag was documented as opt-in but had no causal effect on this path — it only steered worktree selection. Net effect: cached node outputs from a prior failed run (e.g. an `extract-pr-number` returning "1634") bled into the next invocation of the same workflow at the same path. This commit moves the resume decision out of the executor and into the caller. The executor takes a `preCreatedRun` and optional `priorCompletedNodes` via a new options bag; it never queries the store for a prior run. Two helpers in `@archon/workflows/executor` cover the two real lookup shapes: - `hydrateResumableRun(deps, candidate)` — caller already has the run row (CLI `--resume` post worktree-path resolution; orchestrator foreground-resume via `findResumableRunByParentConversation`). - `prepareResumedRun(deps, workflow, cwd)` — canonical `(workflow, cwd)` lookup-and-hydrate. Both throw on DB errors instead of the previous "warn and degrade to fresh" silent fallback, matching the Fail Fast principle. `executeWorkflow`'s trailing positional args (`codebaseId`, `issueContext`, `isolationContext`, `parentConversationId`, `preCreatedRun`, plus the new `priorCompletedNodes`) consolidate into a single `ExecuteWorkflowOptions` bag — fixing the `undefined, undefined, undefined, undefined, true` smell at every caller. 7 required positionals + 1 opts; future additions go to opts. Orphan-cleanup code in the executor deletes outright: with caller-side resume there is no separate `preCreatedRun` to orphan when resume takes over. Supersedes #1396 (which gated the same internal lookup behind a flag without addressing the positional explosion or the conceptual responsibility leak). Closes #1392
📝 WalkthroughWalkthroughThis PR refactors workflow resume from implicit auto-resume to explicit opt-in. ChangesExplicit Workflow Resume Refactoring
Sequence Diagram(s)sequenceDiagram
participant CLI
participant OrchestratorAgent
participant hydrateResumableRun
participant Executor
participant DAG
CLI->>OrchestratorAgent: /workflow run <name> [--resume?]
OrchestratorAgent->>hydrateResumableRun: find & hydrate candidate (when resuming)
hydrateResumableRun-->>OrchestratorAgent: {preCreatedRun, priorCompletedNodes} | null
OrchestratorAgent->>Executor: executeWorkflow(opts{codebaseId, parentConversationId, preCreatedRun?, priorCompletedNodes?})
Executor->>DAG: run dag, skipping priorCompletedNodes if provided
DAG-->>Executor: result
Executor-->>OrchestratorAgent: completion/status
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related issues
Possibly related PRs
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Tip 💬 Introducing Slack Agent: The best way for teams to turn conversations into code.Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.
Built for teams:
One agent for your entire SDLC. Right inside Slack. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
packages/workflows/src/executor.ts (1)
329-401:⚠️ Potential issue | 🟠 Major | 🏗️ Heavy liftBring
preCreatedRununder failure handling before the first async setup.
preCreatedRunis not assigned toworkflowRununtil Line 401, but the function awaits config / env / provider work from Line 337 onward and only enters its catch/finally block at Line 567. Any throw in that window exits without finalizing the caller-created row. That leaves background-dispatch rows stuckpending, and resumed rows are worse becausehydrateResumableRun()has already flipped them torunning. Please either assignworkflowRun = preCreatedRunbefore the first await and expand the top-level failure handler to cover the whole function, or defer theresumeWorkflowRun()state change until after this setup succeeds.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/workflows/src/executor.ts` around lines 329 - 401, Assign the caller-provided preCreatedRun to workflowRun immediately after destructuring opts (i.e., set workflowRun = preCreatedRun before any awaits) so the created DB row is present if setup throws, and ensure any call that flips persisted state for resumed runs (hydrateResumableRun() / resumeWorkflowRun()) is moved to run only after the async setup (config/env/provider/model resolution) completes successfully; alternatively, expand the existing top-level try/catch/finally to start before the first await so failure handling covers the entire function. Reference symbols: preCreatedRun, workflowRun, hydrateResumableRun(), resumeWorkflowRun(), and the config/provider resolution code block.packages/core/src/orchestrator/orchestrator-agent.ts (1)
275-312:⚠️ Potential issue | 🟠 Major | ⚡ Quick winThis shared dispatch path still reintroduces implicit resume on web.
dispatchOrchestratorWorkflow()is used by plain/workflow runand AI/invoke-workflow, so doingfindResumableRunByParentConversation()here means a later run in the same web conversation can still silently reusepreCreatedRun/priorCompletedNodes. That breaks the PR’s fresh-by-default goal, and it also makes approval flows rely on a fuzzy re-lookup instead of the exact paused run the caller already has.Please make resume explicit at this boundary too: pass hydrated resume state into
dispatchOrchestratorWorkflow()only from approval/resume callers, and keep ordinary run paths fresh.Possible shape
async function dispatchOrchestratorWorkflow( platform: IPlatformAdapter, conversationId: string, conversation: Conversation, codebase: Codebase, workflow: WorkflowDefinition, userMessage: string, - isolationHints?: HandleMessageContext['isolationHints'] + isolationHints?: HandleMessageContext['isolationHints'], + resume?: { + cwd: string; + preCreatedRun: WorkflowRun; + priorCompletedNodes: Map<string, string>; + } ): Promise<void> { // ... if (platform.getPlatformType() === 'web') { - const resumableRun = await workflowDb.findResumableRunByParentConversation( - workflow.name, - conversation.id - ); - if (resumableRun?.working_path) { - const deps = createWorkflowDeps(); - const prepared = await hydrateResumableRun(deps, resumableRun); - if (!prepared) { - throw new Error(...); - } + if (resume) { await executeWorkflow( - deps, + createWorkflowDeps(), platform, conversationId, - resumableRun.working_path, + resume.cwd, workflow, userMessage, conversation.id, { codebaseId: codebase.id, parentConversationId: conversation.id, - preCreatedRun: prepared.run, - priorCompletedNodes: prepared.priorCompletedNodes, + preCreatedRun: resume.preCreatedRun, + priorCompletedNodes: resume.priorCompletedNodes, } ); } else if (workflow.interactive) {🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/core/src/orchestrator/orchestrator-agent.ts` around lines 275 - 312, The shared dispatch path currently calls findResumableRunByParentConversation and hydrates a resumable run here, which causes implicit resume for ordinary runs; change this so dispatchOrchestratorWorkflow never performs implicit resume: remove the findResumableRunByParentConversation/hydrateResumableRun logic from this shared path and instead only accept a hydrated resume payload (e.g., a prepared run object containing preCreatedRun and priorCompletedNodes) when callers explicitly intend to resume (approval/resume entrypoints); update callers that perform explicit resume to call hydrateResumableRun and then pass the prepared resume state into dispatchOrchestratorWorkflow, leaving plain /workflow run and /invoke-workflow callers to invoke dispatchOrchestratorWorkflow without any resume state so they always start fresh.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@packages/cli/src/commands/workflow.ts`:
- Around line 452-455: The resumable-run lookup uses the raw CLI token
(workflowName) instead of the resolved canonical name, so change the call to
workflowDb.findResumableRun to pass the resolved name (workflow.name) returned
by resolveWorkflowName(); update the code around where resumable is assigned
(the call to findResumableRun and its surrounding logic that throws when
!resumable) to use workflow.name so stored rows keyed by the canonical workflow
name are matched correctly.
In `@packages/workflows/src/executor.ts`:
- Around line 273-276: The fast-path check only treats metadata.approval.type
=== 'interactive_loop' as resumable (hasInteractiveLoopState), causing valid
approval/gate nodes with other approval types to be rejected when
priorCompletedNodes.size === 0; update the predicate used in the if so it
detects any resumable approval/gate state rather than hard-coding
'interactive_loop' — e.g., replace/extend hasInteractiveLoopState with a check
that candidate.metadata?.approval exists and its state/type indicates a
resumable approval (or use a helper like
isResumableApproval(candidate.metadata.approval)) and then use that new
predicate in the priorCompletedNodes.size === 0 conditional to allow resuming
for all such approval nodes.
---
Outside diff comments:
In `@packages/core/src/orchestrator/orchestrator-agent.ts`:
- Around line 275-312: The shared dispatch path currently calls
findResumableRunByParentConversation and hydrates a resumable run here, which
causes implicit resume for ordinary runs; change this so
dispatchOrchestratorWorkflow never performs implicit resume: remove the
findResumableRunByParentConversation/hydrateResumableRun logic from this shared
path and instead only accept a hydrated resume payload (e.g., a prepared run
object containing preCreatedRun and priorCompletedNodes) when callers explicitly
intend to resume (approval/resume entrypoints); update callers that perform
explicit resume to call hydrateResumableRun and then pass the prepared resume
state into dispatchOrchestratorWorkflow, leaving plain /workflow run and
/invoke-workflow callers to invoke dispatchOrchestratorWorkflow without any
resume state so they always start fresh.
In `@packages/workflows/src/executor.ts`:
- Around line 329-401: Assign the caller-provided preCreatedRun to workflowRun
immediately after destructuring opts (i.e., set workflowRun = preCreatedRun
before any awaits) so the created DB row is present if setup throws, and ensure
any call that flips persisted state for resumed runs (hydrateResumableRun() /
resumeWorkflowRun()) is moved to run only after the async setup
(config/env/provider/model resolution) completes successfully; alternatively,
expand the existing top-level try/catch/finally to start before the first await
so failure handling covers the entire function. Reference symbols:
preCreatedRun, workflowRun, hydrateResumableRun(), resumeWorkflowRun(), and the
config/provider resolution code block.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 2d5beade-938b-4a26-8ab3-6ca1ea869fe9
📒 Files selected for processing (12)
CHANGELOG.mdCLAUDE.mdpackages/cli/src/commands/workflow.tspackages/core/src/orchestrator/orchestrator-agent.test.tspackages/core/src/orchestrator/orchestrator-agent.tspackages/core/src/orchestrator/orchestrator.test.tspackages/core/src/orchestrator/orchestrator.tspackages/docs-web/src/content/docs/guides/authoring-workflows.mdpackages/docs-web/src/content/docs/reference/api.mdpackages/workflows/src/executor-preamble.test.tspackages/workflows/src/executor.test.tspackages/workflows/src/executor.ts
| resumable = await workflowDb.findResumableRun(workflowName, cwd); | ||
|
|
||
| if (!resumable) { | ||
| throw new Error(`No resumable run found for workflow '${workflowName}' at path '${cwd}'.`); |
There was a problem hiding this comment.
Use the resolved workflow name for the resumable-run lookup.
Line 452 queries with the raw CLI token instead of workflow.name. That means fallback matches from resolveWorkflowName() work for a fresh run and then mysteriously fail for --resume because the stored row is keyed by the canonical workflow name.
Suggested fix
- resumable = await workflowDb.findResumableRun(workflowName, cwd);
+ resumable = await workflowDb.findResumableRun(workflow.name, cwd);🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@packages/cli/src/commands/workflow.ts` around lines 452 - 455, The
resumable-run lookup uses the raw CLI token (workflowName) instead of the
resolved canonical name, so change the call to workflowDb.findResumableRun to
pass the resolved name (workflow.name) returned by resolveWorkflowName(); update
the code around where resumable is assigned (the call to findResumableRun and
its surrounding logic that throws when !resumable) to use workflow.name so
stored rows keyed by the canonical workflow name are matched correctly.
PR Review Summary — Multi-AgentSeven specialized agents reviewed this PR: Critical Issues (1)
Important Issues (5)
Suggestions (5)
Strengths
VerdictNEEDS FIXES — one blocker (broken Web UI button) plus YAGNI + invariant gap. Recommended Actions
|
Server resume endpoint: wire POST /api/workflows/runs/:runId/resume to dispatch via the orchestrator (parent web conversation) instead of returning a stale "re-run to auto-resume" message that no longer matches reality. CLI-created runs and non-web parents return a clear 400 pointing at `archon workflow resume <id>`. Executor type design: drop the YAGNI `prepareResumedRun` helper (no production callers), tighten `ExecuteWorkflowOptions` with a discriminated union so `priorCompletedNodes` without `preCreatedRun` is a type error, and rename `hydrateResumableRun`'s return key from `run` to `preCreatedRun` so callers can spread directly. Orchestrator foreground-resume: when hydration returns null (prior run had nothing worth resuming) surface a user-visible notice and fall through to a fresh run on the same worktree, instead of throwing a >100-char error that `classifyAndFormatError` would swallow into the generic `/reset` advice. CLI resume: wrap `hydrateResumableRun` in try/catch with context so DB errors don't surface as raw stack traces. Docs: update three stale locations describing resume as automatic on re-invocation (book/dag-workflows.md, guides/authoring-workflows.md, guides/approval-nodes.md). Tests + comments: rewrite resume endpoint tests to cover the four new branches (404, 400 no-parent, 400 non-web, 200 web dispatch); add orchestrator test for the new hydrate→null fall-through; fix the off-by-one arg-index comment in executor.test.ts; drop issue-number references and position-leaky comments per comment-analyzer feedback.
There was a problem hiding this comment.
♻️ Duplicate comments (1)
packages/workflows/src/executor.ts (1)
272-275:⚠️ Potential issue | 🟠 Major | ⚡ Quick winBroaden resumable approval detection beyond
interactive_loop.At Line 272-Line 275, zero-completed runs resume only when
approval.type === 'interactive_loop'. This can reject valid paused approval-gate resumes with other approval types (e.g., first-node approval), causinghydrateResumableRunto returnnullincorrectly.Suggested fix
- const hasInteractiveLoopState = - candidate.metadata?.approval !== undefined && - (candidate.metadata.approval as Record<string, unknown>).type === 'interactive_loop'; - if (priorCompletedNodes.size === 0 && !hasInteractiveLoopState) { + const approvalMeta = candidate.metadata?.approval as Record<string, unknown> | undefined; + const hasResumableApprovalState = + approvalMeta !== undefined && + typeof approvalMeta === 'object' && + typeof approvalMeta.nodeId === 'string'; + if (priorCompletedNodes.size === 0 && !hasResumableApprovalState) {🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/workflows/src/executor.ts` around lines 272 - 275, The current logic only treats runs as resumable when candidate.metadata?.approval.type === 'interactive_loop', causing hydrateResumableRun to return null for other paused approval types; change the detection to consider any approval metadata instead. Replace hasInteractiveLoopState with a broader boolean (e.g., hasApprovalState) that checks candidate.metadata?.approval !== undefined (or truthy), update the if condition that uses priorCompletedNodes so it uses hasApprovalState, and ensure any downstream assumptions about interactive_loop-specific fields are guarded; run/update tests for hydrateResumableRun and approval-resume scenarios after the change.
🧹 Nitpick comments (1)
packages/docs-web/src/content/docs/guides/approval-nodes.md (1)
237-241: 💤 Low valueConsider abstracting implementation details in Design Notes.
The Design Notes section mentions the specific function
hydrateResumableRun, which couples the documentation to the current implementation. If this function is renamed or refactored, the docs will become stale.Consider rephrasing to describe the mechanism without naming internal functions, e.g., "the orchestrator's explicit resume path" without the parenthetical function name.
That said, since this is a Design Notes section clearly marked for advanced users/maintainers, including implementation details may be intentional and acceptable.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/docs-web/src/content/docs/guides/approval-nodes.md` around lines 237 - 241, The Design Notes currently mention the internal function hydrateResumableRun, which couples docs to implementation; update the paragraph that reads "the orchestrator's explicit resume path (via `hydrateResumableRun`)" to either remove the parenthetical function name or rephrase to a non-implementation-specific description like "the orchestrator's explicit resume path" and optionally note that this refers to the orchestrator's resume mechanism (avoid naming hydrateResumableRun), while keeping the clarification about metadata.approval_response distinguishing resumed vs genuinely-failed runs.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Duplicate comments:
In `@packages/workflows/src/executor.ts`:
- Around line 272-275: The current logic only treats runs as resumable when
candidate.metadata?.approval.type === 'interactive_loop', causing
hydrateResumableRun to return null for other paused approval types; change the
detection to consider any approval metadata instead. Replace
hasInteractiveLoopState with a broader boolean (e.g., hasApprovalState) that
checks candidate.metadata?.approval !== undefined (or truthy), update the if
condition that uses priorCompletedNodes so it uses hasApprovalState, and ensure
any downstream assumptions about interactive_loop-specific fields are guarded;
run/update tests for hydrateResumableRun and approval-resume scenarios after the
change.
---
Nitpick comments:
In `@packages/docs-web/src/content/docs/guides/approval-nodes.md`:
- Around line 237-241: The Design Notes currently mention the internal function
hydrateResumableRun, which couples docs to implementation; update the paragraph
that reads "the orchestrator's explicit resume path (via `hydrateResumableRun`)"
to either remove the parenthetical function name or rephrase to a
non-implementation-specific description like "the orchestrator's explicit resume
path" and optionally note that this refers to the orchestrator's resume
mechanism (avoid naming hydrateResumableRun), while keeping the clarification
about metadata.approval_response distinguishing resumed vs genuinely-failed
runs.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 9b087391-6c8e-4970-a476-08f54b1f7e4f
📒 Files selected for processing (13)
packages/cli/src/commands/workflow.test.tspackages/cli/src/commands/workflow.tspackages/core/src/orchestrator/orchestrator-agent.test.tspackages/core/src/orchestrator/orchestrator-agent.tspackages/core/src/orchestrator/orchestrator.test.tspackages/docs-web/src/content/docs/book/dag-workflows.mdpackages/docs-web/src/content/docs/guides/approval-nodes.mdpackages/docs-web/src/content/docs/guides/authoring-workflows.mdpackages/server/src/routes/api.tspackages/server/src/routes/api.workflow-runs.test.tspackages/workflows/src/executor-preamble.test.tspackages/workflows/src/executor.test.tspackages/workflows/src/executor.ts
✅ Files skipped from review due to trivial changes (1)
- packages/docs-web/src/content/docs/guides/authoring-workflows.md
🚧 Files skipped from review as they are similar to previous changes (4)
- packages/core/src/orchestrator/orchestrator.test.ts
- packages/cli/src/commands/workflow.ts
- packages/core/src/orchestrator/orchestrator-agent.test.ts
- packages/workflows/src/executor-preamble.test.ts
…esumableRun (closes coleam00#1392) (coleam00#1646) * fix(workflows): make resume explicit via prepareResumedRun / hydrateResumableRun (closes coleam00#1392) Previously `executeWorkflow` called `findResumableRun(workflow.name, cwd)` unconditionally and silently auto-resumed the most recent failed run for the same `(workflow_name, cwd)` pair. The `--resume` CLI flag was documented as opt-in but had no causal effect on this path — it only steered worktree selection. Net effect: cached node outputs from a prior failed run (e.g. an `extract-pr-number` returning "1634") bled into the next invocation of the same workflow at the same path. This commit moves the resume decision out of the executor and into the caller. The executor takes a `preCreatedRun` and optional `priorCompletedNodes` via a new options bag; it never queries the store for a prior run. Two helpers in `@archon/workflows/executor` cover the two real lookup shapes: - `hydrateResumableRun(deps, candidate)` — caller already has the run row (CLI `--resume` post worktree-path resolution; orchestrator foreground-resume via `findResumableRunByParentConversation`). - `prepareResumedRun(deps, workflow, cwd)` — canonical `(workflow, cwd)` lookup-and-hydrate. Both throw on DB errors instead of the previous "warn and degrade to fresh" silent fallback, matching the Fail Fast principle. `executeWorkflow`'s trailing positional args (`codebaseId`, `issueContext`, `isolationContext`, `parentConversationId`, `preCreatedRun`, plus the new `priorCompletedNodes`) consolidate into a single `ExecuteWorkflowOptions` bag — fixing the `undefined, undefined, undefined, undefined, true` smell at every caller. 7 required positionals + 1 opts; future additions go to opts. Orphan-cleanup code in the executor deletes outright: with caller-side resume there is no separate `preCreatedRun` to orphan when resume takes over. Supersedes coleam00#1396 (which gated the same internal lookup behind a flag without addressing the positional explosion or the conceptual responsibility leak). Closes coleam00#1392 * fix(workflows): address review feedback on explicit-resume PR Server resume endpoint: wire POST /api/workflows/runs/:runId/resume to dispatch via the orchestrator (parent web conversation) instead of returning a stale "re-run to auto-resume" message that no longer matches reality. CLI-created runs and non-web parents return a clear 400 pointing at `archon workflow resume <id>`. Executor type design: drop the YAGNI `prepareResumedRun` helper (no production callers), tighten `ExecuteWorkflowOptions` with a discriminated union so `priorCompletedNodes` without `preCreatedRun` is a type error, and rename `hydrateResumableRun`'s return key from `run` to `preCreatedRun` so callers can spread directly. Orchestrator foreground-resume: when hydration returns null (prior run had nothing worth resuming) surface a user-visible notice and fall through to a fresh run on the same worktree, instead of throwing a >100-char error that `classifyAndFormatError` would swallow into the generic `/reset` advice. CLI resume: wrap `hydrateResumableRun` in try/catch with context so DB errors don't surface as raw stack traces. Docs: update three stale locations describing resume as automatic on re-invocation (book/dag-workflows.md, guides/authoring-workflows.md, guides/approval-nodes.md). Tests + comments: rewrite resume endpoint tests to cover the four new branches (404, 400 no-parent, 400 non-web, 200 web dispatch); add orchestrator test for the new hydrate→null fall-through; fix the off-by-one arg-index comment in executor.test.ts; drop issue-number references and position-leaky comments per comment-analyzer feedback.
Summary
executeWorkflowsilently auto-resumed the most recent failed run for the same(workflow_name, cwd)pair. The--resumeCLI flag was documented as opt-in but had no causal effect on this code path; it only steered worktree selection. We reproduced the live failure mode in PR Fix Issue #972 | Workflow Composition #1282's review queue —extract-pr-number's cached output "1634" leaked into every subsequent invocation ofarchon workflow run maintainer-review-pr <N>for the samecwd.executeWorkflowand into the caller. Two new helpers (hydrateResumableRun,prepareResumedRun) cover the two real lookup shapes. Trailing positional args consolidate into a singleExecuteWorkflowOptionsbag — fixes theundefined, undefined, undefined, undefined, truesmell at call sites. Orphan-cleanup code deletes outright.hydrateResumableRundirectly with the run row it already found). Web UI resume button.archon workflow resume <id>semantics. DB schema. Platform adapters.UX Journey
Before
After
Architecture Diagram
Before
After
Connection inventory:
executor.tsfindResumableRunexecutor.tsprepareResumedRun/hydrateResumableRunexecutor.tsExecuteWorkflowOptionscli/workflow.tshydrateResumableRunorchestrator.tsexecuteWorkfloworchestrator-agent.tsforeground-resumehydrateResumableRunorchestrator-agent.tsinteractiveexecuteWorkflowexecutor.ts(~120 lines)preCreatedRunto orphan now that resume flows through optsLabel Snapshot
risk: low— closesarchon workflow runauto-resumes failed runs without--resume— docs say opt-in #1392; behaviour now matches documented--resumesemantics; smaller diff than fix(workflows): make auto-resume opt-in via --resume flag #1396.size: Mworkflows,cli,coreworkflows:executor,cli:workflow,core:orchestratorChange Metadata
bugworkflowsLinked Issue
Closes #1392. Supersedes #1396.
Validation Evidence (required)
bun run validate # ✅ exit 0check:bundled,check:bundled-skill,type-check,lint --max-warnings 0,format:check, all per-package test batches — green.prepareResumedRunsuite (nullwhen no resumable, hydrates candidate, propagates DB errors)hydrateResumableRunsuite (hydrate, return null on zero+no-loop, paused-loop hydration, DB error propagation on bothgetCompletedDagNodeOutputsandresumeWorkflowRun)archon workflow runauto-resumes failed runs without--resume— docs say opt-in #1392 cross-invocation leak: asserts the executor does NOT callfindResumableRunon its ownexecutor.test.ts"uses caller-supplied preCreatedRun + priorCompletedNodes" — asserts opts flow into dag-executor at the right slotSecurity Impact (required)
NoNoNoNoCompatibility / Migration
executeWorkflowcallers (API), yes for users of documented CLI/UI surfaces.executeWorkflow. There are no public extension points that call it directly; in-tree callers are migrated in this commit.archon workflow run <name>always starts fresh (matches docs); resume requires--resume,archon workflow resume <id>, the web UI button, or approve/reject in chat (unchanged).NoNoHuman Verification (required)
What was personally validated beyond CI:
archon workflow run maintainer-review-pr <N>for different<N>) — confirms each invocation now creates a fresh run, nodag.node_skipped_prior_successleak.(workflow, cwd)— fresh run created (no implicit resume). This is the bug-fix.--resumewith a resumable prior run — resumes and skips completed nodes.--resumewith no resumable prior run — clear error, no fresh run.hydrateResumableRunsucceeds and resume continues.Side Effects / Blast Radius (required)
packages/workflows/src/executor.ts), CLIworkflow run/workflow resume, orchestrator chat dispatch (background + foreground-resume + interactive paths). No DB schema or adapter changes.--resume, approve/reject, foreground-resume, and concurrent-run-guard are all updated and pass. A new regression test pins down the cross-invocation leak fromarchon workflow runauto-resumes failed runs without--resume— docs say opt-in #1392.Rollback Plan (required)
--resumedoes not resume" (would indicate a positional/opts wiring bug). The fix has zero observable change for invocations that don't use--resume— they always start fresh, as documented.Risks and Mitigations
--resumeflag and/workflow resume <id>command are the explicit paths. CHANGELOG entry calls this out.@archon/workflows/executorfor callers that mock it.hydrateResumableRunexport is part of the module signature, so consumers needing tighter test-mocks now know the surface.Summary by CodeRabbit
Bug Fixes
--resume,archon workflow resume <id>, or Web UI resume. Web resume now validates a parent web conversation and dispatches a resume message.Documentation
Tests