Gate stale workflow resumes by Wirasm · Pull Request #1935 · coleam00/Archon

Wirasm · 2026-06-09T10:49:23Z

Summary

Describe this PR in 2-5 bullets:

Problem: Fresh /workflow run <name> ... commands could silently resume a prior failed run for the same workflow/conversation/codebase and execute the failed run's stale user_message instead of the new request.
Why it matters: This caused silent intent loss in a P1 workflow path, where a user-visible fresh command could dispatch old work with no confirmation and no clean abandon path for failed runs.
What changed: Added explicit dispatch intent for forced fresh runs and explicit resumes, gated failed resume candidates behind a 3-option prompt before hydration, preserved paused approval auto-resume, parsed /workflow run --force anywhere in args, and allowed failed runs to be abandoned to cancelled.
What did not change (scope boundary): The existing findResumableRunByParentConversation lookup, hydrateResumableRun resume model, and WorkflowNotResumableError CAS guards remain the owner of the failed/paused -> running transition.

UX Journey

Before

User                         Archon                             Workflow engine
----                         ------                             ---------------
/workflow run fix A ----->   creates/dispatches workflow ---->  run fails

/workflow run fix B ----->   finds prior failed run
                             hydrates/resumes old row ------->  executes with old user_message A
User sees fresh command      no prompt shown                    stale intent runs

After

User                         Archon                             Workflow engine
----                         ------                             ---------------
/workflow run fix A ----->   creates/dispatches workflow ---->  run fails

/workflow run fix B ----->   finds prior failed run
                             [shows 3-option prompt]
                             returns without hydrate/execute
User chooses:
  /workflow resume <id> -->  [resumeRunId bypass] ----------->  hydrateResumableRun owns CAS resume
  /workflow abandon <id> ->  failed -> cancelled
  /workflow run fix B --force [skip lookup] ----------------->  starts fresh run with B

Architecture Diagram

Before

command-handler
  | CommandResult.workflow { definition, args }
  v
orchestrator-agent dispatchOrchestratorWorkflow
  | findResumableRunByParentConversation(failed|paused)
  | hydrateResumableRun(any candidate)
  v
workflows executor / dag executor
  | uses persisted workflowRun.user_message
  v
workflow db

workflow-operations abandonWorkflow
  | rejects all terminal statuses, including failed
  v
workflow db

After

command-handler [~]
  | CommandResult.workflow { definition, args, force, resumeRunId }
  v
orchestrator-agent dispatchOrchestratorWorkflow [~]
  | [force skips lookup]
  | findResumableRunByParentConversation(failed|paused)
  | [failed + no matching resumeRunId -> 3-option prompt, return]
  | hydrateResumableRun(paused or explicit resume)
  v
workflows executor / dag executor
  | unchanged CAS-owned resume and persisted run hydration
  v
workflow db

workflow-operations abandonWorkflow [~]
  | rejects completed/cancelled only; failed -> cancelled
  v
workflow db

Connection inventory (list every module-to-module edge, mark changes):

From	To	Status	Notes
`packages/core/src/handlers/command-handler.ts`	`packages/core/src/types/index.ts`	modified	Workflow command payload now carries `force` and `resumeRunId` intent.
`packages/core/src/orchestrator/orchestrator-agent.ts`	`packages/core/src/db/workflows.ts`	unchanged	Still uses `findResumableRunByParentConversation`; caller now gates failed rows before hydration.
`packages/core/src/orchestrator/orchestrator-agent.ts`	`packages/workflows/src/executor.ts`	modified	Dispatch only passes `resumeRunId`-authorized or paused candidates into hydration.
`packages/core/src/orchestrator/orchestrator-agent.ts`	platform response sending	modified	Fresh run with prior failed run now sends the verbatim 3-option prompt and returns.
`packages/core/src/operations/workflow-operations.ts`	workflow DB update path	modified	Failed runs can be abandoned/cancelled; completed and cancelled remain rejected.
`packages/core/src/*/.test.ts`	command/orchestrator/operation modules	modified	Added regression coverage for all issue acceptance paths.

Label Snapshot

Risk: risk: medium
Size: size: M
Scope: core|tests
Module: core:orchestrator, core:command-handler, core:workflow-operations

Change Metadata

Change type: bug
Primary scope: core

Linked Issue

Validation Evidence (required)

Commands and result summary:

bun run type-check
bun run lint
bun run format:check
bun run test
bun run validate

Evidence provided (test/log/trace/screenshot): Run artifact validation reports type-check passed, lint passed with 0 warnings, format check passed, bun run test passed with 4834 passed / 0 failed, build passed, and bun run validate passed.
If any command is intentionally skipped, explain why: None skipped. The validation artifact notes root bun test was not used as the authoritative command because this repo documents that root bun test causes Bun mock.module() pollution; bun run test is the supported isolated test entrypoint and passed.

Security Impact (required)

New permissions/capabilities? (Yes/No) No
New external network calls? (Yes/No) No
Secrets/tokens handling changed? (Yes/No) No
File system access scope changed? (Yes/No) No
If any Yes, describe risk and mitigation: No security-sensitive behavior changes were introduced.

Compatibility / Migration

Backward compatible? (Yes/No) Yes
Config/env changes? (Yes/No) No
Database migration needed? (Yes/No) No
If yes, exact upgrade steps: No migration or operator action required.

Human Verification (required)

What was personally validated beyond CI:

Verified scenarios: Fresh /workflow run with a prior failed run prompts and does not call executeWorkflow; --force bypasses the lookup and starts fresh; /workflow resume <id> dispatches with resumeRunId; natural-language approval resumes without prompting; paused auto-resume remains unchanged; failed abandon transitions to cancelled.
Edge cases checked: --force can appear anywhere after the workflow name and is stripped from workflow args; completed and cancelled runs still cannot be abandoned; the failed-run prompt escapes backslash, double quote, and backtick in the suggested command and truncates the prior user message preview.
What was not verified: No manual end-to-end UI screenshot was captured; coverage is through focused command/orchestrator/operation regression tests plus full repository validation.

Side Effects / Blast Radius (required)

Affected subsystems/workflows: Core workflow command handling, orchestrator workflow dispatch, workflow abandon operation, and related tests.
Potential unintended effects: Users with a failed run will now see a prompt instead of implicit failed-run resume when issuing a fresh /workflow run; automation that depended on implicit failed auto-resume should switch to /workflow resume <id>.
Guardrails/monitoring for early detection: Regression tests assert no executeWorkflow call occurs on the prompt path and that explicit resume paths still hydrate through the existing CAS model.

Rollback Plan (required)

Fast rollback command/path: Revert commit cde8b12b96b98b54d3a99de504898a8f801a2e50.
Feature flags or config toggles (if any): None.
Observable failure symptoms: /workflow run prompts unexpectedly for failed candidates, explicit /workflow resume <id> loops back to a prompt, natural-language approval fails to resume, or failed runs cannot be abandoned.

Risks and Mitigations

List real risks in this PR (or write None).

Risk: Explicit resume and paused approval paths could be mistaken for fresh-run intent and prompt-loop.
- Mitigation: resumeRunId bypasses the failed-run prompt, and tests cover /workflow resume <id> plus natural-language approval redispatch.
Risk: The fix could accidentally bypass the existing CAS guard by pre-transitioning runs.
- Mitigation: No pre-transition is added; hydration remains before execution and owns the CAS transition through the current WorkflowNotResumableError path.
Risk: The new prompt could make stale failed runs harder to clear.
- Mitigation: abandonWorkflow now accepts failed runs and transitions them to cancelled so the prompt can be cleared intentionally.

Fresh /workflow run commands could silently resume a prior failed run and execute stale persisted args. Gate failed candidates behind explicit user intent while preserving paused approval resume and CAS-owned hydration. Changes: - Add force and resumeRunId intent to workflow command dispatch - Prompt before hydrating failed resume candidates unless explicitly resumed - Parse /workflow run --force and allow abandoning failed runs - Add regressions for failed prompts, force, explicit resume, NL approval, and abandon Fixes #1549 Co-authored-by: Zolto <zolto@zhome.local>

coderabbitai · 2026-06-09T10:49:30Z

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 34f52c67-b5aa-4251-b939-dd1a30ef374c

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

🔍 Trigger review

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch archon/task-fix-issue-1549

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

Wirasm · 2026-06-09T10:57:39Z

🔍 Comprehensive PR Review

PR: #1935
Reviewed by: 3 available specialized agent artifacts; 2 requested artifacts were missing
Date: 2026-06-09T10:56:15Z

Summary

The PR is well scoped and covers the main stale failed-run gate behavior without pre-transitioning runs to running. The strongest issue is that explicit /workflow resume <id> intent is not preserved all the way through dispatch: the command validates one run, but dispatch can hydrate or prompt from the newest resumable run for the same workflow conversation instead. Two medium issues remain around command suggestion round-tripping and misleading workflow-load feedback during explicit resume.

Verdict: REQUEST_CHANGES

Severity	Count
🔴 CRITICAL	0
🟠 HIGH	1
🟡 MEDIUM	2
🟢 LOW	0

🔴 Critical Issues

None.

🟠 High Issues (Auto-fixing)

Explicit Resume Can Hydrate Or Prompt For The Wrong Run

📍 packages/core/src/orchestrator/orchestrator-agent.ts:546

/workflow resume <id> validates the requested run in the command handler, but dispatch later calls findResumableRunByParentConversation(workflow.name, conversation.id, codebase.id). That helper returns the latest resumable run for the workflow conversation, not necessarily the requested resumeRunId. If multiple failed or paused runs exist, explicit resume can prompt for a newer failed run or hydrate a different paused run than the user selected.

View recommended fix

interface WorkflowDispatchOptions {
  force?: boolean;
  resumeRun?: WorkflowRun;
}

const resumableRun = options?.force
  ? null
  : options?.resumeRun ??
    (await workflowDb.findResumableRunByParentConversation(
      workflow.name,
      conversation.id,
      codebase.id
    ));

if (options?.resumeRun && options.resumeRun.status !== 'paused' && !options.resumeRun.working_path) {
  await platform.sendMessage(
    conversationId,
    `Cannot resume ${options.resumeRun.id}: missing working path.`
  );
  return;
}

Then set resumeRun: run in the /workflow resume result path and use the approved run object for natural-language approval dispatch. Add a regression test where resumeRunId is old-run but the broad lookup returns new-run, asserting the dispatcher does not hydrate or prompt for new-run.

🟡 Medium Issues (Needs Decision)

Suggested Escaped Commands Do Not Round-Trip Through `parseCommand()`

📍 packages/core/src/orchestrator/orchestrator-agent.ts:555

The failed-run prompt escapes backslash, double quote, and backtick as if the generated command will be parsed by a shell. Archon's parseCommand() uses a regex tokenizer and does not interpret backslash escapes inside quoted strings, so copied suggestions containing escaped quotes can be tokenized incorrectly.

Options: Fix now | Create issue | Skip

View details

Option	Approach	Effort	Risk if Skipped
Fix Now	Replace the regex tokenizer with a small quoted-string parser that supports escaped quotes and backslashes, with regression tests for the generated command.	MED	Users can copy a suggested command that silently changes workflow args.
Create Issue	Defer command parser hardening to a separate PR.	LOW	Prompt UX remains unreliable for messages with quotes or backslashes.
Skip	Accept as-is.	NONE	Generated commands remain non-deterministic for common text inputs.

Recommendation: Fix now if the prompt stays copyable. Otherwise, remove arbitrary quoted prompt text from the suggestion and tell users to re-run their original command with --force.

`/workflow resume` Hides Workflow Load Errors Behind "Not Found"

📍 packages/core/src/handlers/command-handler.ts:699

The explicit resume path calls discoverWorkflowsWithConfig() but keeps only result.workflows. Per-file workflow parse/read failures are returned in result.errors, not thrown. If the run's workflow exists but currently fails to parse or read, /workflow resume <id> reports that the workflow was not found instead of surfacing the actionable load error.

Options: Fix now | Create issue | Skip

View details

Option	Approach	Effort	Risk if Skipped
Fix Now	Preserve `result.errors` in the resume path and mirror `/workflow run` matching before returning "not found".	LOW	Users get the wrong recovery path and may not know a workflow file is broken.
Create Issue	Defer the UX consistency fix.	LOW	Resume remains inconsistent with run when workflow loading partially fails.
Skip	Accept misleading fallback.	NONE	Debugging resume failures requires inspecting logs or running other commands.

Recommendation: Fix now. This is a localized change and follows an existing command-handler pattern.

🟢 Low Issues

None reported in available artifacts.

✅ What's Good

Failed-run auto-resume is gated before hydrateResumableRun(), preserving the no-pre-transition resume model.
/workflow run --force is parsed anywhere after the workflow name, stripped from workflow args, and skips resumable lookup.
Failed-run abandon support is narrowly implemented while completed and cancelled runs remain rejected.
Tests cover the main happy and guard paths: failed prompt, force fresh dispatch, natural-language approval, explicit resume happy path, paused auto-resume, and failed-run abandon.
Resume CAS handling remains specific to WorkflowNotResumableError and does not fall through to fresh execution.

📋 Suggested Follow-up Issues

Issue Title	Priority	Related Finding
Make slash command parsing round-trip escaped quoted suggestions	P2	Medium issue #1
Surface workflow discovery load errors during explicit resume	P2	Medium issue #2

Review Coverage Note

The review directory contained these artifacts: code-review-findings.md, error-handling-findings.md, and test-coverage-findings.md. The requested comment-quality-findings.md and docs-impact-findings.md artifacts were not present, so this synthesis is based on the available three reports.

Next Steps

⚡ Auto-fix step should address the HIGH explicit-resume identity issue.
📝 Review the MEDIUM issues above.
🎯 Re-run focused tests for orchestrator dispatch and command handler resume behavior.

Reviewed by Archon comprehensive-pr-review workflow
Artifacts: /Users/rasmus/.archon/workspaces/coleam00/Archon/artifacts/runs/8d4bf8f85cdcac7542fd75849aee3e0f/review/

Fixed: - Preserve explicit workflow resume identity through dispatch - Parse escaped quoted slash-command arguments - Surface workflow load errors during explicit resume Tests added: - Parser regression coverage for escaped quoted suggestions - Resume load-error feedback coverage - Explicit resume identity and missing working path coverage Skipped: - none

Wirasm · 2026-06-09T11:04:30Z

⚡ Self-Fix Report (Aggressive)

Status: COMPLETE
Pushed: ✅ Changes pushed to archon/task-fix-issue-1549
Commit: e097004679c1275853f0d14a3b8e8dff1d9db15d
Philosophy: Fix everything unless clearly a new concern

Fixes Applied (3 total)

Severity	Count
🔴 CRITICAL	0
🟠 HIGH	1
🟡 MEDIUM	2
🟢 LOW	0

View all fixes

✅ Explicit Resume Can Hydrate Or Prompt For The Wrong Run (packages/core/src/orchestrator/orchestrator-agent.ts:546) — carried the validated run through dispatch and hydrate the requested run directly.
✅ Suggested Escaped Commands Do Not Round-Trip Through parseCommand() (packages/core/src/handlers/command-handler.ts:179) — added quoted-string escape parsing and regression tests.
✅ /workflow resume Hides Workflow Load Errors Behind Not Found (packages/core/src/handlers/command-handler.ts:699) — preserved discovery load errors and surfaced targeted YAML/load feedback.

Tests Added

packages/core/src/orchestrator/orchestrator-agent.test.ts: explicit resume identity and missing working path coverage.
packages/core/src/handlers/command-handler.test.ts: escaped quote parsing and resume load-error feedback coverage.

Skipped (0)

(none — all findings addressed)

Suggested Follow-up Issues

(none)

Validation

✅ Type check | ✅ Lint | ✅ Tests (bun run test) | ✅ Format check

Self-fix by Archon · aggressive mode · fixes pushed to archon/task-fix-issue-1549

simplify: reduce complexity in changed files

bb50c6c

uinstinct mentioned this pull request Jun 9, 2026

docs(book): fix nested codeblock fence in first-command step 3 #1936

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Gate stale workflow resumes#1935

Gate stale workflow resumes#1935
Wirasm wants to merge 3 commits into
devfrom
archon/task-fix-issue-1549

Wirasm commented Jun 9, 2026

Uh oh!

coderabbitai Bot commented Jun 9, 2026 •

edited

Loading

Review skipped

Uh oh!

Wirasm commented Jun 9, 2026

Uh oh!

Wirasm commented Jun 9, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

Wirasm commented Jun 9, 2026

Summary

UX Journey

Before

After

Architecture Diagram

Before

After

Label Snapshot

Change Metadata

Linked Issue

Validation Evidence (required)

Security Impact (required)

Compatibility / Migration

Human Verification (required)

Side Effects / Blast Radius (required)

Rollback Plan (required)

Risks and Mitigations

Uh oh!

coderabbitai Bot commented Jun 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review skipped

Uh oh!

Wirasm commented Jun 9, 2026

🔍 Comprehensive PR Review

Summary

🔴 Critical Issues

🟠 High Issues (Auto-fixing)

Explicit Resume Can Hydrate Or Prompt For The Wrong Run

🟡 Medium Issues (Needs Decision)

Suggested Escaped Commands Do Not Round-Trip Through parseCommand()

/workflow resume Hides Workflow Load Errors Behind "Not Found"

🟢 Low Issues

✅ What's Good

📋 Suggested Follow-up Issues

Review Coverage Note

Next Steps

Uh oh!

Wirasm commented Jun 9, 2026

⚡ Self-Fix Report (Aggressive)

Fixes Applied (3 total)

Tests Added

Skipped (0)

Suggested Follow-up Issues

Validation

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

coderabbitai Bot commented Jun 9, 2026 •

edited

Loading

Suggested Escaped Commands Do Not Round-Trip Through `parseCommand()`

`/workflow resume` Hides Workflow Load Errors Behind "Not Found"