
archon workflow run auto-resumes failed runs without --resume — docs say opt-in #1392

@Wirasm


Problem

The WorkflowRunOptions doc-comment (`packages/cli/src/commands/workflow.ts`) says:

--resume: reuse worktree from last failed run.

…implying that --resume is opt-in. The CLI usage in references/cli-commands.md and the README also presents resuming as a separate, explicit action (a subcommand or a flag).

The observed behavior contradicts this. Running archon workflow run <name> "<message>" without --resume silently auto-resumes the latest failed run of the same workflow and replays its cached node outputs.

Repro

```shell
# Run 1: produces a failure
archon workflow run my-workflow "..."   # fails at node-3

# Run 2: no --resume flag
archon workflow run my-workflow "..."   # observe:
#   workflow.dag_resuming
#   workflowRunId is the SAME as run 1
#   priorCompletedCount: 2  (parse-args, node-2)
#   node-1 Skipped (prior_success)
#   node-2 Skipped (prior_success)
#   node-3 re-runs
```

Concretely, the run ID stays the same across archon workflow run invocations for as long as the prior run has not completed (for example, because it failed). Both invocations observably operate on the same DB row. The second invocation is not a fresh run.
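The repro can be made concrete with a hypothetical TypeScript model of the run-selection rule as it appears from the outside. `RunRecord`, `selectRunObserved`, and `nodesToExecute` are illustrative names and not Archon's actual types or functions. The matching criteria are inferred from the behavior described in this issue: same workflow name, same codebase, and a status other than `completed`.

```typescript
// Hypothetical model of the OBSERVED behavior; not Archon's actual code.
type RunStatus = "failed" | "completed";

interface RunRecord {
  id: string;
  workflowName: string;
  codebase: string;
  status: RunStatus;
  completedNodes: string[]; // nodes recorded as prior_success
}

// Observed rule (inferred): reuse the most recent non-completed run for the
// same workflow and codebase, even though --resume was not passed.
function selectRunObserved(
  runs: RunRecord[], // assumed ordered oldest to newest
  workflowName: string,
  codebase: string,
): RunRecord | undefined {
  return [...runs].reverse().find(
    (run) =>
      run.workflowName === workflowName &&
      run.codebase === codebase &&
      run.status !== "completed",
  );
}

// Nodes already recorded as successful are skipped, so edits to them never execute.
function nodesToExecute(allNodes: string[], reused: RunRecord | undefined): string[] {
  const skip = new Set(reused?.completedNodes ?? []);
  return allNodes.filter((node) => !skip.has(node));
}

// Repro state: run 1 failed after node-1 and node-2 succeeded.
const history: RunRecord[] = [
  { id: "run-1", workflowName: "my-workflow", codebase: "repo", status: "failed", completedNodes: ["node-1", "node-2"] },
];
const reused = selectRunObserved(history, "my-workflow", "repo");
const scheduled = nodesToExecute(["node-1", "node-2", "node-3"], reused);
```

With the repro's state as input, the model returns run 1 for the second invocation and schedules only node-3. This is why an edit to an upstream node such as parse-args never takes effect.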

Why this is confusing

For agents (and humans) authoring workflows:

  • They run, hit a bug, fix the bug, re-run. They expect a clean slate. They get a resume.
  • They expect their parse-args change to take effect. It doesn't (cached prior_success).
  • They expect to see the dynamic-args path execute. They see the cached output from the prior args.
  • They cd into a different directory or checkout state and re-run. They get the prior run's worktree path, not their current one.
  • Combined with issue E (no way to invalidate cached node output), debugging cycles compound: a fix doesn't surface because of cached upstream state, and there's no obvious flag to disable resume.

The auto-resume is a great feature. The problem is that it isn't labelled, isn't documented as the default, and there's no --no-resume flag to opt out.

What the docs say

packages/cli/src/commands/workflow.ts:

```ts
 * Default: creates worktree with auto-generated branch name (isolation by default).
 * --branch: explicit branch name for the worktree.
 * --no-worktree: opt out of isolation, run in live checkout.
 * --resume: reuse worktree from last failed run.        ← reads as opt-in
 * --from: override base branch (start-point for worktree).
```

references/cli-commands.md (Archon skill):

```shell
# Resume a failed workflow (re-runs, skipping completed nodes)
archon workflow resume <run-id>
```

…suggesting that resuming is an explicit action, either through a resume subcommand or a --resume flag.

Proposed direction

Pick one of:

  1. Behavior matches docs: archon workflow run defaults to a fresh run. Auto-resume happens only when --resume is passed (or when the user runs archon workflow resume <id>).
  2. Docs match behavior: document that archon workflow run auto-resumes any non-completed prior run for this workflow on this codebase, and add a --no-resume (or --fresh) flag to opt out.

Either is fine; the current state — silent auto-resume contradicting the docs — is the worst combination. (1) is probably the safer default since fresh-by-default matches CLI conventions almost everywhere; (2) requires less code change.
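The two directions differ mainly in how the run mode is resolved from the flags. The following is a minimal TypeScript sketch built on a hypothetical `RunFlags` shape. `noResume` stands for the --no-resume (or --fresh) flag proposed in option (2), which does not exist today. `hasNonCompletedPriorRun` stands for whatever lookup currently triggers the auto-resume. Neither the function names nor their signatures are Archon's.

```typescript
// Hypothetical sketch of the two proposed directions; not Archon's actual code.
type Mode = "fresh" | "resume";

interface RunFlags {
  resume?: boolean;   // --resume (documented today)
  noResume?: boolean; // --no-resume / --fresh (proposed in option 2, does not exist yet)
}

// Option 1: fresh by default; resume only when explicitly requested.
// Assumption: --resume with nothing to resume falls back to a fresh run.
function decideOption1(flags: RunFlags, hasNonCompletedPriorRun: boolean): Mode {
  return flags.resume && hasNonCompletedPriorRun ? "resume" : "fresh";
}

// Option 2: keep auto-resume as the documented default, but allow opting out.
function decideOption2(flags: RunFlags, hasNonCompletedPriorRun: boolean): Mode {
  if (flags.resume && flags.noResume) {
    // Design choice of this sketch: contradictory flags are rejected outright.
    throw new Error("--resume and --no-resume are mutually exclusive");
  }
  if (flags.noResume) return "fresh";
  return hasNonCompletedPriorRun ? "resume" : "fresh";
}
```

Under option (1), a plain archon workflow run is always fresh. Under option (2), it still resumes, but the default is documented and there is an escape hatch. Rejecting --resume together with --no-resume is a choice made by this sketch, not something the issue specifies.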

Found while

Building a workshop archon-release workflow. Spent ~2 iterations confused about why my workflow-YAML edits weren't taking effect — the answer was "they were, but the cached prior-success nodes were unchanged." See #1389 and the related cache-invalidation issue for adjacent papercuts.

Metadata

Metadata

Assignees

No one assigned

Labels

  • P2 (Medium priority - Backlog, when time permits)
  • area: cli (CLI commands and interface)
  • area: workflows (Workflow engine)
  • bug (Something is broken)
  • effort/medium (Few files, one domain or module, some coordination needed)

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

    Issue actions