
bug(workflows): auto-resume after reject-with-redraft bypasses the approval gate and posts #1429

@Wirasm


Summary

After an approval node is rejected with on_reject.prompt (which redrafts an artifact and re-pauses at the same approval gate per the documented behavior), invoking archon workflow run <name> a second time auto-resumes the paused run and treats the post-redraft pause as approved, executing the downstream nodes without re-asking the user. In a workflow whose post-approval node has real-world side effects (e.g. gh pr comment), this means the side effect fires when the user expected another approval pause.

Reproduction

I hit this today in the maintainer-review-pr workflow on dev:

  1. archon workflow run maintainer-review-pr "1335": the workflow ran through gate (verdict: decline) and paused at the approve-decline approval node.
  2. archon workflow reject db81f7043abb2b1d07eb80f409ba7962 --reason "PR was closed manually before the comment posted. Redraft to acknowledge the closure ...": on_reject.prompt ran, regenerated decline-comment.md, and the workflow re-paused at approve-decline (per docs: "After running, the workflow re-pauses at the same gate").
  3. archon workflow run maintainer-review-pr "1369": the intent was to start a fresh run on a different PR.
  4. Actual: the second invocation auto-resumed run db81f7043abb.... The approve-decline node was logged as dag.node_skipped_prior_success, and post-decline (which executes gh pr comment) ran. The comment was posted to PR #1335 ("feat: add MD Quick View Electron desktop app for viewing .md files") without the user being asked again. The "1369" argument was ignored.
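A toy model makes the suspected failure mode concrete. This is not Archon's code: `DagNode`, `NodeEvent`, `resumeBuggy`, and especially the assumption that the reject/redraft cycle leaves a success marker on the approval node are all hypothetical, chosen only to reproduce what the run record shows (approve-decline skipped as a prior success, post-decline executed).

```typescript
// Toy model of the suspected bug. All types, names, and rules are hypothetical.
type NodeKind = "approval" | "action";
type NodeStatus = "paused" | "rejected" | "completed";

interface DagNode {
  id: string;
  kind: NodeKind;
}

interface NodeEvent {
  nodeId: string;
  status: NodeStatus;
}

// Assumed buggy rule: on resume, skip any node that has *ever* completed,
// even if it was later rejected and re-paused.
function resumeBuggy(nodes: DagNode[], events: NodeEvent[]): string[] {
  const log: string[] = [];
  for (const node of nodes) {
    const everCompleted = events.some(
      (e) => e.nodeId === node.id && e.status === "completed",
    );
    if (everCompleted) {
      log.push(`${node.id}: skipped_prior_success`);
      continue;
    }
    if (node.kind === "approval") {
      log.push(`${node.id}: paused`);
      return log; // stop here and wait for a human decision
    }
    log.push(`${node.id}: executed`); // side effects would happen here
  }
  return log;
}

const dagNodes: DagNode[] = [
  { id: "gate", kind: "action" },
  { id: "approve-decline", kind: "approval" },
  { id: "post-decline", kind: "action" }, // runs `gh pr comment` in the real workflow
];

const runEvents: NodeEvent[] = [
  { nodeId: "gate", status: "completed" },
  { nodeId: "approve-decline", status: "paused" },
  { nodeId: "approve-decline", status: "rejected" },
  // Assumption: the on_reject.prompt redraft is recorded as a node success.
  { nodeId: "approve-decline", status: "completed" },
  { nodeId: "approve-decline", status: "paused" }, // re-paused after redraft
];

// approve-decline is skipped and post-decline executes; nothing pauses.
console.log(resumeBuggy(dagNodes, runEvents));
```

Under this model the latest event for approve-decline is `paused`, yet the "ever completed" check wins, which matches the observed skip.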

Run record:

nodeId: approve-decline ...  msg: dag.node_skipped_prior_success
nodeId: post-decline ... durationMs: 2547 ... msg: dag_node_completed

Expected behavior

One of:

  • (A) The approval node re-pauses on resume. The on_reject redraft re-paused at the gate; that pause should be honored on the next resume too. Approval state should not be cached as 'approved' from the prior cycle.
  • (B) archon workflow run <name> <new-args> should create a new run, not silently auto-resume a paused one with a different argument. At minimum, when the new args differ from the paused run's args, refuse to auto-resume and ask the user explicitly (via a CLI prompt or by requiring --resume <run-id> to opt in).

Either behavior would prevent the foot-gun.
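Option (B) comes down to a small decision made before any node runs. A minimal sketch, assuming a lookup that returns the paused run (if any) for the invoked workflow; `PausedRun`, `decideRun`, and the `--resume <run-id>` flag are hypothetical (the flag is the one proposed in this issue, not an existing Archon option).

```typescript
// Hypothetical sketch of option (B); not an existing Archon API.
interface PausedRun {
  id: string;
  args: string[];
}

type RunDecision =
  | { action: "create" }
  | { action: "resume"; runId: string }
  | { action: "refuse"; reason: string };

function sameArgs(a: string[], b: string[]): boolean {
  return a.length === b.length && a.every((v, i) => v === b[i]);
}

// `paused` is the paused run found for the invoked workflow, if any;
// `resumeRunId` stands in for the proposed --resume <run-id> flag.
function decideRun(
  args: string[],
  paused: PausedRun | undefined,
  resumeRunId?: string,
): RunDecision {
  if (resumeRunId !== undefined) {
    // Explicit opt-in: resume exactly the named run, or fail loudly.
    return paused !== undefined && paused.id === resumeRunId
      ? { action: "resume", runId: paused.id }
      : { action: "refuse", reason: `no paused run with id ${resumeRunId}` };
  }
  if (paused === undefined || !sameArgs(paused.args, args)) {
    // Different args (e.g. another PR number) are a different task:
    // start a fresh run and leave the paused one untouched.
    return { action: "create" };
  }
  // Same args, no flag: keep today's auto-resume behavior.
  return { action: "resume", runId: paused.id };
}

const pausedRun: PausedRun = {
  id: "db81f7043abb2b1d07eb80f409ba7962",
  args: ["1335"],
};

console.log(decideRun(["1369"], pausedRun).action); // create
console.log(decideRun(["1369"], pausedRun, pausedRun.id).action); // resume
```

With the run from the reproduction, invoking with "1369" starts a new run instead of resuming the #1335 run; resuming with mismatched args happens only when the user names the run explicitly.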

Impact

  • Real-world side effects fire without explicit approval. In our case, a gh pr comment posted to a closed third-party PR. In a destructive workflow (auto-merge, delete-branch, etc.) this could be much worse.
  • Surprising for users: the documented contract is "approval pauses the workflow until a human approves or rejects." Letting auto-resume skip the gate because the node has a prior success violates that contract.

Suggested fix paths

  • Cleanest: when an approval node is reached on resume, re-evaluate the approval state from scratch (paused → wait for user) instead of treating any prior 'approved' marker as final. The redraft cycle should reset the approval to 'pending'.
  • Alternative: explicit --resume <run-id> flag required to resume; bare archon workflow run <name> always creates a new run. (More disruptive change.)
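The cleanest path amounts to cycling the approval node back to `pending` on a redraft and having resume look only at the node's current state. A minimal sketch, assuming a per-node state of `pending | approved | rejected`; `ApprovalState`, `transition`, and `onResume` are hypothetical names that do not reflect Archon's actual schema.

```typescript
// Hypothetical sketch of fix path 1; state names are assumptions, not Archon's schema.
type ApprovalState = "pending" | "approved" | "rejected";

type ApprovalEvent =
  | { kind: "approve" }
  | { kind: "reject"; hasRedraftPrompt: boolean };

// Only a pending gate accepts a decision. A reject with on_reject.prompt
// redrafts and re-pauses, so the gate returns to "pending" and records no
// approval or success marker.
function transition(state: ApprovalState, event: ApprovalEvent): ApprovalState {
  if (state !== "pending") return state;
  if (event.kind === "approve") return "approved";
  return event.hasRedraftPrompt ? "pending" : "rejected";
}

// On resume, decide from the node's current state only; earlier cycles are
// never consulted, so a stale marker cannot let a redrafted gate through.
function onResume(state: ApprovalState): "pause" | "continue" | "stop" {
  if (state === "approved") return "continue";
  if (state === "rejected") return "stop";
  return "pause";
}

let gate: ApprovalState = "pending";
gate = transition(gate, { kind: "reject", hasRedraftPrompt: true });
console.log(onResume(gate)); // pause
gate = transition(gate, { kind: "approve" });
console.log(onResume(gate)); // continue
```

Keeping the decision in the current state (rather than scanning history) is what makes the redraft cycle safe to repeat any number of times.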

Related

  • Documented approval semantics: workflow-dag.md, Approval Nodes section.
  • The auto-resume feature itself is documented as "resume from last failure point," but the interaction with approval re-pauses isn't spelled out.

Metadata


Assignees: No one assigned
Labels: bug (Something is broken)
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
