Gate stale workflow resumes#1935
Conversation
Fresh /workflow run commands could silently resume a prior failed run and execute stale persisted args. Gate failed candidates behind explicit user intent while preserving paused approval resume and CAS-owned hydration. Changes: - Add force and resumeRunId intent to workflow command dispatch - Prompt before hydrating failed resume candidates unless explicitly resumed - Parse /workflow run --force and allow abandoning failed runs - Add regressions for failed prompts, force, explicit resume, NL approval, and abandon Fixes #1549 Co-authored-by: Zolto <zolto@zhome.local>
|
Important Review skippedDraft detected. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
🔍 Comprehensive PR ReviewPR: #1935 SummaryThe PR is well scoped and covers the main stale failed-run gate behavior without pre-transitioning runs to Verdict:
🔴 Critical IssuesNone. 🟠 High Issues (Auto-fixing)Explicit Resume Can Hydrate Or Prompt For The Wrong Run📍
View recommended fixinterface WorkflowDispatchOptions {
force?: boolean;
resumeRun?: WorkflowRun;
}
const resumableRun = options?.force
? null
: options?.resumeRun ??
(await workflowDb.findResumableRunByParentConversation(
workflow.name,
conversation.id,
codebase.id
));
if (options?.resumeRun && options.resumeRun.status !== 'paused' && !options.resumeRun.working_path) {
await platform.sendMessage(
conversationId,
`Cannot resume ${options.resumeRun.id}: missing working path.`
);
return;
}Then set 🟡 Medium Issues (Needs Decision)Suggested Escaped Commands Do Not Round-Trip Through
|
| Option | Approach | Effort | Risk if Skipped |
|---|---|---|---|
| Fix Now | Replace the regex tokenizer with a small quoted-string parser that supports escaped quotes and backslashes, with regression tests for the generated command. | MED | Users can copy a suggested command that silently changes workflow args. |
| Create Issue | Defer command parser hardening to a separate PR. | LOW | Prompt UX remains unreliable for messages with quotes or backslashes. |
| Skip | Accept as-is. | NONE | Generated commands remain non-deterministic for common text inputs. |
Recommendation: Fix now if the prompt stays copyable. Otherwise, remove arbitrary quoted prompt text from the suggestion and tell users to re-run their original command with --force.
/workflow resume Hides Workflow Load Errors Behind "Not Found"
📍 packages/core/src/handlers/command-handler.ts:699
The explicit resume path calls discoverWorkflowsWithConfig() but keeps only result.workflows. Per-file workflow parse/read failures are returned in result.errors, not thrown. If the run's workflow exists but currently fails to parse or read, /workflow resume <id> reports that the workflow was not found instead of surfacing the actionable load error.
Options: Fix now | Create issue | Skip
View details
| Option | Approach | Effort | Risk if Skipped |
|---|---|---|---|
| Fix Now | Preserve result.errors in the resume path and mirror /workflow run matching before returning "not found". |
LOW | Users get the wrong recovery path and may not know a workflow file is broken. |
| Create Issue | Defer the UX consistency fix. | LOW | Resume remains inconsistent with run when workflow loading partially fails. |
| Skip | Accept misleading fallback. | NONE | Debugging resume failures requires inspecting logs or running other commands. |
Recommendation: Fix now. This is a localized change and follows an existing command-handler pattern.
🟢 Low Issues
None reported in available artifacts.
✅ What's Good
- Failed-run auto-resume is gated before
hydrateResumableRun(), preserving the no-pre-transition resume model. /workflow run --forceis parsed anywhere after the workflow name, stripped from workflow args, and skips resumable lookup.- Failed-run abandon support is narrowly implemented while completed and cancelled runs remain rejected.
- Tests cover the main happy and guard paths: failed prompt, force fresh dispatch, natural-language approval, explicit resume happy path, paused auto-resume, and failed-run abandon.
- Resume CAS handling remains specific to
WorkflowNotResumableErrorand does not fall through to fresh execution.
📋 Suggested Follow-up Issues
| Issue Title | Priority | Related Finding |
|---|---|---|
| Make slash command parsing round-trip escaped quoted suggestions | P2 | Medium issue #1 |
| Surface workflow discovery load errors during explicit resume | P2 | Medium issue #2 |
Review Coverage Note
The review directory contained these artifacts: code-review-findings.md, error-handling-findings.md, and test-coverage-findings.md. The requested comment-quality-findings.md and docs-impact-findings.md artifacts were not present, so this synthesis is based on the available three reports.
Next Steps
- ⚡ Auto-fix step should address the HIGH explicit-resume identity issue.
- 📝 Review the MEDIUM issues above.
- 🎯 Re-run focused tests for orchestrator dispatch and command handler resume behavior.
Reviewed by Archon comprehensive-pr-review workflow
Artifacts: /Users/rasmus/.archon/workspaces/coleam00/Archon/artifacts/runs/8d4bf8f85cdcac7542fd75849aee3e0f/review/
Fixed: - Preserve explicit workflow resume identity through dispatch - Parse escaped quoted slash-command arguments - Surface workflow load errors during explicit resume Tests added: - Parser regression coverage for escaped quoted suggestions - Resume load-error feedback coverage - Explicit resume identity and missing working path coverage Skipped: - none
⚡ Self-Fix Report (Aggressive)Status: COMPLETE Fixes Applied (3 total)
View all fixes
Tests Added
Skipped (0)(none — all findings addressed) Suggested Follow-up Issues(none) Validation✅ Type check | ✅ Lint | ✅ Tests ( Self-fix by Archon · aggressive mode · fixes pushed to |
Summary
Describe this PR in 2-5 bullets:
/workflow run <name> ...commands could silently resume a prior failed run for the same workflow/conversation/codebase and execute the failed run's staleuser_messageinstead of the new request./workflow run --forceanywhere in args, and allowed failed runs to be abandoned tocancelled.findResumableRunByParentConversationlookup,hydrateResumableRunresume model, andWorkflowNotResumableErrorCAS guards remain the owner of the failed/paused -> running transition.UX Journey
Before
After
Architecture Diagram
Before
After
Connection inventory (list every module-to-module edge, mark changes):
packages/core/src/handlers/command-handler.tspackages/core/src/types/index.tsforceandresumeRunIdintent.packages/core/src/orchestrator/orchestrator-agent.tspackages/core/src/db/workflows.tsfindResumableRunByParentConversation; caller now gates failed rows before hydration.packages/core/src/orchestrator/orchestrator-agent.tspackages/workflows/src/executor.tsresumeRunId-authorized or paused candidates into hydration.packages/core/src/orchestrator/orchestrator-agent.tspackages/core/src/operations/workflow-operations.tspackages/core/src/**/*.test.tsLabel Snapshot
risk: mediumsize: Mcore|testscore:orchestrator,core:command-handler,core:workflow-operationsChange Metadata
bugcoreLinked Issue
Validation Evidence (required)
Commands and result summary:
bun run type-check bun run lint bun run format:check bun run test bun run validatebun run testpassed with 4834 passed / 0 failed, build passed, andbun run validatepassed.bun testwas not used as the authoritative command because this repo documents that rootbun testcauses Bunmock.module()pollution;bun run testis the supported isolated test entrypoint and passed.Security Impact (required)
Yes/No) NoYes/No) NoYes/No) NoYes/No) NoYes, describe risk and mitigation: No security-sensitive behavior changes were introduced.Compatibility / Migration
Yes/No) YesYes/No) NoYes/No) NoHuman Verification (required)
What was personally validated beyond CI:
/workflow runwith a prior failed run prompts and does not callexecuteWorkflow;--forcebypasses the lookup and starts fresh;/workflow resume <id>dispatches withresumeRunId; natural-language approval resumes without prompting; paused auto-resume remains unchanged; failed abandon transitions to cancelled.--forcecan appear anywhere after the workflow name and is stripped from workflow args; completed and cancelled runs still cannot be abandoned; the failed-run prompt escapes backslash, double quote, and backtick in the suggested command and truncates the prior user message preview.Side Effects / Blast Radius (required)
/workflow run; automation that depended on implicit failed auto-resume should switch to/workflow resume <id>.executeWorkflowcall occurs on the prompt path and that explicit resume paths still hydrate through the existing CAS model.Rollback Plan (required)
cde8b12b96b98b54d3a99de504898a8f801a2e50./workflow runprompts unexpectedly for failed candidates, explicit/workflow resume <id>loops back to a prompt, natural-language approval fails to resume, or failed runs cannot be abandoned.Risks and Mitigations
List real risks in this PR (or write
None).resumeRunIdbypasses the failed-run prompt, and tests cover/workflow resume <id>plus natural-language approval redispatch.WorkflowNotResumableErrorpath.abandonWorkflownow accepts failed runs and transitions them tocancelledso the prompt can be cleared intentionally.