fix(workflows): preserve completed node state across DAG multi-resume cycles#1530
Conversation
… cycles (coleam00#1520) Bug: getCompletedDagNodeOutputs only queried 'node_completed' events, but resumed runs emit 'node_skipped_prior_success' for already-completed nodes. On a second resume, previously completed nodes were re-executed because their skip events were invisible to the query. Fix: - Query both 'node_completed' and 'node_skipped_prior_success' event types in getCompletedDagNodeOutputs so multi-resume correctly identifies all previously completed nodes - Store original node_output in node_skipped_prior_success event data so the output is available for $nodeId.output substitution on next resume - Improve error messaging when all nodes fail: name the specific failed nodes and count skipped downstream nodes instead of the generic 'no successful nodes' message Closes coleam00#1520
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
✅ Files skipped from review due to trivial changes (1)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughThis PR fixes the multi-resume DAG workflow bug by enabling ChangesMulti-Resume State Preservation
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
@kagura-agent related to #1520 — DAG multi-resume state preservation. |
|
@kagura-agent related to #1391 — DAG resume state concerns overlap with #1520 fix. |
|
Hi @kagura-agent — thanks for opening this PR. This repository uses a PR template at
Could you fill those out (even briefly)? The template If a section genuinely doesn't apply, just write "N/A" in |
|
Done — filled out all the template sections (UX Journey, Architecture Diagram, Label Snapshot, Change Metadata, etc.). Thanks for pointing that out, @Wirasm! |
Review SummaryVerdict: minor-fixes-needed This PR fixes a real bug in DAG workflow multi-resume by making Suggested fixes
Minor / nice-to-have
Compliments
Reviewed via maintainer-review-pr workflow (Pi/Minimax). Aspects run: code-review, error-handling, test-coverage. |
…age format - Assert node_output field in node_skipped_prior_success event test - Add test for empty-string fallback when node ID has undefined output - Add dedicated test verifying failure message names failing nodes - Update stale comment from 'no successful nodes' to new behavior - Add edge-case test for only node_skipped_prior_success rows (no node_completed)
|
Thanks for the thorough review @Wirasm! Pushed b1c94a1a addressing your feedback: Required fixes — all done:
Nice-to-have — done:
All tests pass (203 dag-executor, 19 workflow-events) + type-check clean. |
Review SummaryVerdict: ready-to-merge This PR fixes a silent data-loss bug in the DAG workflow executor where outputs from previously-skipped nodes were dropped on multi-resume, and improves failure messages to name the specific failing node instead of a generic summary. All review aspects pass — code is clean, error handling is consistent, and tests are adequate. One optional test improvement noted below. Minor / nice-to-have
ComplimentsThe PR is tightly scoped with clear causal connections between the three sub-changes (query filter, output preservation, failure message). The Reviewed via maintainer-review-pr workflow (Pi/Minimax). Aspects run: code-review, error-handling, test-coverage, comment-quality. |
Review SummaryVerdict: ready-to-merge Solid PR — the Blocking issuesNone. Suggested fixes
Minor / nice-to-have
Compliments
Reviewed via maintainer-review-pr workflow (Pi/Minimax). Aspects run: code-review, error-handling, test-coverage. |
… cycles (coleam00#1530) * fix(workflows): preserve completed node state across DAG multi-resume cycles (coleam00#1520) Bug: getCompletedDagNodeOutputs only queried 'node_completed' events, but resumed runs emit 'node_skipped_prior_success' for already-completed nodes. On a second resume, previously completed nodes were re-executed because their skip events were invisible to the query. Fix: - Query both 'node_completed' and 'node_skipped_prior_success' event types in getCompletedDagNodeOutputs so multi-resume correctly identifies all previously completed nodes - Store original node_output in node_skipped_prior_success event data so the output is available for $nodeId.output substitution on next resume - Improve error messaging when all nodes fail: name the specific failed nodes and count skipped downstream nodes instead of the generic 'no successful nodes' message Closes coleam00#1520 * test: address review feedback — assert node_output, test failure message format - Assert node_output field in node_skipped_prior_success event test - Add test for empty-string fallback when node ID has undefined output - Add dedicated test verifying failure message names failing nodes - Update stale comment from 'no successful nodes' to new behavior - Add edge-case test for only node_skipped_prior_success rows (no node_completed)
Summary
node_skipped_prior_successevents; skip events now preserve original output for$nodeId.outputsubstitution; failure messages name specific failed nodesnode_completedevent format unchangedUX Journey
Before
After
Architecture Diagram
Before
After
Connection inventory:
Label Snapshot
risk: lowsize: Score,workflowscore:workflow-events,workflows:dag-executorChange Metadata
bugmultiLinked Issue
Validation Evidence (required)
bun run validatenot run locally; CI covers remaining checksSecurity Impact (required)
NoNoNoNoCompatibility / Migration
Yes— additive query change (reads more event types); newnode_outputfield in skip event data is backward-compatible since existing consumers only checkreasonNoNoHuman Verification (required)
$nodeId.outputsubstitution works across resumes; failure messages name specific failed nodesnode_outputfield (backward compat)Side Effects / Blast Radius (required)
getCompletedDagNodeOutputs()returns more results (includes skip events);node_skipped_prior_successdata payload gainsnode_outputfield; error message format changes for!anyCompletedpathRollback Plan (required)
Risks and Mitigations
node_skipped_prior_successevents (withoutnode_output) return undefined outputpriorCompletedNodes.get(node.id) ?? ""with fallback to empty string; existing consumers only checkreasonfieldSummary by CodeRabbit
Bug Fixes
Tests