fix: preserve working_dir on retry restore#2111
Conversation
📝 WalkthroughWalkthroughMarks DAGs whose working_dir was explicitly configured and propagates that flag through YAML loading and DAG restoration; applies persisted run WorkingDir on restore, backfills legacy retry WorkingDir, threads WorkingDir through persistence/status/runtime/plan/runner, and adds tests covering loader, restore, retry, transform, and integration scenarios. ChangesWorking Directory Preservation During Retry
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
c7a5ffa to
4737467
Compare
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
4737467 to
f556687
Compare
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
f556687 to
e2edfc5
Compare
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
🧹 Nitpick comments (1)
internal/core/spec/loader.go (1)
380-399: ⚡ Quick winStale doc comment — now describes only half of the function's behavior.
The comment on line 380 says "sets a working directory when the manifest omits one" but the function also marks explicitly-configured directories. Consider updating it.
📝 Suggested doc update
-// applyWorkingDirFallback sets a working directory when the manifest omits one. +// applyWorkingDirFallback marks any already-configured working directory as +// explicit and, when none is configured, synthesizes a fallback from filePath +// or the process working directory. func applyWorkingDirFallback(dag *core.DAG, filePath string) error {🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/core/spec/loader.go` around lines 380 - 399, Update the stale doc comment for applyWorkingDirFallback to reflect both behaviors: it marks explicitly-configured working directories (via markConfiguredWorkingDirsExplicit) and then sets a default working directory when none is provided on the DAG (using filePath or getDefaultWorkingDir()). Mention both steps and the error-return case from getDefaultWorkingDir so the comment accurately documents applyWorkingDirFallback's full behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@internal/core/spec/loader.go`:
- Around line 380-399: Update the stale doc comment for applyWorkingDirFallback
to reflect both behaviors: it marks explicitly-configured working directories
(via markConfiguredWorkingDirsExplicit) and then sets a default working
directory when none is provided on the DAG (using filePath or
getDefaultWorkingDir()). Mention both steps and the error-return case from
getDefaultWorkingDir so the comment accurately documents
applyWorkingDirFallback's full behavior.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: a25caba3-1997-4afd-9d28-c6dfe8f1823b
📒 Files selected for processing (4)
internal/cmd/helper_test.gointernal/cmd/retry_test.gointernal/core/spec/loader.gointernal/core/spec/loader_test.go
🚧 Files skipped from review as they are similar to previous changes (3)
- internal/core/spec/loader_test.go
- internal/cmd/helper_test.go
- internal/cmd/retry_test.go
e2edfc5 to
0d3d272
Compare
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
0d3d272 to
1eb4a51
Compare
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
1eb4a51 to
721f112
Compare
|
@coderabbitai review |
|
@coderabbitai resume |
721f112 to
058423e
Compare
|
@coderabbitai review |
058423e to
6d9ee76
Compare
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
2 similar comments
✅ Actions performedReview triggered.
|
✅ Actions performedReview triggered.
|
6d9ee76 to
feab602
Compare
There was a problem hiding this comment.
🧹 Nitpick comments (1)
internal/runtime/plan_test.go (1)
508-531: 💤 Low valueLGTM — the test correctly captures the core regression scenario: a persisted
NodeState.WorkingDirmust override the current DAG step'sDirwhen building the retry plan.One minor coverage gap: the test only covers the case where
node.Step.Diris empty. Consider adding a sub-case wherenode.Step.Diris also non-empty to document the explicit priority order.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/runtime/plan_test.go` around lines 508 - 531, Add a sub-case to TestCreateRetryPlan_PrefersPersistedNodeWorkingDir that sets both the DAG Step.Dir and the persisted node.Step.Dir to non-empty values and asserts that runtime.CreateRetryPlan still prefers the persisted NodeState.WorkingDir (e.g., create node via runtime.NodeWithData with NodeData containing Step:{Name:"target", Dir:"/node-specified"} and NodeState.WorkingDir:"/original-work", call runtime.CreateRetryPlan, then assert the returned plan's target.Step().Dir equals "/original-work"); this documents the explicit priority order and uses the same helpers (NodeWithData, NodeData, NodeState, core.Step) referenced in the existing test.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@internal/runtime/plan_test.go`:
- Around line 508-531: Add a sub-case to
TestCreateRetryPlan_PrefersPersistedNodeWorkingDir that sets both the DAG
Step.Dir and the persisted node.Step.Dir to non-empty values and asserts that
runtime.CreateRetryPlan still prefers the persisted NodeState.WorkingDir (e.g.,
create node via runtime.NodeWithData with NodeData containing
Step:{Name:"target", Dir:"/node-specified"} and
NodeState.WorkingDir:"/original-work", call runtime.CreateRetryPlan, then assert
the returned plan's target.Step().Dir equals "/original-work"); this documents
the explicit priority order and uses the same helpers (NodeWithData, NodeData,
NodeState, core.Step) referenced in the existing test.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 75095cba-4708-414b-8a63-bee09527360c
📒 Files selected for processing (20)
internal/cmd/helper.gointernal/cmd/helper_test.gointernal/cmd/retry.gointernal/cmd/retry_test.gointernal/core/exec/node.gointernal/core/exec/runstatus.gointernal/core/spec/loader.gointernal/core/spec/loader_test.gointernal/intg/retry_test.gointernal/runtime/agent/agent.gointernal/runtime/data.gointernal/runtime/plan.gointernal/runtime/plan_test.gointernal/runtime/runner.gointernal/runtime/transform/node.gointernal/runtime/transform/node_test.gointernal/runtime/transform/status.gointernal/runtime/transform/status_test.gointernal/service/slack/bot.gointernal/service/slack/bot_test.go
✅ Files skipped from review due to trivial changes (1)
- internal/runtime/transform/node.go
🚧 Files skipped from review as they are similar to previous changes (3)
- internal/core/spec/loader_test.go
- internal/cmd/retry_test.go
- internal/core/spec/loader.go
There was a problem hiding this comment.
🧹 Nitpick comments (1)
internal/service/slack/bot.go (1)
111-121: 💤 Low valueConsider initialising
incomingAfterFuncinNewto avoid a per-call closure allocation.In the production path every
enqueueIncomingMessagecall allocates a freshfunc(time.Duration, func())wrapper just to calltime.AfterFunc. Setting the default once inNewremoves the nil-guard and the allocation entirely, while test code can still override the field directly.♻️ Proposed refactor
return &Bot{ cfg: cfg, agentAPI: agentAPI, slackClient: slackClient, socketClient: socketClient, allowedChannels: allowed, eventService: cfg.EventService, notificationStateFile: cfg.NotificationStateFile, logger: logger, incomingDelay: defaultIncomingBatchDelay, + incomingAfterFunc: time.AfterFunc, }, nil- afterFunc := b.incomingAfterFunc - if afterFunc == nil { - afterFunc = func(delay time.Duration, fn func()) { - time.AfterFunc(delay, fn) - } - } - afterFunc(delay, func() { + b.incomingAfterFunc(delay, func() { b.flushIncomingMessages(ctx, cs, convKey, gen) })Also applies to: 330-338
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/service/slack/bot.go` around lines 111 - 121, The Bot constructor (New) should initialize the incomingAfterFunc field to time.AfterFunc (and likewise initialize any similar timers mentioned around enqueueOutgoingMessage) instead of leaving it nil so enqueueIncomingMessage doesn't allocate a new closure on every call; update New to set b.incomingAfterFunc = time.AfterFunc, remove the per-call nil-guard/closure allocation in enqueueIncomingMessage, and allow tests to override the incomingAfterFunc field directly (same change applies to the analogous outgoing timer field referenced in the review).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@internal/service/slack/bot.go`:
- Around line 111-121: The Bot constructor (New) should initialize the
incomingAfterFunc field to time.AfterFunc (and likewise initialize any similar
timers mentioned around enqueueOutgoingMessage) instead of leaving it nil so
enqueueIncomingMessage doesn't allocate a new closure on every call; update New
to set b.incomingAfterFunc = time.AfterFunc, remove the per-call
nil-guard/closure allocation in enqueueIncomingMessage, and allow tests to
override the incomingAfterFunc field directly (same change applies to the
analogous outgoing timer field referenced in the review).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 80fda807-ff46-4b6f-b9b0-ecc992f4509f
📒 Files selected for processing (21)
internal/cmd/helper.gointernal/cmd/helper_test.gointernal/cmd/retry.gointernal/cmd/retry_test.gointernal/core/exec/node.gointernal/core/exec/runstatus.gointernal/core/spec/loader.gointernal/core/spec/loader_test.gointernal/intg/retry_test.gointernal/intg/sched_test.gointernal/runtime/agent/agent.gointernal/runtime/data.gointernal/runtime/plan.gointernal/runtime/plan_test.gointernal/runtime/runner.gointernal/runtime/transform/node.gointernal/runtime/transform/node_test.gointernal/runtime/transform/status.gointernal/runtime/transform/status_test.gointernal/service/slack/bot.gointernal/service/slack/bot_test.go
✅ Files skipped from review due to trivial changes (2)
- internal/runtime/transform/node.go
- internal/cmd/retry_test.go
🚧 Files skipped from review as they are similar to previous changes (15)
- internal/core/exec/runstatus.go
- internal/runtime/data.go
- internal/runtime/transform/status.go
- internal/core/exec/node.go
- internal/service/slack/bot_test.go
- internal/runtime/agent/agent.go
- internal/intg/sched_test.go
- internal/runtime/plan_test.go
- internal/runtime/plan.go
- internal/cmd/retry.go
- internal/runtime/transform/node_test.go
- internal/cmd/helper_test.go
- internal/core/spec/loader_test.go
- internal/core/spec/loader.go
- internal/intg/retry_test.go
✅ Actions performedReviews resumed. |
feab602 to
626d81b
Compare
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
1 similar comment
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
🧹 Nitpick comments (1)
internal/service/slack/bot.go (1)
111-122: 💤 Low valueConsider initialising
incomingAfterFuncto its default inNew()to avoid a per-call closure allocation.The nil-check at line 331 creates a fresh wrapper closure on every production
enqueueIncomingMessagecall. Seeding the field inNew()avoids the repeated allocation and removes the branch from the hot path entirely.♻️ Proposed refactor
return &Bot{ cfg: cfg, agentAPI: agentAPI, slackClient: slackClient, socketClient: socketClient, allowedChannels: allowed, eventService: cfg.EventService, notificationStateFile: cfg.NotificationStateFile, logger: logger, incomingDelay: defaultIncomingBatchDelay, + incomingAfterFunc: func(d time.Duration, fn func()) { + time.AfterFunc(d, fn) + }, }, nilThen simplify the dispatch site:
-afterFunc := b.incomingAfterFunc -if afterFunc == nil { - afterFunc = func(delay time.Duration, fn func()) { - time.AfterFunc(delay, fn) - } -} -afterFunc(delay, func() { +b.incomingAfterFunc(delay, func() { b.flushIncomingMessages(ctx, cs, convKey, gen) })Tests override
bot.incomingAfterFuncafter construction as before;New()always produces a non-nil field, so the nil guard is no longer needed.Also applies to: 330-338
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/service/slack/bot.go` around lines 111 - 122, The Bot constructor (New) should initialize the incomingAfterFunc field to the default implementation to avoid allocating a new closure on every enqueueIncomingMessage call; update New() to set Bot.incomingAfterFunc to the same default timer/after behavior used today so enqueueIncomingMessage no longer needs its nil-check and per-call wrapper allocation, and keep tests able to override bot.incomingAfterFunc as before; ensure you update references to incomingAfterFunc, New, and enqueueIncomingMessage accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@internal/service/slack/bot.go`:
- Around line 111-122: The Bot constructor (New) should initialize the
incomingAfterFunc field to the default implementation to avoid allocating a new
closure on every enqueueIncomingMessage call; update New() to set
Bot.incomingAfterFunc to the same default timer/after behavior used today so
enqueueIncomingMessage no longer needs its nil-check and per-call wrapper
allocation, and keep tests able to override bot.incomingAfterFunc as before;
ensure you update references to incomingAfterFunc, New, and
enqueueIncomingMessage accordingly.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 8a1b0450-3ce3-4f22-bd97-3a288b6ddd4e
📒 Files selected for processing (22)
internal/cmd/helper.gointernal/cmd/helper_test.gointernal/cmd/retry.gointernal/cmd/retry_internal_test.gointernal/cmd/retry_test.gointernal/core/exec/node.gointernal/core/exec/runstatus.gointernal/core/spec/loader.gointernal/core/spec/loader_test.gointernal/intg/retry_test.gointernal/intg/sched_test.gointernal/runtime/agent/agent.gointernal/runtime/data.gointernal/runtime/plan.gointernal/runtime/plan_test.gointernal/runtime/runner.gointernal/runtime/transform/node.gointernal/runtime/transform/node_test.gointernal/runtime/transform/status.gointernal/runtime/transform/status_test.gointernal/service/slack/bot.gointernal/service/slack/bot_test.go
✅ Files skipped from review due to trivial changes (1)
- internal/runtime/runner.go
🚧 Files skipped from review as they are similar to previous changes (17)
- internal/runtime/transform/status_test.go
- internal/runtime/transform/node.go
- internal/core/exec/node.go
- internal/core/exec/runstatus.go
- internal/runtime/data.go
- internal/intg/sched_test.go
- internal/runtime/agent/agent.go
- internal/runtime/plan_test.go
- internal/cmd/retry.go
- internal/cmd/retry_test.go
- internal/service/slack/bot_test.go
- internal/core/spec/loader_test.go
- internal/runtime/plan.go
- internal/runtime/transform/node_test.go
- internal/cmd/helper.go
- internal/core/spec/loader.go
- internal/cmd/helper_test.go
b25e423 to
164abd6
Compare
164abd6 to
b289ecb
Compare
Summary
Root cause
Retry rebuilt the DAG from the stored YAML snapshot, but the explicit working-dir marker was runtime-only and older run histories did not persist the effective run/node working directories. On retry, a different or absent --default-working-dir could make runtime prefer the new attempt work directory instead of the directory used by the original run.
Step retry had the same reproduction issue at node level: retry planning re-bound nodes to the restored DAG definition, so it needed to prefer the persisted node working directory over both the restored DAG step dir and the node's embedded step snapshot.
A Windows CI failure exposed a separate flaky distributed lease test: concurrent lock contenders can see CreateFile .dagu_lock: Access is denied while another contender is creating/removing the lock directory. That is the same transient lock-conflict class as Windows sharing violations and now retries through the existing dirlock loop.
Testing
Closes #2110