executor: fix nil pointer when sub-DAG not in worker local store #2258
Merged
When `default_execution_mode` is set to `distributed` and a parent DAG without `worker_selector` uses `dag.run` to invoke a sub-DAG, the parent gets dispatched to a remote worker. The worker's local DAG store does not contain the parent's sub-DAG definitions, so `rCtx.DB.GetDAG()` is either called on a nil DB or returns a nil DAG. Both cases lead to a nil pointer dereference in `newSubDAGExecutor`.

This change:

- Adds a nil-check on `rCtx.DB` before `GetDAG`
- Adds a nil-check on the returned DAG
- Makes both errors wrap `exec.ErrDAGNotFound` and include a remediation hint pointing users to set `worker_selector: local` on the parent DAG

Fixes dagucloud#2257
Walkthrough
This PR adds validation and error handling to …
Changes: SubDAG Executor Nil-Safety
Review effort: 2 (Simple), ~8 minutes. Pre-merge checks: 4 passed, 1 failed (1 warning).
Summary
Fixes #2257.
When `default_execution_mode: distributed` is set and a parent DAG without `worker_selector` uses `dag.run` to invoke a sub-DAG, the parent gets dispatched to a remote worker. The worker's local DAG store does not contain the parent's sub-DAG definitions, so `rCtx.DB.GetDAG()` is either called on a nil DB or returns a nil DAG. Both cases lead to a nil pointer dereference in `newSubDAGExecutor` and crash the worker pod.
Changes
- `internal/runtime/executor/dag_runner.go`:
  - Nil-check on `rCtx.DB` before calling `GetDAG`
  - Nil-check on the returned `dag` value
  - Both errors wrap `exec.ErrDAGNotFound` and include a remediation hint pointing users to set `worker_selector: local` on the parent DAG
- `internal/runtime/executor/dag_runner_test.go`:
  - `TestNewSubDAGExecutor_NilDB`: asserts a structured error when `rCtx.DB == nil`
  - `TestNewSubDAGExecutor_NilDAGReturn`: asserts a structured error when `GetDAG` returns a nil DAG

Behaviour Change
Before:
```
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation]
```
After:
```
cannot resolve sub-DAG "child": no local DAG store available
(hint: parent DAG was dispatched to a worker without local DAG cache
— consider setting worker_selector: local on the parent DAG):
DAG is not found
```
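For contrast, the Before panic can be reproduced in a few lines. This sketch again uses hypothetical types rather than Dagu's. It shows that calling a method on a nil `DAGStore` interface and reading a field of a nil `*DAG` both raise the runtime error shown above. `tryCall` recovers the panic only so the two cases can be printed side by side.

```go
package main

import "fmt"

// DAG and DAGStore are hypothetical stand-ins, not Dagu's real types.
type DAG struct{ Name string }

type DAGStore interface {
	GetDAG(name string) (*DAG, error)
}

// emptyStore knows no DAGs and returns (nil, nil), like a worker store
// that was never given the parent's sub-DAG definitions.
type emptyStore struct{}

func (emptyStore) GetDAG(string) (*DAG, error) { return nil, nil }

// unguarded mimics the pre-fix path: no nil checks on the store or the result.
// The error is ignored on purpose to keep the example focused on the nil cases.
func unguarded(db DAGStore, name string) string {
	dag, _ := db.GetDAG(name) // panics if db is a nil interface
	return dag.Name           // panics if dag is nil
}

// tryCall turns a panic into a returned message so both cases can be compared.
func tryCall(db DAGStore) (msg string) {
	defer func() {
		if r := recover(); r != nil {
			msg = fmt.Sprint(r)
		}
	}()
	return unguarded(db, "child")
}

func main() {
	fmt.Println(tryCall(nil))          // nil store
	fmt.Println(tryCall(emptyStore{})) // store without the sub-DAG
}
```

Both calls print `runtime error: invalid memory address or nil pointer dereference`. Without a `recover`, a panic like this terminates the whole Go process, which is why the worker pod went down instead of failing a single step.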
Test Plan
```
go test ./internal/runtime/executor/...
ok github.com/dagucloud/dagu/internal/runtime/executor 0.015s
```
Local verification on a multi-cluster K3s setup confirmed the worker pod no longer crashes when an orchestrator DAG is mis-dispatched; the descriptive error appears in worker logs instead.
Related
See #2257 for the full reproduction scenario and background. Example DAGs demonstrating the pattern are documented in that issue's comments; happy to send a follow-up PR adding them to `examples/embedded/distributed/` if the maintainers prefer.
Summary by cubic
Fixes a nil pointer panic when a parent DAG on a remote worker calls a sub-DAG missing from the worker’s local store. Now returns a clear “DAG not found” error with guidance, preventing worker crashes.
- Added nil checks for `rCtx.DB` and the `GetDAG` result in `NewSubDAGExecutor`.
- Wrapped both errors in `exec.ErrDAGNotFound` with a hint to use `worker_selector: local` on the parent DAG in distributed mode.

Written for commit fba216b.
Summary by CodeRabbit
Bug Fixes
Tests