Skip to content

executor: fix nil pointer when sub-DAG not in worker local store#2258

Merged
yohamta0 merged 2 commits into
dagucloud:mainfrom
four-flames:fix/dag-runner-nil-check
Jun 4, 2026
Merged

executor: fix nil pointer when sub-DAG not in worker local store#2258
yohamta0 merged 2 commits into
dagucloud:mainfrom
four-flames:fix/dag-runner-nil-check

Conversation

@four-bytes-robby

@four-bytes-robby four-bytes-robby commented Jun 3, 2026

Copy link
Copy Markdown
Contributor

Summary

Fixes #2257.

When default_execution_mode: distributed is set and a parent DAG without worker_selector uses dag.run to invoke a sub-DAG, the parent gets dispatched to a remote worker. The worker's local DAG store does not contain the parent's sub-DAG definitions, causing rCtx.DB.GetDAG() to either be called on a nil DB or to return a nil DAG — both leading to nil pointer dereference in newSubDAGExecutor and crashing the worker pod.

Changes

internal/runtime/executor/dag_runner.go:

  • Add nil-check on rCtx.DB before calling GetDAG
  • Add nil-check on the returned dag value
  • Both new error paths wrap exec.ErrDAGNotFound and include a remediation hint pointing users to set worker_selector: local on the parent DAG

internal/runtime/executor/dag_runner_test.go:

  • New test TestNewSubDAGExecutor_NilDB — asserts structured error when rCtx.DB == nil
  • New test TestNewSubDAGExecutor_NilDAGReturn — asserts structured error when GetDAG returns nil dag

Behaviour Change

Before:
```
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation]
```

After:
```
cannot resolve sub-DAG "child": no local DAG store available
(hint: parent DAG was dispatched to a worker without local DAG cache
— consider setting worker_selector: local on the parent DAG):
DAG is not found
```

Test Plan

```
go test ./internal/runtime/executor/...
ok github.com/dagucloud/dagu/internal/runtime/executor 0.015s
```

Local verification on a multi-cluster K3s setup confirmed the worker pod no longer crashes when an orchestrator DAG is mis-dispatched; the descriptive error appears in worker logs instead.

Related

See #2257 for the full reproduction scenario and background. Example DAGs demonstrating the pattern are documented in that issue's comments; happy to send a follow-up PR adding them to examples/embedded/distributed/ if the maintainers prefer.


Summary by cubic

Fixes a nil pointer panic when a parent DAG on a remote worker calls a sub-DAG missing from the worker’s local store. Now returns a clear “DAG not found” error with guidance, preventing worker crashes.

  • Bug Fixes
    • Add nil checks on rCtx.DB and the GetDAG result in NewSubDAGExecutor.
    • Return errors wrapping exec.ErrDAGNotFound with a hint to use worker_selector: local on the parent DAG in distributed mode.
    • Add tests for nil DB and nil DAG cases, with doc comments; worker now logs a structured error instead of panicking.

Written for commit fba216b. Summary will update on new commits.

Review in cubic

Summary by CodeRabbit

  • Bug Fixes

    • Improved validation when executing sub-workflows to prevent errors from missing database configurations.
    • Enhanced error messages to provide clearer guidance on required configuration settings for sub-workflow execution.
  • Tests

    • Added comprehensive test coverage for error handling in sub-workflow executor scenarios.

When default_execution_mode is set to distributed and a parent DAG
without worker_selector uses dag.run to invoke a sub-DAG, the parent
gets dispatched to a remote worker. The worker's local DAG store does
not contain the parent's sub-DAG definitions, causing
rCtx.DB.GetDAG() to either be called on a nil DB or to return a nil
DAG — both leading to nil pointer dereference in newSubDAGExecutor.

This change:
- Adds a nil-check on rCtx.DB before GetDAG
- Adds a nil-check on the returned DAG
- Both errors wrap exec.ErrDAGNotFound and include a remediation hint
  pointing users to set worker_selector: local on the parent DAG

Fixes dagucloud#2257
@coderabbitai

coderabbitai Bot commented Jun 3, 2026

Copy link
Copy Markdown

Review Change Stack

📝 Walkthrough

Walkthrough

This PR adds validation and error handling to NewSubDAGExecutor in the DAG runner to detect when sub-DAGs cannot be resolved due to missing database handles or nil DAG records, returning structured errors with guidance on configuring worker_selector: local instead of panicking.

Changes

SubDAG Executor Nil-Safety

Layer / File(s) Summary
Sub-DAG executor database and DAG validation
internal/runtime/executor/dag_runner.go, internal/runtime/executor/dag_runner_test.go
NewSubDAGExecutor now validates that rCtx.DB is not nil before attempting to fetch a sub-DAG; if nil or if GetDAG returns nil, the function returns an error with a hint about setting worker_selector: local. Two unit tests cover the nil-database and nil-DAG-return scenarios, asserting error type, nil executor return, and hint presence.

🎯 2 (Simple) | ⏱️ ~8 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 20.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately describes the primary fix: adding nil pointer handling when a sub-DAG is missing from a worker's local store.
Linked Issues check ✅ Passed The PR fully addresses issue #2257 by adding nil-checks on rCtx.DB and the returned dag, wrapping errors with ErrDAGNotFound, and including remediation hints as requested.
Out of Scope Changes check ✅ Passed All changes are directly scoped to fixing the nil-pointer crash and adding corresponding tests; no unrelated modifications are present.
Description check ✅ Passed The pull request description comprehensively covers all required template sections with detailed information about the changes, related issue, and testing.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@cubic-dev-ai cubic-dev-ai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No issues found across 2 files

Re-trigger cubic

@yohamta0 yohamta0 left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks a lot!

@yohamta0 yohamta0 merged commit eb1b379 into dagucloud:main Jun 4, 2026
11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

coordinator: nil pointer panic when orchestrator DAG (dag.run) dispatched to remote worker via default_execution_mode: distributed

2 participants