fix(onboard): inject DNS fallback when host has systemd-resolved loopback (#3579)#3628
Conversation
…back (#3579) Adds `detectSandboxFallbackDns()` to read `/etc/resolv.conf`, detect when all nameservers are loopback (e.g. 127.0.0.53 from systemd-resolved), and return the real upstream from `/run/systemd/resolve/resolv.conf`. The recreate path now injects this as `--dns <upstream>` so sandboxes on a non-host network don't inherit an unreachable resolver. Closes #3579. Pairs with #3623 (which switched the default network mode to non-host so Docker's embedded resolver works for most users). Acceptance clauses from the issue (manager spec) all met: - Detect /etc/resolv.conf → 127.0.0.53 - Inject reachable resolver via --dns - Preserve host.openshell.internal (#3623 already did this) - Regression test naming all 4 hostnames from the spec Signed-off-by: Charan Jagwani <cjagwani@nvidia.com>
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Enterprise Run ID: 📒 Files selected for processing (2)
🚧 Files skipped from review as they are similar to previous changes (2)
📝 WalkthroughWalkthroughDetects when the host's ChangesDocker GPU Sandbox DNS Fallback
Sequence DiagramsequenceDiagram
participant Recreate as recreateOpenShellDockerSandboxWithGpu
participant Detect as detectSandboxFallbackDns
participant Builder as buildDockerGpuCloneRunArgs
participant Docker as docker run
Recreate->>Detect: d.detectSandboxFallbackDns()
Detect-->>Recreate: returns fallback or null
Recreate->>Builder: pass cloneOptions (sandboxFallbackDns set if present)
Builder->>Builder: forward host.Dns/host.DnsSearch
Builder->>Builder: if host.Dns empty && sandboxFallbackDns -> add --dns <fallback>
Builder->>Docker: run args (may include --dns)
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related issues
Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 ESLint
ESLint skipped: no ESLint configuration detected in root package.json. To enable, add Comment |
Selective E2E Results —
|
| Job | Result |
|---|---|
| gpu-e2e | ⏭️ skipped |
E2E Advisor RecommendationRequired E2E: Dispatch hint: Auto-dispatched E2E: Full advisor summaryE2E Recommendation AdvisorBase: Required E2E
Optional E2E
New E2E recommendations
Dispatch hint
|
…-through (#3579) Hooks `detectSandboxFallbackDns` into `DockerGpuPatchDeps` so the production callsite in `recreateOpenShellDockerSandboxWithGpu` can be stubbed in unit tests. Adds two integration tests: - Stubs the hook to return "9.9.9.9" and asserts `--dns 9.9.9.9` lands in the final dockerRunDetached call args (wire-through verified). - Stubs the hook to return null and asserts no --dns is injected. Addresses the audit gap on PR #3628 where `detectSandboxFallbackDns()` was unit-tested in isolation and the injection path in `buildDockerGpuCloneRunArgs` was unit-tested in isolation, but the wire-through from production callsite into clone args was not. Signed-off-by: Charan Jagwani <cjagwani@nvidia.com>
Selective E2E Results —
|
| Job | Result |
|---|---|
| gpu-e2e | ⏭️ skipped |
Summary
nemoclaw onboard --gpuon Linux hosts running systemd-resolved (Ubuntu 22+/24, DGX Spark FastOS) creates GPU sandboxes that inheritnameserver 127.0.0.53in/etc/resolv.confwhen running on--network=host. #3623 already sidestepped the default-path symptom of #3579 by switching the GPU patch's default network mode away from host; Docker's embedded resolver at 127.0.0.11 then forwards correctly via the daemon (which runs in the host namespace and can reach 127.0.0.53).This PR closes #3579 against the manager-provided spec, which calls for explicit detection of the
127.0.0.53trap and explicit--dnsinjection rather than relying on Docker's implicit embedded-DNS forwarding. Belt-and-suspenders defense-in-depth:detectSandboxFallbackDns()reads/etc/resolv.conf, detects loopback-only state, pulls the real upstream from/run/systemd/resolve/resolv.conf, and the GPU-recreate path injects it as--dns <upstream>.Acceptance criteria mapping (issue #3579 + manager's spec)
--dnsinjection here--dns <real upstream>forwarded to container127.0.0.53/ systemd-resolveddetectSandboxFallbackDnsreads/etc/resolv.conf, falls through to/run/systemd/resolve/resolv.conffor the real upstream--dnsbuildDockerGpuCloneRunArgsDNS blockhost.openshell.internalresolvable--add-hostpreservation (asserted by regression test)regression manifest: …names all fourTest plan
24/24 pass (16 existing + 8 new for #3579).
Notes for reviewers
/etc/resolv.confnameservers are 127.0.0.x. Single non-loopback resolver → null. Empty file → null. Missing systemd-resolved upstream file → null.--dnsconfig: ifhost.Dnsis non-empty, we don't override.--dnsis skipped entirely on--network=hostbecause Docker ignores--dnsflags in host networking mode. The opt-in host-networking case (viaNEMOCLAW_DOCKER_GPU_PATCH_NETWORK=host) is therefore not mitigated by this PR — the detection helper is now exported and available, but auto-injection would be a no-op. If a user explicitly opts into host networking on a systemd-resolved host, they still hit the original [DGX Spark][Sandbox] Sandbox DNS completely broken under Docker-driver gateway (OpenShell 0.0.39) — all domain resolution fails with EAI_AGAIN #3579 trap. The default path is bulletproof.--add-hostforhost.openshell.internal, non-host network mode for public hosts,--dnsinjection when fallback is set). True E2E that actually runsgetent hostsfor the three public hostnames would need real Docker + outbound network and belongs intest/e2e/if QA wants it later.Closes #3579
Summary by CodeRabbit