Skip to content

[All Platforms][Install] Host DNS blocking via iptables OUTPUT is not caught in preflight; failures only surface during provider validation #4784

@PrachiShevate-nv

Description

@PrachiShevate-nv

Environment

  • Product: NemoClaw
  • Version: v0.0.58
  • Host OS: Ubuntu 22.04 / Ubuntu 24.04
  • Container runtime: Docker
  • Inference: NVIDIA Endpoints
  • GPU: NVIDIA GB10

Summary

When host outbound DNS is blocked at the OUTPUT chain (tcp/udp dport 53), host-level DNS and Docker pull DNS fails, but nemoclaw onboard preflight still reports "Container DNS resolution works" and proceeds. The DNS failure is only detected later during NVIDIA Endpoints validation, which prints curl: (6) Could not resolve host: integrate.api.nvidia.com.

This makes preflight misleading for operators: it suggests container DNS is healthy even when host-level DNS for providers is broken.

Preconditions

  • NemoClaw installed and working on a host with Docker and NVIDIA GPU.
  • NVIDIA Endpoints selected as inference provider during nemoclaw onboard.
  • No special Docker DNS configuration beyond defaults (so Docker containers rely on host resolver / systemd-resolved).

Steps to Reproduce

  1. On the host, block outbound DNS in the OUTPUT chain:
    sudo iptables -I OUTPUT -p udp --dport 53 -j DROP
    sudo iptables -I OUTPUT -p tcp --dport 53 -j DROP
  2. Confirm host DNS is broken:
    nslookup registry.npmjs.org   # or: curl https://integrate.api.nvidia.com
    Both should fail with "communications error" / "Could not resolve host".
  3. Run NemoClaw onboarding:
    nemoclaw onboard --non-interactive --yes \
      --yes-i-accept-third-party-software \
      --name host-dns-test
  4. Observe preflight output (step [1/8]) and subsequent steps.

Expected Result

Preflight should either:

  • Detect that provider-relevant DNS / host API reachability is broken, and fail early with a clear error, for example:

    "Host DNS / provider hostname resolution failed during preflight. Could not resolve integrate.api.nvidia.com. Check host DNS, VPN, or outbound network ACLs."

OR

  • Clearly distinguish between "container DNS OK" and "host/provider DNS unreachable", so operators are not misled by a blanket "Container DNS resolution works" when provider DNS failures are guaranteed later.

nemoclaw onboard should not continue past preflight as if DNS is healthy when host-level DNS for cloud providers is known-broken.

Actual Result

nemoclaw onboard preflight prints:

[1/8] Preflight checks
...
✓ Container DNS resolution works
...

Onboard proceeds to inference configuration and NVIDIA Endpoints selection.

During provider validation, the first visible failure appears:

Chat Completions API validation timed out; retrying in 5s...
...
NVIDIA Endpoints endpoint validation failed.
Chat Completions API: curl failed (exit 6): curl: (6) Could not resolve host: integrate.api.nvidia.com
Validation could not resolve the provider hostname. Check DNS, VPN, or the endpoint URL.

From an operator perspective:

  • Preflight claims container DNS is fine.
  • The actual DNS failure is only surfaced in step [3/8] when contacting NVIDIA Endpoints, after the user has already provided credentials and picked a model.

Why this is a separate issue

The container-DNS preflight check behaves correctly when Docker's container path is blocked via DOCKER-USER rules (preflight fails with "Container DNS probe did not complete" and prints Docker/systemd-resolved remediation).

This issue is specifically about host-level DNS / provider reachability being caught only in provider validation, not in preflight, while preflight still prints "Container DNS resolution works" in scenarios where host DNS is clearly broken.

Proposed Improvement

Either:

  • Add a host DNS / provider reachability preflight that probes key provider hostnames (e.g., integrate.api.nvidia.com) and fails early with targeted remediation when they cannot be resolved, or

  • Adjust the preflight output to explicitly separate:

    • "Container DNS resolution works (for internal/registry lookups)"
    • "Provider DNS / cloud endpoints not checked until inference validation"

    so operators aren't misled into thinking the entire DNS path is validated.

Metadata

Metadata

Assignees

Labels

NV QABugs found by the NVIDIA QA Teamarea: installInstall, setup, prerequisites, or uninstall flowarea: networkingDNS, proxy, TLS, ports, host aliases, or connectivity

Type

No fields configured for Bug.

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions