Skip to content

fix(jetson): apply br_netfilter on JetPack R39 when missing (Fixes #2418)#2419

Merged
ericksoa merged 8 commits into
mainfrom
fix/setup-jetson-r39-br-netfilter-2418
Apr 24, 2026
Merged

fix(jetson): apply br_netfilter on JetPack R39 when missing (Fixes #2418)#2419
ericksoa merged 8 commits into
mainfrom
fix/setup-jetson-r39-br-netfilter-2418

Conversation

@yanyunl1991

@yanyunl1991 yanyunl1991 commented Apr 24, 2026

Copy link
Copy Markdown
Contributor

Summary

Fixes #2418.

JetPack R39 was silently skipped by setup-jetson.shget_jetpack_version() returned empty for any release ≥ 39, so main() exited before ever setting up br_netfilter. Without net.bridge.bridge-nf-call-iptables=1, kube-proxy's iptables NAT rules are present but don't match pod-originated traffic, so sandbox
agent pods can't resolve openshell.openshell.svc.cluster.local and crash-loop during onboard's [7/8] Setting up OpenClaw inside sandbox step.

See #2418 for full diagnostic chain: pod-IP DNS query works, service-IP times out, iptables KUBE-SVC-* rules exist, lsmod | grep br_netfilter is empty, and loading the module immediately fixes it.

Changes

Scoped to scripts/setup-jetson.sh only. Approach keeps R36/R38 code paths untouched and co-locates R39 handling inside the existing if ((release >= 39)) branch rather than threading a new jp7-r39 token through configure_jetson_host().

  1. New helper bridge_netfilter_ready() — returns 0 when /proc/sys/net/bridge/bridge-nf-call-iptables exists and reads 1. That file only appears after the module is loaded, so a single check covers both conditions.

  2. New helper apply_br_netfilter_setup() — extracts the four-line modprobe + sysctl + tee /etc/modules-load.d/ + tee /etc/sysctl.d/ block that previously lived inline in configure_jetson_host(). Both the R39 branch and the R36/R38 path now call this helper, so behavior for R36/R38 is byte-identical to
    main.

  3. R39 branch in get_jetpack_version() — the existing if ((release >= 39)) early-return now carries the work instead of just logging and returning:

    • If bridge_netfilter_ready → log and return empty (main exits, no host state is modified).
    • Otherwise → check sudo, call apply_br_netfilter_setup, return empty.

    Returning empty preserves the existing invariant that R39 does not flow through configure_jetson_host() (which handles iptables-legacy / daemon.json / post-setup docker restart — none of which R39 needs).

Conditional on R39 only because not every R39 image hits the bug: some ship with br_netfilter already loaded and we don't want to plant /etc/modules-load.d/nemoclaw.conf + /etc/sysctl.d/99-nemoclaw.conf drop-ins on those systems. For JP6 / JP7-R38 the behavior is preserved exactly (unconditional
apply_br_netfilter_setup via configure_jetson_host).

What the user sees on affected R39 hosts

Before (bug):

Jetson detected (L4T 39.1) — this version does not require any host setup

(…then onboard[7/8] fails with sandbox is not ready.)

After, first-affected R39 install:

Jetson detected (L4T 39.1) — loading br_netfilter (required by k3s inside the OpenShell gateway; see #2418)

After, R39 image that already has it:

Jetson detected (L4T 39.1) — br_netfilter already configured; no host setup needed

Verification

  • bash -n scripts/setup-jetson.sh — clean (syntax check).
  • npm run typecheck:cli — clean.
  • npm run ts-migration:guard -- --base origin/main --head HEADNo edits to migrated legacy paths or removed compatibility shims detected.
  • Existing test/setup-jetson.test.ts (4 tests for the Python daemon.json patcher, which is unchanged) — all pass.
  • Manual end-to-end on a JetPack R39 Jetson Orin (the setup-jetson.sh skips br_netfilter on JetPack R39, breaking k3s service DNS and onboard [7/8] #2418 reporter's affected machine):
    • Before: nemoclaw onboard reaches [7/8] and fails with sandbox is not ready; kubectl describe pod my-assistant -n openshell shows the agent container in CrashLoopBackOff with 156+ restarts, logs show Temporary failure in name resolution for openshell.openshell.svc.cluster.local.
    • After applying the equivalent of this fix manually (sudo modprobe br_netfilter && sudo sysctl -w net.bridge.bridge-nf-call-iptables=1 net.bridge.bridge-nf-call-ip6tables=1) and rerunning nemoclaw onboard[1/8][8/8] complete, sandbox reaches Ready, dashboard reachable.

No bash-level unit tests added because the existing test coverage for this script only exercises the embedded Python daemon.json patcher (extracted via regex), and the new logic is top-level bash that would require either a source-guard refactor or mocking /etc/nv_tegra_release / /proc/sys/... via env-var
indirection — neither matches existing patterns. Happy to add them if a reviewer prefers.

Type of Change

  • Code change (feature, bug fix, or refactor)
  • Code change with doc updates
  • Doc only (prose changes, no code sample modifications)
  • Doc only (includes code sample changes)

AI Disclosure

  • AI-assisted — tool: Claude Code

Summary by CodeRabbit

  • Chores
    • Centralized bridge/netfilter setup with a readiness check so host changes are applied only when needed.
    • Newer releases get a conditional dual‑stack (IPv4+IPv6) setup; older paths use iptables‑only configuration.
    • Ensures elevated privileges when required, persists bridge/IPv4/IPv6 settings across reboots, and logs runtime values and persistence locations.

Signed-off-by: Yanyun Liao yanyunl@nvidia.com

yanyunl1991 and others added 2 commits April 24, 2026 14:04
)

JetPack R39 is silently skipped by setup-jetson.sh (get_jetpack_version
early-returns for any release ≥ 39). That means `modprobe br_netfilter`
and `sysctl -w net.bridge.bridge-nf-call-iptables=1` — which live inside
`configure_jetson_host()` — are never run on R39 hosts.

The assumption behind that early-return was wrong: k3s running inside
the OpenShell gateway container needs bridge-nf-call-iptables=1 for
ClusterIP service routing. Without it, the kube-proxy iptables NAT
rules are written correctly but don't match pod-originated traffic, so
sandbox agent pods can't resolve `openshell.openshell.svc.cluster.local`
and crash-loop during onboard's [7/8] step.

Recognize R39 as `jp7-r39`, same as R38 for the iptables / daemon.json
case (no changes needed there). For the shared br_netfilter block: keep
the existing unconditional behavior for JP6 / JP7-R38 (idempotent, no
reason to change what ships today), but on R39 apply only when
`/proc/sys/net/bridge/bridge-nf-call-iptables` is absent or not set to
1. This avoids writing NemoClaw-owned drop-ins to `/etc/modules-load.d/`
and `/etc/sysctl.d/` on R39 images that already ship with the module in
place (not all R39 builds need it — reporter's hit the broken case).

Verified manually on an R39 Jetson Orin: before the fix, sandbox agent
pod crash-looped 156+ times with
`failed to lookup address information: Temporary failure in name
resolution`. After `modprobe br_netfilter; sysctl -w
net.bridge.bridge-nf-call-iptables=1`, a fresh `nemoclaw onboard` runs
through [1/8]–[8/8] cleanly. Full diagnostic chain in #2418.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Yanyun Liao <yanyunl@nvidia.com>
…2418)

Address reviewer feedback: keep the existing `if ((release >= 39))`
branch in `get_jetpack_version()` and do the R39 check + setup right
there instead of threading a new "jp7-r39" token through
`configure_jetson_host()`.

The earlier version added a `39.*)` case to `get_jetpack_version`, a
matching `jp7-r39)` case to `configure_jetson_host`, and a conditional
inside that function's br_netfilter block. This version:

- Leaves `get_jetpack_version`'s R36/R38 case statement and
  `configure_jetson_host` case statements completely untouched for
  R36/R38 — no regression surface there.
- Extracts `apply_br_netfilter_setup()` as a shared helper so the R39
  branch and the R36/R38 path call the same code without duplication.
- In the `>=39` branch: probe `bridge_netfilter_ready` first; if the
  host is already set up, log and return empty (main() exits early,
  same as before this PR). Otherwise do the sudo prompt and call
  `apply_br_netfilter_setup`.

Net effect on R39 is identical — `br_netfilter` is loaded and
persisted only when missing — but the diff is smaller and R39-specific
logic is co-located with the rest of the R39 detection.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Yanyun Liao <yanyunl@nvidia.com>
@coderabbitai

coderabbitai Bot commented Apr 24, 2026

Copy link
Copy Markdown
Contributor

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Centralizes br_netfilter readiness check and apply logic in the Jetson setup script; conditionally applies iptables-only (R36/R38) or dual-stack (R39) br_netfilter setup (modprobe, runtime sysctls, and persistence files), and avoids early-return on JP7 R39 so host bridge filtering is enforced when needed.

Changes

Cohort / File(s) Summary
Jetson setup script
scripts/setup-jetson.sh
Adds bridge_netfilter_ready() readiness check and two apply functions (iptables-only for R36/R38 and dual-stack for R39). Moves br_netfilter/module/sysctl persistence into helpers, enforces sudo when required, logs runtime sysctl values and persistence paths, and removes inline br_netfilter setup from older configure_jetson_host() branches so R39 no longer early-returns. Focus review on readiness logic, conditional branching for release versions, sysctl/module persistence paths (/etc/sysctl.d/99-nemoclaw.conf, /etc/modules-load.d/nemoclaw.conf), and sudo enforcement/logging.

Sequence Diagram(s)

sequenceDiagram
    participant Installer
    participant HostKernel
    participant SysctlPersist
    participant ModulesPersist

    Installer->>HostKernel: bridge_netfilter_ready? (check runtime & persistence)
    alt not ready
        Installer->>HostKernel: modprobe br_netfilter
        Installer->>HostKernel: set sysctl net.bridge.bridge-nf-call-iptables/ip6tables=1
        Installer->>SysctlPersist: write /etc/sysctl.d/99-nemoclaw.conf
        Installer->>ModulesPersist: write /etc/modules-load.d/nemoclaw.conf
        Installer->>HostKernel: log runtime values and persistence paths
    else ready
        Installer->>Installer: report no host setup required
    end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐇 I nudged the bridge, gave nets a nudge,
Loaded a module, then wrote a judge,
Sysctls tucked in a file with care,
Boots will remember each little prayer,
R39 hops happy, DNS in the air.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 40.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately and concisely describes the primary change: fixing br_netfilter application on JetPack R39 when missing, with direct reference to the issue it addresses.
Linked Issues check ✅ Passed The PR implements all coding requirements from #2418: br_netfilter and bridge sysctls are now applied on R39 hosts with proper readiness checks, persistence handling, and conditional sudo enforcement.
Out of Scope Changes check ✅ Passed All changes in scripts/setup-jetson.sh are directly scoped to fixing the br_netfilter issue on R39; no unrelated modifications to iptables-legacy or daemon.json are included, maintaining byte-identical behavior for R36/R38.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/setup-jetson-r39-br-netfilter-2418

Comment @coderabbitai help to get the list of available commands and usage tips.

The original fix only set the IPv4 sysctl (bridge-nf-call-iptables),
which is enough for IPv4 ClusterIP service routing in k3s — the path
validated end-to-end on the reporter's Jetson Orin R39.

However, the manual workaround tracked in the issue thread enables
both bridge-nf-call-iptables and bridge-nf-call-ip6tables, and that
is also the combination that was actually verified on the reporter's
host. Shipping only v4 leaves a small but real gap for any future
dual-stack or v6-only cluster service the gateway might route, and
makes the automated fix diverge slightly from the proven-works
manual steps.

- apply_br_netfilter_setup: flip ip6tables sysctl alongside iptables,
  and persist both in /etc/sysctl.d/99-nemoclaw.conf so the setting
  survives reboot (same drop-in, two lines).
- bridge_netfilter_ready: require both sysctls to be 1 (and both
  /proc nodes to exist) before treating the host as already-configured,
  so we don't skip the apply step when only v4 is pre-set.
- Updated comments to reflect v4+v6 coverage.

Docker/k3s do not need to be restarted after this — br_netfilter is
a kernel module and the sysctl takes effect immediately; new bridged
traffic picks up the new netfilter hook without any daemon reload.
The R39 apply path currently prints a single "loading br_netfilter"
line before running modprobe/sysctl/tee, all of which are silent
(their stdout is redirected to /dev/null). On success the script
exits quietly, leaving the operator without a confirmation that the
apply actually completed, which two sysctls were flipped, or where
the drop-ins were persisted. On failure, only the raw error from
whichever step tripped `set -e` is visible.

Add one structured info line at the end of the apply branch
summarising what was done and where state now lives. Keeps the
skip branch unchanged and does not affect R36/R38 logging (which
already gets output from update-alternatives / systemctl restart).

Useful for:
- confirming the script actually ran the expected code path (in
  particular, that it's the v4+v6 version, not an older v4-only
  build) when triaging future reports.
- documenting the on-host footprint for later cleanup / debugging.
Refines the confirmation log added in 5b1753f so the script itself
produces evidence of the fix's effect, rather than just asserting
"we set it to 1". A user validating on their own Jetson can now
confirm from script output alone that:

  1) The runtime sysctls read back as 1 (read from /proc, not
     hardcoded in the log string).
  2) What this unblocks — sandbox-pod → ClusterIP routing, i.e. the
     original #2418 symptom.
  3) No docker/k3s restart is required (common operator question;
     answered inline so reporters don't have to ask).
  4) Which drop-in files own the reboot persistence.

Before: "br_netfilter applied: bridge-nf-call-{iptables,ip6tables}=1; persisted ..."
  (one line, hardcoded string — no actual verification)

After: "br_netfilter runtime: bridge-nf-call-iptables=1, bridge-nf-call-ip6tables=1 — sandbox → ClusterIP routing (CoreDNS, services) is unblocked; no docker or k3s restart needed"
       "Reboot persistence: /etc/modules-load.d/nemoclaw.conf, /etc/sysctl.d/99-nemoclaw.conf"
  (two lines; first reads live /proc state as proof)

Falls back to "?" in the unlikely case /proc/sys/net/bridge/* is
missing when read (e.g., racy unload), so the log line is always
printable even in edge cases.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@scripts/setup-jetson.sh`:
- Line 183: The call to apply_br_netfilter_setup now routes jp6/jp7-r38 through
a shared helper that introduced expanded persistence writes intended for R39;
isolate legacy JetPack behavior by either creating a separate helper (e.g.,
apply_br_netfilter_setup_legacy) and keep jp6/jp7-r38 calling the legacy
variant, or add a guarded conditional inside apply_br_netfilter_setup that
checks the JetPack version and only performs the extra persistence writes when
the version equals R39; update references to apply_br_netfilter_setup in the jp6
and jp7-r38 flow to use the legacy helper or ensure the conditional prevents
R36/R38 writes so those paths remain byte-identical.
- Around line 19-28: The current bridge_netfilter_ready() only checks live /proc
state; update the logic so the script does not skip persistence steps
prematurely: either extend bridge_netfilter_ready() to also verify the
persistence drop-ins exist and contain the expected settings (check
/etc/modules-load.d/* for br_netfilter and /etc/sysctl.d/* for
bridge-nf-call-iptables and bridge-nf-call-ip6tables), or make the persistence
step idempotent by always creating/updating the /etc/modules-load.d and
/etc/sysctl.d drop-in files when they are missing or don’t contain the required
entries; reference the bridge_netfilter_ready() function and the persistence
targets (/etc/modules-load.d and /etc/sysctl.d) when implementing the change.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 9260e3aa-267d-460a-a3fb-b1953242f97e

📥 Commits

Reviewing files that changed from the base of the PR and between 5b1753f and 9aa281e.

📒 Files selected for processing (1)
  • scripts/setup-jetson.sh

Comment thread scripts/setup-jetson.sh Outdated
Comment thread scripts/setup-jetson.sh
…2418)

Addresses two review comments on PR #2419:

1. bridge_netfilter_ready only inspected the live kernel state,
   which let the skip branch trigger on runtime-only configuration
   (e.g. someone ran modprobe + sysctl manually for a test). The
   drop-ins would never get written and the fix would vanish at
   the next reboot. The predicate now also requires:
     - /etc/modules-load.d/nemoclaw.conf contains `br_netfilter`
     - /etc/sysctl.d/99-nemoclaw.conf contains both sysctl=1 lines
   apply_br_netfilter_setup_r39 is already idempotent, so when
   runtime is live but drop-ins are missing or stale (v4-only
   leftover from an older script), ready returns false and the
   apply path rewrites them.

2. The earlier "also flip ip6tables" commit (77e84f9) added v6
   handling to the shared apply_br_netfilter_setup helper, which
   was also called by R36/R38 via configure_jetson_host. That
   expanded the R36/R38 behavior — out of scope for this PR.
   Split into two helpers:
     - apply_br_netfilter_setup: v4-only, byte-identical to the
       pre-#2418 inline block, still used by R36/R38.
     - apply_br_netfilter_setup_r39: v4+v6, the dual-stack variant
       validated on the reporter's Jetson; only called from the
       R39 branch in get_jetpack_version.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@scripts/setup-jetson.sh`:
- Around line 30-39: The bridge IPv6 sysctl is being written unconditionally
which fails on kernels without bridge-nf-call-ip6tables; update the script so
the write of net.bridge.bridge-nf-call-ip6tables=1 is only attempted when the
kernel exposes /proc/sys/net/bridge/bridge-nf-call-ip6tables (or when the file
exists), and update the bridge_netfilter_ready() function to check for the IPv6
sysctl file before asserting its value (mirror the existing IPv4 checks in
bridge_netfilter_ready()); ensure any set -e failure paths are avoided by gating
the IPv6 sysctl write with a presence check and only adding the grep check for
net.bridge.bridge-nf-call-ip6tables=1 in /etc/sysctl.d/99-nemoclaw.conf when
that sysctl is present.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: adc4de17-ceae-46df-8123-2bafdfcc99e4

📥 Commits

Reviewing files that changed from the base of the PR and between 9aa281e and 83a2398.

📒 Files selected for processing (1)
  • scripts/setup-jetson.sh

Comment thread scripts/setup-jetson.sh Outdated
CodeRabbit flagged that the IPv6 sysctl write and the ready-check
for bridge-nf-call-ip6tables were unconditional, which would fail
under `set -e` on kernels built without CONFIG_IPV6 (where
/proc/sys/net/bridge/bridge-nf-call-ip6tables does not exist).

Make the IPv6 leg present-aware in all three places:

- apply_br_netfilter_setup_r39: only runs `sysctl -w ...-ip6tables=1`
  when the /proc node exists; the persistence drop-in then contains
  only the v4 line on IPv6-less kernels (so bridge_netfilter_ready
  stays consistent with what was actually written).

- bridge_netfilter_ready: v4 checks are mandatory; v6 checks (both
  the runtime sysctl value and the drop-in line) are gated on the
  /proc node existing. On a v4-only kernel the predicate returns
  true with just v4, matching the drop-in content.

- R39 branch log line: when /proc for ip6tables is missing, emit
  "(ip6tables not exposed by this kernel; IPv4-only persistence)"
  so the operator can tell from logs alone that the v4-only path
  was taken intentionally, rather than because v6 silently failed.
Removes the IPv6 work that scope-crept into this PR and triggered
two rounds of review: the R36/R38 byte-identical split (77e84f9 /
83a2398) and the CONFIG_IPV6-gated presence check (2a3ddbd).

The reporter's actual symptom on #2418 — sandbox pod → ClusterIP
DNS timing out — is IPv4-only (CoreDNS ClusterIP 10.43.0.10). The
v4 sysctl + br_netfilter modprobe alone is sufficient and matches
what the R36/R38 paths have always done. IPv6 was belt-and-
suspenders; every review iteration showed the extra guard code was
more complexity than the robustness was worth.

Net effect:
- apply_br_netfilter_setup_r39 is deleted; R39 calls the shared
  apply_br_netfilter_setup (same helper R36/R38 already use).
- bridge_netfilter_ready drops IPv6 runtime and persistence checks.
- R39 log line drops the IPv6 summary branching.
- CodeRabbit's other finding (persistence must be part of ready)
  is preserved: bridge_netfilter_ready still checks
  /etc/modules-load.d/nemoclaw.conf and /etc/sysctl.d/99-nemoclaw.conf
  content, so transient modprobe-only state no longer triggers skip.

If a future issue shows IPv6 cluster-service routing breaking on
R39, it can be added back with its own scope and validation.

@ericksoa ericksoa left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clean, well-scoped fix. The root cause (br_netfilter missing on R39, blocking kube-proxy NAT for pod → ClusterIP traffic) is well-diagnosed, and the fix is minimal:

  • bridge_netfilter_ready() checks both runtime state and persistence drop-ins — correctly avoids skipping when someone ran modprobe transiently without persisting.
  • apply_br_netfilter_setup() is a pure extraction of the existing 4-command sequence — R36/R38 behavior is byte-identical.
  • R39 branch applies conditionally and reads back the /proc value as log evidence.
  • All CodeRabbit feedback addressed (persistence check, legacy path isolation, IPv6 guard removed).

LGTM.

@ericksoa ericksoa merged commit 3ff9f76 into main Apr 24, 2026
16 checks passed
@cv cv added the v0.0.25 label Apr 24, 2026
ericksoa pushed a commit that referenced this pull request Apr 25, 2026
## Summary
Catches up the user-facing docs for changes merged since v0.0.24 and
bumps `docs/versions1.json` so the next docs build publishes as 0.0.25.

## Changes
- `docs/reference/troubleshooting.md`: corrected the JetPack 7 (L4T
39.x) entry to reflect the new conditional `br_netfilter` auto-load on
R39 hosts that lack it (from #2419), and added a new "DNS resolution
from inside docker fails (corporate firewall)" section covering the new
container-DNS preflight probe and platform-specific remediation paths
(from #2349).
- `docs/get-started/quickstart.md`: added a "Review the Configuration
Before the Sandbox Build" subsection documenting the new `[Y/n]` review
gate that fires after the sandbox-name prompt (from #2221).
- `docs/versions1.json`: promoted 0.0.25 to `preferred: true`, demoted
0.0.24, and trimmed to ten entries to match what
`scripts/bump-version.ts` writes. `docs/project.json` regenerates from
`versions1.json` at build time and now reports `0.0.25`.
- `.agents/skills/nemoclaw-user-get-started/` and
`.agents/skills/nemoclaw-user-reference/`: regenerated via
`scripts/docs-to-skills.py`.

## Type of Change
- [ ] Code change (feature, bug fix, or refactor)
- [ ] Code change with doc updates
- [x] Doc only (prose changes, no code sample modifications)
- [ ] Doc only (includes code sample changes)

## Verification
- [x] `npx prek run --all-files` passes
- [x] `npm test` passes
- [x] Tests added or updated for new or changed behavior
- [x] No secrets, API keys, or credentials committed
- [x] Docs updated for user-facing behavior changes
- [x] `make docs` builds without warnings (doc changes only)
- [x] Doc pages follow the [style
guide](https://github.com/NVIDIA/NemoClaw/blob/main/docs/CONTRIBUTING.md)
(doc changes only)
- [ ] New doc pages include SPDX header and frontmatter (new pages only)

## AI Disclosure
- [x] AI-assisted — tool: Claude Code

---
Signed-off-by: Miyoung Choi <miyoungc@nvidia.com>

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **New Features**
* Onboarding now shows a configuration review and requires explicit
confirmation before building a sandbox; non-interactive mode prints the
summary without prompting.

* **Documentation**
* Expanded Jetson troubleshooting for JetPack 7 / L4T 39.x and
conditional br_netfilter guidance.
* Added corporate-firewall DNS troubleshooting with platform-specific
remediation and a verification nslookup step.
  * Quickstart updated to document the new confirmation flow.

* **Chores**
  * Documentation version index updated.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
DemianHeyGen pushed a commit to DemianHeyGen/NemoClaw that referenced this pull request Apr 30, 2026
…IDIA#2418) (NVIDIA#2419)

## Summary
   
Fixes NVIDIA#2418.
                                                         
JetPack R39 was silently skipped by `setup-jetson.sh` —
`get_jetpack_version()` returned empty for any release ≥ 39, so `main()`
exited before ever setting up `br_netfilter`. Without
`net.bridge.bridge-nf-call-iptables=1`, kube-proxy's iptables NAT rules
are present but don't match pod-originated traffic, so sandbox
agent pods can't resolve `openshell.openshell.svc.cluster.local` and
crash-loop during onboard's `[7/8] Setting up OpenClaw inside sandbox`
step.
See NVIDIA#2418 for full diagnostic chain: pod-IP DNS query works, service-IP
times out, iptables `KUBE-SVC-*` rules exist, `lsmod | grep
br_netfilter` is empty, and loading the module immediately fixes it.
   
## Changes
                                                         
Scoped to `scripts/setup-jetson.sh` only. Approach keeps R36/R38 code
paths untouched and **co-locates R39 handling inside the existing `if
((release >= 39))` branch** rather than threading a new `jp7-r39` token
through `configure_jetson_host()`.
   
1. **New helper `bridge_netfilter_ready()`** — returns 0 when
`/proc/sys/net/bridge/bridge-nf-call-iptables` exists and reads `1`.
That file only appears after the module is loaded, so a single check
covers both conditions.
                                                         
2. **New helper `apply_br_netfilter_setup()`** — extracts the four-line
`modprobe` + `sysctl` + `tee /etc/modules-load.d/` + `tee
/etc/sysctl.d/` block that previously lived inline in
`configure_jetson_host()`. Both the R39 branch and the R36/R38 path now
call this helper, so behavior for R36/R38 is byte-identical to
  main.                                                  
3. **R39 branch in `get_jetpack_version()`** — the existing `if
((release >= 39))` early-return now carries the work instead of just
logging and returning:
- If `bridge_netfilter_ready` → log and return empty (main exits, no
host state is modified).
- Otherwise → check sudo, call `apply_br_netfilter_setup`, return empty.
Returning empty preserves the existing invariant that R39 does not flow
through `configure_jetson_host()` (which handles iptables-legacy /
`daemon.json` / post-setup docker restart — none of which R39 needs).
Conditional on R39 only because not every R39 image hits the bug: some
ship with `br_netfilter` already loaded and we don't want to plant
`/etc/modules-load.d/nemoclaw.conf` + `/etc/sysctl.d/99-nemoclaw.conf`
drop-ins on those systems. For JP6 / JP7-R38 the behavior is preserved
exactly (unconditional
`apply_br_netfilter_setup` via `configure_jetson_host`).
                                                         
### What the user sees on affected R39 hosts
  Before (bug):
```
Jetson detected (L4T 39.1) — this version does not require any host
setup
```
  (…then onboard[7/8] fails with `sandbox is not ready`.)
  After, first-affected R39 install:                     
```
Jetson detected (L4T 39.1) — loading br_netfilter (required by k3s
inside the OpenShell gateway; see NVIDIA#2418)
```
   
After, R39 image that already has it:
```
Jetson detected (L4T 39.1) — br_netfilter already configured; no host
setup needed
```
                                                         
## Verification
                                                         
  - `bash -n scripts/setup-jetson.sh` — clean (syntax check).
- `npm run typecheck:cli` — clean.
- `npm run ts-migration:guard -- --base origin/main --head HEAD` — `No
edits to migrated legacy paths or removed compatibility shims detected.`
- Existing `test/setup-jetson.test.ts` (4 tests for the Python
`daemon.json` patcher, which is unchanged) — all pass.
- **Manual end-to-end on a JetPack R39 Jetson Orin** (the NVIDIA#2418
reporter's affected machine):
- **Before**: `nemoclaw onboard` reaches `[7/8]` and fails with `sandbox
is not ready`; `kubectl describe pod my-assistant -n openshell` shows
the `agent` container in `CrashLoopBackOff` with 156+ restarts, logs
show `Temporary failure in name resolution` for
`openshell.openshell.svc.cluster.local`.
- **After applying the equivalent of this fix manually** (`sudo modprobe
br_netfilter && sudo sysctl -w net.bridge.bridge-nf-call-iptables=1
net.bridge.bridge-nf-call-ip6tables=1`) and rerunning `nemoclaw onboard`
→ `[1/8]`–`[8/8]` complete, sandbox reaches Ready, dashboard reachable.
No bash-level unit tests added because the existing test coverage for
this script only exercises the embedded Python `daemon.json` patcher
(extracted via regex), and the new logic is top-level bash that would
require either a source-guard refactor or mocking
`/etc/nv_tegra_release` / `/proc/sys/...` via env-var
indirection — neither matches existing patterns. Happy to add them if a
reviewer prefers.
                                                         
## Type of Change
                                                         
  - [x] Code change (feature, bug fix, or refactor)
- [ ] Code change with doc updates
- [ ] Doc only (prose changes, no code sample modifications)
- [ ] Doc only (includes code sample changes)
## AI Disclosure
  - [x] AI-assisted — tool: Claude Code                  

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **Chores**
* Centralized bridge/netfilter setup with a readiness check so host
changes are applied only when needed.
* Newer releases get a conditional dual‑stack (IPv4+IPv6) setup; older
paths use iptables‑only configuration.
* Ensures elevated privileges when required, persists bridge/IPv4/IPv6
settings across reboots, and logs runtime values and persistence
locations.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

<!-- DCO sign-off required by CI. Run: git config user.name && git
config user.email -->
Signed-off-by: Yanyun Liao <yanyunl@nvidia.com>

---------

Signed-off-by: Yanyun Liao <yanyunl@nvidia.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
DemianHeyGen pushed a commit to DemianHeyGen/NemoClaw that referenced this pull request Apr 30, 2026
## Summary
Catches up the user-facing docs for changes merged since v0.0.24 and
bumps `docs/versions1.json` so the next docs build publishes as 0.0.25.

## Changes
- `docs/reference/troubleshooting.md`: corrected the JetPack 7 (L4T
39.x) entry to reflect the new conditional `br_netfilter` auto-load on
R39 hosts that lack it (from NVIDIA#2419), and added a new "DNS resolution
from inside docker fails (corporate firewall)" section covering the new
container-DNS preflight probe and platform-specific remediation paths
(from NVIDIA#2349).
- `docs/get-started/quickstart.md`: added a "Review the Configuration
Before the Sandbox Build" subsection documenting the new `[Y/n]` review
gate that fires after the sandbox-name prompt (from NVIDIA#2221).
- `docs/versions1.json`: promoted 0.0.25 to `preferred: true`, demoted
0.0.24, and trimmed to ten entries to match what
`scripts/bump-version.ts` writes. `docs/project.json` regenerates from
`versions1.json` at build time and now reports `0.0.25`.
- `.agents/skills/nemoclaw-user-get-started/` and
`.agents/skills/nemoclaw-user-reference/`: regenerated via
`scripts/docs-to-skills.py`.

## Type of Change
- [ ] Code change (feature, bug fix, or refactor)
- [ ] Code change with doc updates
- [x] Doc only (prose changes, no code sample modifications)
- [ ] Doc only (includes code sample changes)

## Verification
- [x] `npx prek run --all-files` passes
- [x] `npm test` passes
- [x] Tests added or updated for new or changed behavior
- [x] No secrets, API keys, or credentials committed
- [x] Docs updated for user-facing behavior changes
- [x] `make docs` builds without warnings (doc changes only)
- [x] Doc pages follow the [style
guide](https://github.com/NVIDIA/NemoClaw/blob/main/docs/CONTRIBUTING.md)
(doc changes only)
- [ ] New doc pages include SPDX header and frontmatter (new pages only)

## AI Disclosure
- [x] AI-assisted — tool: Claude Code

---
Signed-off-by: Miyoung Choi <miyoungc@nvidia.com>

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **New Features**
* Onboarding now shows a configuration review and requires explicit
confirmation before building a sandbox; non-interactive mode prints the
summary without prompting.

* **Documentation**
* Expanded Jetson troubleshooting for JetPack 7 / L4T 39.x and
conditional br_netfilter guidance.
* Added corporate-firewall DNS troubleshooting with platform-specific
remediation and a verification nslookup step.
  * Quickstart updated to document the new confirmation flow.

* **Chores**
  * Documentation version index updated.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
realkim93 added a commit to realkim93/NemoClaw that referenced this pull request May 4, 2026
Bring feat/jetson-orin-nano-support up to date with origin/main after
217 commits of upstream work.

Conflict resolutions:
- scripts/setup-jetson.sh: adopt main's L4T-version-aware structure
  (NVIDIA#1910, NVIDIA#1913, NVIDIA#2419) — JP6 / JP7-R38 / JP7-R39 case dispatch with
  idempotent br_netfilter persistence. Drops the PR's preflight wrapper
  in favor of the simpler main rewrite that already covers Orin Nano
  via the JP6 path.
- src/lib/onboard.ts: keep main's verifyWebSearchInsideSandbox; preserve
  the Jetson GPU-detection branch on top of main's refined NVIDIA
  detection format; restore the gpu parameter on startGatewayWithOptions
  so the patchGatewayImageForJetson() call still type-checks.
- src/lib/onboard-providers.ts: re-add the ollama-local / vllm-local
  branches in getSandboxInferenceConfig that override inferenceBaseUrl
  via getLocalProviderBaseUrl.
- docs/get-started/quickstart.md and the autogenerated skills commands.md:
  take main (Jetson now lives in ci/platform-matrix.json).

Port setup-jetson into main's oclif command-registry architecture:
- Add SetupJetsonCliCommand thin oclif adapter and runSetupJetsonAction
  that shells out to scripts/setup-jetson.sh under sudo on Linux.
- Register setup-jetson in command-registry.ts (Getting Started),
  legacy-oclif-dispatch.ts, and oclif-commands.ts.

Switch patchGatewayImageForJetson to docker.dockerInspectFormat /
docker.dockerRun so it stays inside main's docker-abstraction guard,
and update the existing jetson tests to match the argv-based runner
contract.

Fix the Jetson note in ci/platform-matrix.json — the relative
../reference/commands.md#nemoclaw-setup-jetson link broke the
markdown-links CI check when the platform-matrix generator copied it
into root README.md. Replaced with the docs.nvidia.com absolute URL.

Allowlist two pre-existing slack-app-token false positives surfaced by
the local pre-commit gitleaks pass.

Verified: npm run build:cli, npm run lint, full vitest suite (3160
pass, 13 skip), and check-docs.sh markdown-link check all green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ksapru pushed a commit to ksapru/NemoClaw that referenced this pull request May 12, 2026
## Summary
Catches up the user-facing docs for changes merged since v0.0.24 and
bumps `docs/versions1.json` so the next docs build publishes as 0.0.25.

## Changes
- `docs/reference/troubleshooting.md`: corrected the JetPack 7 (L4T
39.x) entry to reflect the new conditional `br_netfilter` auto-load on
R39 hosts that lack it (from NVIDIA#2419), and added a new "DNS resolution
from inside docker fails (corporate firewall)" section covering the new
container-DNS preflight probe and platform-specific remediation paths
(from NVIDIA#2349).
- `docs/get-started/quickstart.md`: added a "Review the Configuration
Before the Sandbox Build" subsection documenting the new `[Y/n]` review
gate that fires after the sandbox-name prompt (from NVIDIA#2221).
- `docs/versions1.json`: promoted 0.0.25 to `preferred: true`, demoted
0.0.24, and trimmed to ten entries to match what
`scripts/bump-version.ts` writes. `docs/project.json` regenerates from
`versions1.json` at build time and now reports `0.0.25`.
- `.agents/skills/nemoclaw-user-get-started/` and
`.agents/skills/nemoclaw-user-reference/`: regenerated via
`scripts/docs-to-skills.py`.

## Type of Change
- [ ] Code change (feature, bug fix, or refactor)
- [ ] Code change with doc updates
- [x] Doc only (prose changes, no code sample modifications)
- [ ] Doc only (includes code sample changes)

## Verification
- [x] `npx prek run --all-files` passes
- [x] `npm test` passes
- [x] Tests added or updated for new or changed behavior
- [x] No secrets, API keys, or credentials committed
- [x] Docs updated for user-facing behavior changes
- [x] `make docs` builds without warnings (doc changes only)
- [x] Doc pages follow the [style
guide](https://github.com/NVIDIA/NemoClaw/blob/main/docs/CONTRIBUTING.md)
(doc changes only)
- [ ] New doc pages include SPDX header and frontmatter (new pages only)

## AI Disclosure
- [x] AI-assisted — tool: Claude Code

---
Signed-off-by: Miyoung Choi <miyoungc@nvidia.com>

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **New Features**
* Onboarding now shows a configuration review and requires explicit
confirmation before building a sandbox; non-interactive mode prints the
summary without prompting.

* **Documentation**
* Expanded Jetson troubleshooting for JetPack 7 / L4T 39.x and
conditional br_netfilter guidance.
* Added corporate-firewall DNS troubleshooting with platform-specific
remediation and a verification nslookup step.
  * Quickstart updated to document the new confirmation flow.

* **Chores**
  * Documentation version index updated.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@wscurran wscurran added the bug-fix PR fixes a bug or regression label Jun 8, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug-fix PR fixes a bug or regression

Projects

None yet

Development

Successfully merging this pull request may close these issues.

setup-jetson.sh skips br_netfilter on JetPack R39, breaking k3s service DNS and onboard [7/8]

4 participants