fix(jetson): apply br_netfilter on JetPack R39 when missing (Fixes #2418)#2419
Conversation
) JetPack R39 is silently skipped by setup-jetson.sh (get_jetpack_version early-returns for any release ≥ 39). That means `modprobe br_netfilter` and `sysctl -w net.bridge.bridge-nf-call-iptables=1` — which live inside `configure_jetson_host()` — are never run on R39 hosts. The assumption behind that early-return was wrong: k3s running inside the OpenShell gateway container needs bridge-nf-call-iptables=1 for ClusterIP service routing. Without it, the kube-proxy iptables NAT rules are written correctly but don't match pod-originated traffic, so sandbox agent pods can't resolve `openshell.openshell.svc.cluster.local` and crash-loop during onboard's [7/8] step. Recognize R39 as `jp7-r39`, same as R38 for the iptables / daemon.json case (no changes needed there). For the shared br_netfilter block: keep the existing unconditional behavior for JP6 / JP7-R38 (idempotent, no reason to change what ships today), but on R39 apply only when `/proc/sys/net/bridge/bridge-nf-call-iptables` is absent or not set to 1. This avoids writing NemoClaw-owned drop-ins to `/etc/modules-load.d/` and `/etc/sysctl.d/` on R39 images that already ship with the module in place (not all R39 builds need it — reporter's hit the broken case). Verified manually on an R39 Jetson Orin: before the fix, sandbox agent pod crash-looped 156+ times with `failed to lookup address information: Temporary failure in name resolution`. After `modprobe br_netfilter; sysctl -w net.bridge.bridge-nf-call-iptables=1`, a fresh `nemoclaw onboard` runs through [1/8]–[8/8] cleanly. Full diagnostic chain in #2418. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Yanyun Liao <yanyunl@nvidia.com>
…2418) Address reviewer feedback: keep the existing `if ((release >= 39))` branch in `get_jetpack_version()` and do the R39 check + setup right there instead of threading a new "jp7-r39" token through `configure_jetson_host()`. The earlier version added a `39.*)` case to `get_jetpack_version`, a matching `jp7-r39)` case to `configure_jetson_host`, and a conditional inside that function's br_netfilter block. This version: - Leaves `get_jetpack_version`'s R36/R38 case statement and `configure_jetson_host` case statements completely untouched for R36/R38 — no regression surface there. - Extracts `apply_br_netfilter_setup()` as a shared helper so the R39 branch and the R36/R38 path call the same code without duplication. - In the `>=39` branch: probe `bridge_netfilter_ready` first; if the host is already set up, log and return empty (main() exits early, same as before this PR). Otherwise do the sudo prompt and call `apply_br_netfilter_setup`. Net effect on R39 is identical — `br_netfilter` is loaded and persisted only when missing — but the diff is smaller and R39-specific logic is co-located with the rest of the R39 detection. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Yanyun Liao <yanyunl@nvidia.com>
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughCentralizes br_netfilter readiness check and apply logic in the Jetson setup script; conditionally applies iptables-only (R36/R38) or dual-stack (R39) br_netfilter setup (modprobe, runtime sysctls, and persistence files), and avoids early-return on JP7 R39 so host bridge filtering is enforced when needed. Changes
Sequence Diagram(s)sequenceDiagram
participant Installer
participant HostKernel
participant SysctlPersist
participant ModulesPersist
Installer->>HostKernel: bridge_netfilter_ready? (check runtime & persistence)
alt not ready
Installer->>HostKernel: modprobe br_netfilter
Installer->>HostKernel: set sysctl net.bridge.bridge-nf-call-iptables/ip6tables=1
Installer->>SysctlPersist: write /etc/sysctl.d/99-nemoclaw.conf
Installer->>ModulesPersist: write /etc/modules-load.d/nemoclaw.conf
Installer->>HostKernel: log runtime values and persistence paths
else ready
Installer->>Installer: report no host setup required
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
The original fix only set the IPv4 sysctl (bridge-nf-call-iptables), which is enough for IPv4 ClusterIP service routing in k3s — the path validated end-to-end on the reporter's Jetson Orin R39. However, the manual workaround tracked in the issue thread enables both bridge-nf-call-iptables and bridge-nf-call-ip6tables, and that is also the combination that was actually verified on the reporter's host. Shipping only v4 leaves a small but real gap for any future dual-stack or v6-only cluster service the gateway might route, and makes the automated fix diverge slightly from the proven-works manual steps. - apply_br_netfilter_setup: flip ip6tables sysctl alongside iptables, and persist both in /etc/sysctl.d/99-nemoclaw.conf so the setting survives reboot (same drop-in, two lines). - bridge_netfilter_ready: require both sysctls to be 1 (and both /proc nodes to exist) before treating the host as already-configured, so we don't skip the apply step when only v4 is pre-set. - Updated comments to reflect v4+v6 coverage. Docker/k3s do not need to be restarted after this — br_netfilter is a kernel module and the sysctl takes effect immediately; new bridged traffic picks up the new netfilter hook without any daemon reload.
The R39 apply path currently prints a single "loading br_netfilter" line before running modprobe/sysctl/tee, all of which are silent (their stdout is redirected to /dev/null). On success the script exits quietly, leaving the operator without a confirmation that the apply actually completed, which two sysctls were flipped, or where the drop-ins were persisted. On failure, only the raw error from whichever step tripped `set -e` is visible. Add one structured info line at the end of the apply branch summarising what was done and where state now lives. Keeps the skip branch unchanged and does not affect R36/R38 logging (which already gets output from update-alternatives / systemctl restart). Useful for: - confirming the script actually ran the expected code path (in particular, that it's the v4+v6 version, not an older v4-only build) when triaging future reports. - documenting the on-host footprint for later cleanup / debugging.
Refines the confirmation log added in 5b1753f so the script itself produces evidence of the fix's effect, rather than just asserting "we set it to 1". A user validating on their own Jetson can now confirm from script output alone that: 1) The runtime sysctls read back as 1 (read from /proc, not hardcoded in the log string). 2) What this unblocks — sandbox-pod → ClusterIP routing, i.e. the original #2418 symptom. 3) No docker/k3s restart is required (common operator question; answered inline so reporters don't have to ask). 4) Which drop-in files own the reboot persistence. Before: "br_netfilter applied: bridge-nf-call-{iptables,ip6tables}=1; persisted ..." (one line, hardcoded string — no actual verification) After: "br_netfilter runtime: bridge-nf-call-iptables=1, bridge-nf-call-ip6tables=1 — sandbox → ClusterIP routing (CoreDNS, services) is unblocked; no docker or k3s restart needed" "Reboot persistence: /etc/modules-load.d/nemoclaw.conf, /etc/sysctl.d/99-nemoclaw.conf" (two lines; first reads live /proc state as proof) Falls back to "?" in the unlikely case /proc/sys/net/bridge/* is missing when read (e.g., racy unload), so the log line is always printable even in edge cases.
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@scripts/setup-jetson.sh`:
- Line 183: The call to apply_br_netfilter_setup now routes jp6/jp7-r38 through
a shared helper that introduced expanded persistence writes intended for R39;
isolate legacy JetPack behavior by either creating a separate helper (e.g.,
apply_br_netfilter_setup_legacy) and keep jp6/jp7-r38 calling the legacy
variant, or add a guarded conditional inside apply_br_netfilter_setup that
checks the JetPack version and only performs the extra persistence writes when
the version equals R39; update references to apply_br_netfilter_setup in the jp6
and jp7-r38 flow to use the legacy helper or ensure the conditional prevents
R36/R38 writes so those paths remain byte-identical.
- Around line 19-28: The current bridge_netfilter_ready() only checks live /proc
state; update the logic so the script does not skip persistence steps
prematurely: either extend bridge_netfilter_ready() to also verify the
persistence drop-ins exist and contain the expected settings (check
/etc/modules-load.d/* for br_netfilter and /etc/sysctl.d/* for
bridge-nf-call-iptables and bridge-nf-call-ip6tables), or make the persistence
step idempotent by always creating/updating the /etc/modules-load.d and
/etc/sysctl.d drop-in files when they are missing or don’t contain the required
entries; reference the bridge_netfilter_ready() function and the persistence
targets (/etc/modules-load.d and /etc/sysctl.d) when implementing the change.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 9260e3aa-267d-460a-a3fb-b1953242f97e
📒 Files selected for processing (1)
scripts/setup-jetson.sh
…2418) Addresses two review comments on PR #2419: 1. bridge_netfilter_ready only inspected the live kernel state, which let the skip branch trigger on runtime-only configuration (e.g. someone ran modprobe + sysctl manually for a test). The drop-ins would never get written and the fix would vanish at the next reboot. The predicate now also requires: - /etc/modules-load.d/nemoclaw.conf contains `br_netfilter` - /etc/sysctl.d/99-nemoclaw.conf contains both sysctl=1 lines apply_br_netfilter_setup_r39 is already idempotent, so when runtime is live but drop-ins are missing or stale (v4-only leftover from an older script), ready returns false and the apply path rewrites them. 2. The earlier "also flip ip6tables" commit (77e84f9) added v6 handling to the shared apply_br_netfilter_setup helper, which was also called by R36/R38 via configure_jetson_host. That expanded the R36/R38 behavior — out of scope for this PR. Split into two helpers: - apply_br_netfilter_setup: v4-only, byte-identical to the pre-#2418 inline block, still used by R36/R38. - apply_br_netfilter_setup_r39: v4+v6, the dual-stack variant validated on the reporter's Jetson; only called from the R39 branch in get_jetpack_version.
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@scripts/setup-jetson.sh`:
- Around line 30-39: The bridge IPv6 sysctl is being written unconditionally
which fails on kernels without bridge-nf-call-ip6tables; update the script so
the write of net.bridge.bridge-nf-call-ip6tables=1 is only attempted when the
kernel exposes /proc/sys/net/bridge/bridge-nf-call-ip6tables (or when the file
exists), and update the bridge_netfilter_ready() function to check for the IPv6
sysctl file before asserting its value (mirror the existing IPv4 checks in
bridge_netfilter_ready()); ensure any set -e failure paths are avoided by gating
the IPv6 sysctl write with a presence check and only adding the grep check for
net.bridge.bridge-nf-call-ip6tables=1 in /etc/sysctl.d/99-nemoclaw.conf when
that sysctl is present.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: adc4de17-ceae-46df-8123-2bafdfcc99e4
📒 Files selected for processing (1)
scripts/setup-jetson.sh
CodeRabbit flagged that the IPv6 sysctl write and the ready-check for bridge-nf-call-ip6tables were unconditional, which would fail under `set -e` on kernels built without CONFIG_IPV6 (where /proc/sys/net/bridge/bridge-nf-call-ip6tables does not exist). Make the IPv6 leg present-aware in all three places: - apply_br_netfilter_setup_r39: only runs `sysctl -w ...-ip6tables=1` when the /proc node exists; the persistence drop-in then contains only the v4 line on IPv6-less kernels (so bridge_netfilter_ready stays consistent with what was actually written). - bridge_netfilter_ready: v4 checks are mandatory; v6 checks (both the runtime sysctl value and the drop-in line) are gated on the /proc node existing. On a v4-only kernel the predicate returns true with just v4, matching the drop-in content. - R39 branch log line: when /proc for ip6tables is missing, emit "(ip6tables not exposed by this kernel; IPv4-only persistence)" so the operator can tell from logs alone that the v4-only path was taken intentionally, rather than because v6 silently failed.
Removes the IPv6 work that scope-crept into this PR and triggered two rounds of review: the R36/R38 byte-identical split (77e84f9 / 83a2398) and the CONFIG_IPV6-gated presence check (2a3ddbd). The reporter's actual symptom on #2418 — sandbox pod → ClusterIP DNS timing out — is IPv4-only (CoreDNS ClusterIP 10.43.0.10). The v4 sysctl + br_netfilter modprobe alone is sufficient and matches what the R36/R38 paths have always done. IPv6 was belt-and- suspenders; every review iteration showed the extra guard code was more complexity than the robustness was worth. Net effect: - apply_br_netfilter_setup_r39 is deleted; R39 calls the shared apply_br_netfilter_setup (same helper R36/R38 already use). - bridge_netfilter_ready drops IPv6 runtime and persistence checks. - R39 log line drops the IPv6 summary branching. - CodeRabbit's other finding (persistence must be part of ready) is preserved: bridge_netfilter_ready still checks /etc/modules-load.d/nemoclaw.conf and /etc/sysctl.d/99-nemoclaw.conf content, so transient modprobe-only state no longer triggers skip. If a future issue shows IPv6 cluster-service routing breaking on R39, it can be added back with its own scope and validation.
ericksoa
left a comment
There was a problem hiding this comment.
Clean, well-scoped fix. The root cause (br_netfilter missing on R39, blocking kube-proxy NAT for pod → ClusterIP traffic) is well-diagnosed, and the fix is minimal:
bridge_netfilter_ready()checks both runtime state and persistence drop-ins — correctly avoids skipping when someone ranmodprobetransiently without persisting.apply_br_netfilter_setup()is a pure extraction of the existing 4-command sequence — R36/R38 behavior is byte-identical.- R39 branch applies conditionally and reads back the /proc value as log evidence.
- All CodeRabbit feedback addressed (persistence check, legacy path isolation, IPv6 guard removed).
LGTM.
## Summary Catches up the user-facing docs for changes merged since v0.0.24 and bumps `docs/versions1.json` so the next docs build publishes as 0.0.25. ## Changes - `docs/reference/troubleshooting.md`: corrected the JetPack 7 (L4T 39.x) entry to reflect the new conditional `br_netfilter` auto-load on R39 hosts that lack it (from #2419), and added a new "DNS resolution from inside docker fails (corporate firewall)" section covering the new container-DNS preflight probe and platform-specific remediation paths (from #2349). - `docs/get-started/quickstart.md`: added a "Review the Configuration Before the Sandbox Build" subsection documenting the new `[Y/n]` review gate that fires after the sandbox-name prompt (from #2221). - `docs/versions1.json`: promoted 0.0.25 to `preferred: true`, demoted 0.0.24, and trimmed to ten entries to match what `scripts/bump-version.ts` writes. `docs/project.json` regenerates from `versions1.json` at build time and now reports `0.0.25`. - `.agents/skills/nemoclaw-user-get-started/` and `.agents/skills/nemoclaw-user-reference/`: regenerated via `scripts/docs-to-skills.py`. ## Type of Change - [ ] Code change (feature, bug fix, or refactor) - [ ] Code change with doc updates - [x] Doc only (prose changes, no code sample modifications) - [ ] Doc only (includes code sample changes) ## Verification - [x] `npx prek run --all-files` passes - [x] `npm test` passes - [x] Tests added or updated for new or changed behavior - [x] No secrets, API keys, or credentials committed - [x] Docs updated for user-facing behavior changes - [x] `make docs` builds without warnings (doc changes only) - [x] Doc pages follow the [style guide](https://github.com/NVIDIA/NemoClaw/blob/main/docs/CONTRIBUTING.md) (doc changes only) - [ ] New doc pages include SPDX header and frontmatter (new pages only) ## AI Disclosure - [x] AI-assisted — tool: Claude Code --- Signed-off-by: Miyoung Choi <miyoungc@nvidia.com> <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **New Features** * Onboarding now shows a configuration review and requires explicit confirmation before building a sandbox; non-interactive mode prints the summary without prompting. * **Documentation** * Expanded Jetson troubleshooting for JetPack 7 / L4T 39.x and conditional br_netfilter guidance. * Added corporate-firewall DNS troubleshooting with platform-specific remediation and a verification nslookup step. * Quickstart updated to document the new confirmation flow. * **Chores** * Documentation version index updated. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…IDIA#2418) (NVIDIA#2419) ## Summary Fixes NVIDIA#2418. JetPack R39 was silently skipped by `setup-jetson.sh` — `get_jetpack_version()` returned empty for any release ≥ 39, so `main()` exited before ever setting up `br_netfilter`. Without `net.bridge.bridge-nf-call-iptables=1`, kube-proxy's iptables NAT rules are present but don't match pod-originated traffic, so sandbox agent pods can't resolve `openshell.openshell.svc.cluster.local` and crash-loop during onboard's `[7/8] Setting up OpenClaw inside sandbox` step. See NVIDIA#2418 for full diagnostic chain: pod-IP DNS query works, service-IP times out, iptables `KUBE-SVC-*` rules exist, `lsmod | grep br_netfilter` is empty, and loading the module immediately fixes it. ## Changes Scoped to `scripts/setup-jetson.sh` only. Approach keeps R36/R38 code paths untouched and **co-locates R39 handling inside the existing `if ((release >= 39))` branch** rather than threading a new `jp7-r39` token through `configure_jetson_host()`. 1. **New helper `bridge_netfilter_ready()`** — returns 0 when `/proc/sys/net/bridge/bridge-nf-call-iptables` exists and reads `1`. That file only appears after the module is loaded, so a single check covers both conditions. 2. **New helper `apply_br_netfilter_setup()`** — extracts the four-line `modprobe` + `sysctl` + `tee /etc/modules-load.d/` + `tee /etc/sysctl.d/` block that previously lived inline in `configure_jetson_host()`. Both the R39 branch and the R36/R38 path now call this helper, so behavior for R36/R38 is byte-identical to main. 3. **R39 branch in `get_jetpack_version()`** — the existing `if ((release >= 39))` early-return now carries the work instead of just logging and returning: - If `bridge_netfilter_ready` → log and return empty (main exits, no host state is modified). - Otherwise → check sudo, call `apply_br_netfilter_setup`, return empty. Returning empty preserves the existing invariant that R39 does not flow through `configure_jetson_host()` (which handles iptables-legacy / `daemon.json` / post-setup docker restart — none of which R39 needs). Conditional on R39 only because not every R39 image hits the bug: some ship with `br_netfilter` already loaded and we don't want to plant `/etc/modules-load.d/nemoclaw.conf` + `/etc/sysctl.d/99-nemoclaw.conf` drop-ins on those systems. For JP6 / JP7-R38 the behavior is preserved exactly (unconditional `apply_br_netfilter_setup` via `configure_jetson_host`). ### What the user sees on affected R39 hosts Before (bug): ``` Jetson detected (L4T 39.1) — this version does not require any host setup ``` (…then onboard[7/8] fails with `sandbox is not ready`.) After, first-affected R39 install: ``` Jetson detected (L4T 39.1) — loading br_netfilter (required by k3s inside the OpenShell gateway; see NVIDIA#2418) ``` After, R39 image that already has it: ``` Jetson detected (L4T 39.1) — br_netfilter already configured; no host setup needed ``` ## Verification - `bash -n scripts/setup-jetson.sh` — clean (syntax check). - `npm run typecheck:cli` — clean. - `npm run ts-migration:guard -- --base origin/main --head HEAD` — `No edits to migrated legacy paths or removed compatibility shims detected.` - Existing `test/setup-jetson.test.ts` (4 tests for the Python `daemon.json` patcher, which is unchanged) — all pass. - **Manual end-to-end on a JetPack R39 Jetson Orin** (the NVIDIA#2418 reporter's affected machine): - **Before**: `nemoclaw onboard` reaches `[7/8]` and fails with `sandbox is not ready`; `kubectl describe pod my-assistant -n openshell` shows the `agent` container in `CrashLoopBackOff` with 156+ restarts, logs show `Temporary failure in name resolution` for `openshell.openshell.svc.cluster.local`. - **After applying the equivalent of this fix manually** (`sudo modprobe br_netfilter && sudo sysctl -w net.bridge.bridge-nf-call-iptables=1 net.bridge.bridge-nf-call-ip6tables=1`) and rerunning `nemoclaw onboard` → `[1/8]`–`[8/8]` complete, sandbox reaches Ready, dashboard reachable. No bash-level unit tests added because the existing test coverage for this script only exercises the embedded Python `daemon.json` patcher (extracted via regex), and the new logic is top-level bash that would require either a source-guard refactor or mocking `/etc/nv_tegra_release` / `/proc/sys/...` via env-var indirection — neither matches existing patterns. Happy to add them if a reviewer prefers. ## Type of Change - [x] Code change (feature, bug fix, or refactor) - [ ] Code change with doc updates - [ ] Doc only (prose changes, no code sample modifications) - [ ] Doc only (includes code sample changes) ## AI Disclosure - [x] AI-assisted — tool: Claude Code <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **Chores** * Centralized bridge/netfilter setup with a readiness check so host changes are applied only when needed. * Newer releases get a conditional dual‑stack (IPv4+IPv6) setup; older paths use iptables‑only configuration. * Ensures elevated privileges when required, persists bridge/IPv4/IPv6 settings across reboots, and logs runtime values and persistence locations. <!-- end of auto-generated comment: release notes by coderabbit.ai --> <!-- DCO sign-off required by CI. Run: git config user.name && git config user.email --> Signed-off-by: Yanyun Liao <yanyunl@nvidia.com> --------- Signed-off-by: Yanyun Liao <yanyunl@nvidia.com> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
## Summary Catches up the user-facing docs for changes merged since v0.0.24 and bumps `docs/versions1.json` so the next docs build publishes as 0.0.25. ## Changes - `docs/reference/troubleshooting.md`: corrected the JetPack 7 (L4T 39.x) entry to reflect the new conditional `br_netfilter` auto-load on R39 hosts that lack it (from NVIDIA#2419), and added a new "DNS resolution from inside docker fails (corporate firewall)" section covering the new container-DNS preflight probe and platform-specific remediation paths (from NVIDIA#2349). - `docs/get-started/quickstart.md`: added a "Review the Configuration Before the Sandbox Build" subsection documenting the new `[Y/n]` review gate that fires after the sandbox-name prompt (from NVIDIA#2221). - `docs/versions1.json`: promoted 0.0.25 to `preferred: true`, demoted 0.0.24, and trimmed to ten entries to match what `scripts/bump-version.ts` writes. `docs/project.json` regenerates from `versions1.json` at build time and now reports `0.0.25`. - `.agents/skills/nemoclaw-user-get-started/` and `.agents/skills/nemoclaw-user-reference/`: regenerated via `scripts/docs-to-skills.py`. ## Type of Change - [ ] Code change (feature, bug fix, or refactor) - [ ] Code change with doc updates - [x] Doc only (prose changes, no code sample modifications) - [ ] Doc only (includes code sample changes) ## Verification - [x] `npx prek run --all-files` passes - [x] `npm test` passes - [x] Tests added or updated for new or changed behavior - [x] No secrets, API keys, or credentials committed - [x] Docs updated for user-facing behavior changes - [x] `make docs` builds without warnings (doc changes only) - [x] Doc pages follow the [style guide](https://github.com/NVIDIA/NemoClaw/blob/main/docs/CONTRIBUTING.md) (doc changes only) - [ ] New doc pages include SPDX header and frontmatter (new pages only) ## AI Disclosure - [x] AI-assisted — tool: Claude Code --- Signed-off-by: Miyoung Choi <miyoungc@nvidia.com> <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **New Features** * Onboarding now shows a configuration review and requires explicit confirmation before building a sandbox; non-interactive mode prints the summary without prompting. * **Documentation** * Expanded Jetson troubleshooting for JetPack 7 / L4T 39.x and conditional br_netfilter guidance. * Added corporate-firewall DNS troubleshooting with platform-specific remediation and a verification nslookup step. * Quickstart updated to document the new confirmation flow. * **Chores** * Documentation version index updated. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Bring feat/jetson-orin-nano-support up to date with origin/main after 217 commits of upstream work. Conflict resolutions: - scripts/setup-jetson.sh: adopt main's L4T-version-aware structure (NVIDIA#1910, NVIDIA#1913, NVIDIA#2419) — JP6 / JP7-R38 / JP7-R39 case dispatch with idempotent br_netfilter persistence. Drops the PR's preflight wrapper in favor of the simpler main rewrite that already covers Orin Nano via the JP6 path. - src/lib/onboard.ts: keep main's verifyWebSearchInsideSandbox; preserve the Jetson GPU-detection branch on top of main's refined NVIDIA detection format; restore the gpu parameter on startGatewayWithOptions so the patchGatewayImageForJetson() call still type-checks. - src/lib/onboard-providers.ts: re-add the ollama-local / vllm-local branches in getSandboxInferenceConfig that override inferenceBaseUrl via getLocalProviderBaseUrl. - docs/get-started/quickstart.md and the autogenerated skills commands.md: take main (Jetson now lives in ci/platform-matrix.json). Port setup-jetson into main's oclif command-registry architecture: - Add SetupJetsonCliCommand thin oclif adapter and runSetupJetsonAction that shells out to scripts/setup-jetson.sh under sudo on Linux. - Register setup-jetson in command-registry.ts (Getting Started), legacy-oclif-dispatch.ts, and oclif-commands.ts. Switch patchGatewayImageForJetson to docker.dockerInspectFormat / docker.dockerRun so it stays inside main's docker-abstraction guard, and update the existing jetson tests to match the argv-based runner contract. Fix the Jetson note in ci/platform-matrix.json — the relative ../reference/commands.md#nemoclaw-setup-jetson link broke the markdown-links CI check when the platform-matrix generator copied it into root README.md. Replaced with the docs.nvidia.com absolute URL. Allowlist two pre-existing slack-app-token false positives surfaced by the local pre-commit gitleaks pass. Verified: npm run build:cli, npm run lint, full vitest suite (3160 pass, 13 skip), and check-docs.sh markdown-link check all green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
## Summary Catches up the user-facing docs for changes merged since v0.0.24 and bumps `docs/versions1.json` so the next docs build publishes as 0.0.25. ## Changes - `docs/reference/troubleshooting.md`: corrected the JetPack 7 (L4T 39.x) entry to reflect the new conditional `br_netfilter` auto-load on R39 hosts that lack it (from NVIDIA#2419), and added a new "DNS resolution from inside docker fails (corporate firewall)" section covering the new container-DNS preflight probe and platform-specific remediation paths (from NVIDIA#2349). - `docs/get-started/quickstart.md`: added a "Review the Configuration Before the Sandbox Build" subsection documenting the new `[Y/n]` review gate that fires after the sandbox-name prompt (from NVIDIA#2221). - `docs/versions1.json`: promoted 0.0.25 to `preferred: true`, demoted 0.0.24, and trimmed to ten entries to match what `scripts/bump-version.ts` writes. `docs/project.json` regenerates from `versions1.json` at build time and now reports `0.0.25`. - `.agents/skills/nemoclaw-user-get-started/` and `.agents/skills/nemoclaw-user-reference/`: regenerated via `scripts/docs-to-skills.py`. ## Type of Change - [ ] Code change (feature, bug fix, or refactor) - [ ] Code change with doc updates - [x] Doc only (prose changes, no code sample modifications) - [ ] Doc only (includes code sample changes) ## Verification - [x] `npx prek run --all-files` passes - [x] `npm test` passes - [x] Tests added or updated for new or changed behavior - [x] No secrets, API keys, or credentials committed - [x] Docs updated for user-facing behavior changes - [x] `make docs` builds without warnings (doc changes only) - [x] Doc pages follow the [style guide](https://github.com/NVIDIA/NemoClaw/blob/main/docs/CONTRIBUTING.md) (doc changes only) - [ ] New doc pages include SPDX header and frontmatter (new pages only) ## AI Disclosure - [x] AI-assisted — tool: Claude Code --- Signed-off-by: Miyoung Choi <miyoungc@nvidia.com> <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **New Features** * Onboarding now shows a configuration review and requires explicit confirmation before building a sandbox; non-interactive mode prints the summary without prompting. * **Documentation** * Expanded Jetson troubleshooting for JetPack 7 / L4T 39.x and conditional br_netfilter guidance. * Added corporate-firewall DNS troubleshooting with platform-specific remediation and a verification nslookup step. * Quickstart updated to document the new confirmation flow. * **Chores** * Documentation version index updated. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Fixes #2418.
JetPack R39 was silently skipped by
setup-jetson.sh—get_jetpack_version()returned empty for any release ≥ 39, somain()exited before ever setting upbr_netfilter. Withoutnet.bridge.bridge-nf-call-iptables=1, kube-proxy's iptables NAT rules are present but don't match pod-originated traffic, so sandboxagent pods can't resolve
openshell.openshell.svc.cluster.localand crash-loop during onboard's[7/8] Setting up OpenClaw inside sandboxstep.See #2418 for full diagnostic chain: pod-IP DNS query works, service-IP times out, iptables
KUBE-SVC-*rules exist,lsmod | grep br_netfilteris empty, and loading the module immediately fixes it.Changes
Scoped to
scripts/setup-jetson.shonly. Approach keeps R36/R38 code paths untouched and co-locates R39 handling inside the existingif ((release >= 39))branch rather than threading a newjp7-r39token throughconfigure_jetson_host().New helper
bridge_netfilter_ready()— returns 0 when/proc/sys/net/bridge/bridge-nf-call-iptablesexists and reads1. That file only appears after the module is loaded, so a single check covers both conditions.New helper
apply_br_netfilter_setup()— extracts the four-linemodprobe+sysctl+tee /etc/modules-load.d/+tee /etc/sysctl.d/block that previously lived inline inconfigure_jetson_host(). Both the R39 branch and the R36/R38 path now call this helper, so behavior for R36/R38 is byte-identical tomain.
R39 branch in
get_jetpack_version()— the existingif ((release >= 39))early-return now carries the work instead of just logging and returning:bridge_netfilter_ready→ log and return empty (main exits, no host state is modified).apply_br_netfilter_setup, return empty.Returning empty preserves the existing invariant that R39 does not flow through
configure_jetson_host()(which handles iptables-legacy /daemon.json/ post-setup docker restart — none of which R39 needs).Conditional on R39 only because not every R39 image hits the bug: some ship with
br_netfilteralready loaded and we don't want to plant/etc/modules-load.d/nemoclaw.conf+/etc/sysctl.d/99-nemoclaw.confdrop-ins on those systems. For JP6 / JP7-R38 the behavior is preserved exactly (unconditionalapply_br_netfilter_setupviaconfigure_jetson_host).What the user sees on affected R39 hosts
Before (bug):
(…then onboard[7/8] fails with
sandbox is not ready.)After, first-affected R39 install:
After, R39 image that already has it:
Verification
bash -n scripts/setup-jetson.sh— clean (syntax check).npm run typecheck:cli— clean.npm run ts-migration:guard -- --base origin/main --head HEAD—No edits to migrated legacy paths or removed compatibility shims detected.test/setup-jetson.test.ts(4 tests for the Pythondaemon.jsonpatcher, which is unchanged) — all pass.nemoclaw onboardreaches[7/8]and fails withsandbox is not ready;kubectl describe pod my-assistant -n openshellshows theagentcontainer inCrashLoopBackOffwith 156+ restarts, logs showTemporary failure in name resolutionforopenshell.openshell.svc.cluster.local.sudo modprobe br_netfilter && sudo sysctl -w net.bridge.bridge-nf-call-iptables=1 net.bridge.bridge-nf-call-ip6tables=1) and rerunningnemoclaw onboard→[1/8]–[8/8]complete, sandbox reaches Ready, dashboard reachable.No bash-level unit tests added because the existing test coverage for this script only exercises the embedded Python
daemon.jsonpatcher (extracted via regex), and the new logic is top-level bash that would require either a source-guard refactor or mocking/etc/nv_tegra_release//proc/sys/...via env-varindirection — neither matches existing patterns. Happy to add them if a reviewer prefers.
Type of Change
AI Disclosure
Summary by CodeRabbit
Signed-off-by: Yanyun Liao yanyunl@nvidia.com