fix(security): surface immutable symlink hardening status by 13ernkastel · Pull Request #1 · latenighthackathon/NemoClaw

13ernkastel · 2026-03-31T05:21:57Z

Summary

This PR builds on NVIDIA#1137 and tightens the immutable symlink hardening path without changing its core approach.

What Changed

- factors symlink validation into a reusable helper so both startup paths use the same validation logic
- adds explicit security logging when immutable hardening succeeds, is partial, or is skipped
- extends the gateway isolation E2E to fail if chattr is missing from the image, so the mitigation cannot silently disappear
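
To make the three logged outcomes concrete, here is a minimal sketch of how a hardening step could report success, partial failure, or a skip. The function name `harden_immutable`, the `CHATTR_BIN` override, and the log wording are assumptions for illustration; they are not taken from `scripts/nemoclaw-start.sh`.

```shell
# Hypothetical sketch: names and log wording are assumptions, not the PR's code.
# Applies the immutable flag to each path and prints one status word on stdout:
#   ok       every path was flagged
#   partial  at least one chattr call failed
#   skipped  no chattr binary was found
harden_immutable() {
  local chattr_bin="${CHATTR_BIN:-chattr}" failed=0 total=0 path

  if ! command -v "$chattr_bin" >/dev/null 2>&1; then
    echo "[SECURITY] immutable hardening skipped: $chattr_bin not found" >&2
    echo skipped
    return 0
  fi

  for path in "$@"; do
    total=$((total + 1))
    # Count failures instead of suppressing them silently.
    if ! "$chattr_bin" +i "$path" 2>/dev/null; then
      failed=$((failed + 1))
    fi
  done

  if [ "$failed" -eq 0 ]; then
    echo "[SECURITY] immutable hardening applied to $total path(s)" >&2
    echo ok
  else
    echo "[SECURITY] immutable hardening partial: $failed of $total path(s) failed" >&2
    echo partial
  fi
}
```

The status word goes to stdout and the human-readable log line to stderr, so a caller can branch on the result while operators still see the message in the startup log.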

Why

The original fix is directionally strong, but today the chattr path is intentionally best-effort and silent. That makes review and operations harder because:

- a missing chattr binary looks the same as successful hardening
- partial chattr +i failures are suppressed with no visibility
- the image can regress and stop shipping chattr without CI catching it

These changes make the mitigation easier to trust and easier to debug while staying compatible with the current defense-in-depth model.
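
One way a CI check could catch the image regression is to probe for the binary inside the container and fail loudly when it is absent. This is a sketch only: `assert_tool_present` is a hypothetical helper, and the `docker run` line in the comment assumes an `IMAGE` variable supplied by the test harness.

```shell
# Hypothetical sketch: helper name and messages are assumptions.
# Usage: assert_tool_present TOOL [COMMAND PREFIX...]
# Runs `command -v TOOL` through the given prefix (for example a container
# runner) and returns non-zero when the tool cannot be found.
assert_tool_present() {
  local tool="$1"
  shift
  if "$@" sh -c 'command -v "$1" >/dev/null 2>&1' _ "$tool"; then
    echo "PASS: $tool is available"
  else
    echo "FAIL: $tool is missing, so hardening would be skipped" >&2
    return 1
  fi
}

# In an E2E script this could be invoked against the image, for example:
#   assert_tool_present chattr docker run --rm --entrypoint "" "$IMAGE"
```

Passing the runner as a prefix keeps the probe itself independent of Docker, so the same helper can be exercised locally without a container runtime.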

Validation

- `bash -n scripts/nemoclaw-start.sh`
- `bash -n test/e2e-gateway-isolation.sh`

Relationship To NVIDIA#1137

This is a follow-up hardening PR intended to sit on top of NVIDIA#1137 rather than replace it.

The symlink validation loop in nemoclaw-start.sh verifies that all symlinks in /sandbox/.openclaw/ point to their expected /sandbox/.openclaw-data/ targets, but this check runs only once at boot. After validation, the symlinks could theoretically be swapped before the gateway starts on the next line (TOCTOU).

While DAC already prevents the sandbox user from modifying the root-owned /sandbox/.openclaw directory, this adds defense-in-depth by setting the immutable flag (chattr +i) on both the directory and its symlinks after validation passes. The immutable flag cannot be removed by the sandbox user, closing the TOCTOU window even if DAC or Landlock are bypassed.

The fix degrades gracefully: if chattr is not available or the filesystem does not support immutable flags, the existing DAC protections remain in effect.

Closes NVIDIA#1019

Signed-off-by: latenighthackathon <latenighthackathon@users.noreply.github.com>
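
The validation step described in this commit message can be sketched roughly as follows. This is an illustration under stated assumptions: the helper name `validate_symlinks` is hypothetical, and the rule that a link named `X` must point at `<data_dir>/X` is a guess at what "expected target" means, not something the commit message specifies.

```shell
# Hypothetical sketch: helper name and the name-to-target rule are assumptions.
# Usage: validate_symlinks CONFIG_DIR DATA_DIR
# Returns non-zero as soon as one symlink in CONFIG_DIR points anywhere other
# than DATA_DIR/<same name>. Non-symlink entries are ignored, and the glob
# does not visit dotfiles.
validate_symlinks() {
  local config_dir="$1" data_dir="$2" entry name target

  for entry in "$config_dir"/*; do
    [ -L "$entry" ] || continue
    name="$(basename "$entry")"
    target="$(readlink "$entry")"
    if [ "$target" != "$data_dir/$name" ]; then
      echo "[SECURITY] unexpected symlink target: $entry -> $target" >&2
      return 1
    fi
  done
  return 0
}

# Intended ordering at startup (paths from the commit message):
#   validate_symlinks /sandbox/.openclaw /sandbox/.openclaw-data || exit 1
#   ...then set the immutable flag, then start the gateway.
```

Running the immutable step only after this check passes is what narrows the window between validation and gateway start.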

Refactor symlink validation into reusable helpers, log when immutable hardening is unavailable or partial, and fail the gateway-isolation E2E when chattr is missing from the image.

cv · 2026-03-31T19:02:44Z

@13ernkastel nice follow-up! Mind posting it to NVIDIA/NemoClaw? Happy to merge it there.

13ernkastel · 2026-04-04T03:00:43Z

Reposted upstream as NVIDIA#1467.

That PR carries the same follow-up hardening-observability changes against current main and notes the targeted validation I could run in this environment.

cv · 2026-04-04T17:22:07Z

Looks like NVIDIA#1467 was closed shortly after, though?

13ernkastel · 2026-04-05T03:49:47Z

@cv upstream is live now at NVIDIA#1499.

#1467 was auto-closed by the contributor open-PR limit and would not reopen cleanly, so I recreated it from the same branch once a slot was free.

## Summary

This follow-up builds on #1137 and improves the observability around immutable symlink hardening without changing the underlying defense-in-depth approach.

## What Changed

- factors `.openclaw` symlink validation into a reusable helper so both startup paths use the same validation logic
- adds explicit security logging when immutable hardening succeeds, is partial, or is skipped because `chattr` is unavailable
- extends the gateway-isolation E2E to fail if `chattr` is missing from the image, so the mitigation cannot silently disappear

## Why

The original immutable-hardening fix is directionally strong, but the `chattr` path is intentionally best-effort and currently silent. That makes the mitigation harder to trust and harder to debug because:

- a missing `chattr` binary looks the same as successful hardening
- partial `chattr +i` failures are suppressed with no visibility
- the image can regress and stop shipping `chattr` without CI catching it

These changes make the mitigation easier to audit while staying compatible with the current layered hardening model.

## Validation

- `bash -n scripts/nemoclaw-start.sh`
- `bash -n test/e2e-gateway-isolation.sh`
- `git diff --check`
- not run: `test/e2e-gateway-isolation.sh` (`docker` is not installed in this environment)

## Relationship To #1137

This is a repost of the follow-up originally opened as `latenighthackathon#1`, now targeted at `NVIDIA/NemoClaw` as requested.

## Note

This replaces `#1467`, which GitHub auto-closed because the repository's contributor open-PR limit was hit at the time.

Signed-off-by: 13ernkastel <LennonCMJ@live.com>

## Summary by CodeRabbit

* **Chores**
  * Enhanced startup process validation to ensure system integrity and correct configuration
  * Improved security hardening mechanisms with comprehensive logging and graceful fallback handling when system features are unavailable
* **Tests**
  * Updated end-to-end integration tests to verify system hardening capabilities and feature availability

Co-authored-by: Carlos Villela <cvillela@nvidia.com>

…IA#1114) (NVIDIA#1305)

## Summary

Fixes the four issues reported in NVIDIA#1114: EACCES permission errors and missing gateway token when running inside the NemoClaw sandbox.

### Issue mapping

| # | Reported error | Fix |
|---|----------------|-----|
| 1 | `EACCES: open '/sandbox/.openclaw/openclaw.json.*.tmp'` | `install_configure_guard`: intercepts `openclaw configure` with a clear error and directs users to `nemoclaw onboard --resume` on the host |
| 2 | Same as #1 (different PID) | Same fix |
| 3 | `EACCES: mkdir '/sandbox/.openclaw/credentials'` | Already resolved on main via NVIDIA#1519 (credentials symlink to `.openclaw-data/`) |
| 4 | No WhatsApp QR code | Consequence of #3, also resolved by NVIDIA#1519 |

### Root cause (issues 1 & 2)

OpenClaw's `configure` command performs atomic writes: it creates a temp file (`openclaw.json.PID.UUID.tmp`) in the same directory as the config. Since `/sandbox/.openclaw/` is Landlock read-only at the kernel level, file creation is rejected with EACCES. This is by design: the sandbox config is intentionally immutable at runtime.

Rather than weakening Landlock (security regression), we intercept the command in the sandbox shell and guide users to the correct host-side workflow.

### Changes

**1. `install_configure_guard()`**: Writes a shell function wrapper to `.bashrc`/`.profile` that intercepts `openclaw configure` and prints:

```
Error: 'openclaw configure' cannot modify config inside the sandbox.
The sandbox config is read-only (Landlock enforced) for security.
To change your configuration, exit the sandbox and run:
nemoclaw onboard --resume
This rebuilds the sandbox with your updated settings.
```

All other `openclaw` subcommands pass through to the real binary.

**2. `export_gateway_token()`**: Reads `gateway.auth.token` from `openclaw.json` and exports it as `OPENCLAW_GATEWAY_TOKEN`, so interactive sessions (`openshell sandbox connect`) can authenticate with the gateway. Persists to `.bashrc`/`.profile` using idempotent marker blocks and cleans stale tokens on revocation.

**3. `_read_gateway_token()` helper**: Shared Python snippet used by both `export_gateway_token` and `print_dashboard_urls` (deduplication, uses `with open()` context manager).

All three are called in both root and non-root startup paths.

## Security properties preserved

- `/sandbox/.openclaw` remains root-owned, Landlock read-only
- `openclaw.json` remains chmod 444 (immutable)
- No new attack surface: token is read-only from existing config
- `command openclaw` bypass preserves all non-configure functionality

Fixes NVIDIA#1114

Signed-off-by: Dongni Yang <dongniy@nvidia.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Signed-off-by: Dongni Yang <dongniy@nvidia.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
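
A shell function wrapper of the kind described in change 1 might look like the sketch below. It is an approximation: the message text comes from the commit description, but its line breaks, the use of stderr, and the exit status of 1 are assumptions, as is the exact shape of the function that `install_configure_guard()` writes.

```shell
# Hypothetical sketch of the guard; not the function shipped in NVIDIA#1305.
# Intercepts `openclaw configure` and lets every other subcommand through.
openclaw() {
  if [ "${1:-}" = "configure" ]; then
    {
      echo "Error: 'openclaw configure' cannot modify config inside the sandbox."
      echo "The sandbox config is read-only (Landlock enforced) for security."
      echo "To change your configuration, exit the sandbox and run:"
      echo "nemoclaw onboard --resume"
      echo "This rebuilds the sandbox with your updated settings."
    } >&2
    return 1
  fi
  # `command` skips shell functions, so this reaches the real binary on PATH
  # instead of recursing into the wrapper.
  command openclaw "$@"
}
```

Because the wrapper is a shell function, it only affects interactive shells that source it. Scripts that call the binary by path are unaffected, which is consistent with the guard being a usability aid while Landlock remains the enforcement layer.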

… (NVIDIA#2404)

## Summary

NemoClaw's sandbox create stream only recognized the legacy Docker builder format, so BuildKit output would not be treated as active build progress once OpenShell emits it. This adds BuildKit progress markers to the same parser path as the existing legacy builder output. It keeps the current legacy behavior and makes `#1 [internal] ...`, `#2 CACHED`, and `#3 DONE ...` visible as build progress.

## Changes

- `src/lib/sandbox-create-stream.ts`: recognize BuildKit step and completion lines while tracking the build phase.
- `src/lib/sandbox-create-stream.test.ts`: cover BuildKit progress output and verify it is streamed to the user.

## Testing

- `npm run build:cli` passed
- `npm run typecheck:cli` passed
- `npm test -- src/lib/sandbox-create-stream.test.ts` passed
- `npm test` was also attempted. The full suite is not green on current main in this environment; failures are in existing installer/onboard/legacy-guard tests outside this change.

## Evidence it works

The new focused test feeds BuildKit-style output into `streamSandboxCreate` and verifies that the lines are logged, collected in output, and mark sandbox creation as having seen progress.

Fixes NVIDIA#2311

Signed-off-by: Deepak Jain <deepujain@gmail.com>

## Summary by CodeRabbit

* **Bug Fixes**
  * Improved detection and display of BuildKit and upload progress so progress markers and completion states are recognized reliably.
* **Refactor**
  * Centralized progress-detection logic for more consistent handling of build and upload output.
* **Tests**
  * Added a test ensuring BuildKit-formatted progress lines are captured, included in output, and reported to the log callback.

Signed-off-by: Deepak Jain <deepujain@gmail.com>
Co-authored-by: Carlos Villela <cvillela@nvidia.com>
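
For readers unfamiliar with the BuildKit line shapes named above, the recognition rule can be illustrated with a small pattern match. This is a shell approximation written for this page, not the TypeScript in `src/lib/sandbox-create-stream.ts`; the function name and the exact regular expression are assumptions based only on the three example markers.

```shell
# Hypothetical sketch: a shell stand-in for the TypeScript parser, covering
# only the three marker shapes quoted in the summary.
# Succeeds when the line looks like "#<n> [stage] ...", "#<n> CACHED",
# or "#<n> DONE ...".
is_buildkit_progress() {
  printf '%s\n' "$1" | grep -Eq '^#[0-9]+ (\[|CACHED|DONE)'
}
```

A caller would test each streamed line, for example `is_buildkit_progress "$line" && seen_progress=1`, alongside whatever check already handles the legacy builder format.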

…3456) (NVIDIA#3520)

> **Draft for visibility.** Issue-autopilot Stages 4-5 of NVIDIA#3456. Will mark ready once batch self-review + CI complete.

## Summary

Closes the two remaining output threads in NVIDIA#3456 after the core dead-loop fix already landed on `main` (via NVIDIA#3459, NVIDIA#3434, NVIDIA#3483). Full sub-bug mapping in the [NVIDIA#3456 status comment](NVIDIA#3456 (comment)).

- **Sub-bug #3**: `nemoclaw <name> destroy --yes` recovery hint replaced with a registry-aware helper.
- **Sub-bug NVIDIA#4**: `Destroyed gateway 'nemoclaw' skipped` self-contradictory wording replaced with `Gateway 'nemoclaw' already removed or unreachable`.

## Acceptance criteria mapping

| Sub-bug | Resolution | Evidence |
|---|---|---|
| #1 dead loop | Already fixed on main (NVIDIA#3459) | out of scope |
| #2 firewall diagnostic | Already fixed on main (NVIDIA#3459) | out of scope |
| **#3** literal `<name>` placeholder | **This PR** | `src/lib/onboard/gpu-recovery.ts` + `onboard.ts:10387-10405` |
| **NVIDIA#4** misleading "skipped" wording | **This PR** | `src/lib/actions/uninstall/run-plan.ts:210-228, 407-414` |
| NVIDIA#5 uninstall residuals | Already fixed on main (NVIDIA#3483) | out of scope |

## Behavior matrix

`gpuPassthroughRecoveryLines(names)`:

| Input | Suggestion |
|---|---|
| `null` / `[]` | `nemoclaw uninstall && nemoclaw onboard --gpu` |
| one sandbox | `nemoclaw <name> destroy --yes --cleanup-gateway && nemoclaw onboard --gpu` |
| many sandboxes | each `destroy --yes`, only the last gets `--cleanup-gateway` |

## Test plan

```
npm run typecheck:cli
npx vitest run src/lib/onboard/gpu-recovery.test.ts src/lib/actions/uninstall/run-plan.test.ts
```

22 tests pass (6 new + 16 existing).

## Notes for reviewers

- This is the work [NVIDIA#3464 attempted](NVIDIA#3464); that PR was closed without merging after CodeRabbit asked for the `<name>` placeholder to be forbidden in tests via negative assertion. This PR adopts that refinement.
- `runOptional` extension is backwards-compatible: existing callers without `onSkip` get the original wording.

Closes NVIDIA#3456 once merged.

---------

Signed-off-by: Charan Jagwani <charjags100@gmail.com>
Co-authored-by: Charan Jagwani <charjags100@gmail.com>
Co-authored-by: Carlos Villela <cvillela@nvidia.com>
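
The behavior matrix above can be read as a small decision procedure. The sketch below restates it in shell for illustration; the real helper is TypeScript, the name `gpu_recovery_lines` is hypothetical, and the choice to print one command per line with `nemoclaw onboard --gpu` chained onto the last one in the many-sandbox case is an assumption the matrix does not spell out.

```shell
# Hypothetical sketch: a shell restatement of the matrix, not the TypeScript
# helper. Arguments are registered sandbox names; output is one suggested
# command per line.
gpu_recovery_lines() {
  # No known sandboxes: suggest a full uninstall and a fresh onboard.
  if [ "$#" -eq 0 ]; then
    echo "nemoclaw uninstall && nemoclaw onboard --gpu"
    return 0
  fi
  # Every sandbox except the last is destroyed without touching the gateway.
  while [ "$#" -gt 1 ]; do
    echo "nemoclaw $1 destroy --yes"
    shift
  done
  # Only the last destroy cleans up the gateway, then onboarding restarts.
  echo "nemoclaw $1 destroy --yes --cleanup-gateway && nemoclaw onboard --gpu"
}
```

Because real names are substituted in, the output never contains the literal `<name>` placeholder that sub-bug #3 was about.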

latenighthackathon and others added 3 commits March 30, 2026 22:07

Merge branch 'main' into fix/symlink-immutable-after-validation

99ef53b

fix(security): surface immutable symlink hardening status

3b80e5d

13ernkastel mentioned this pull request Mar 31, 2026

fix(security): make .openclaw symlinks immutable after validation NVIDIA/NemoClaw#1137

Merged

3 tasks

latenighthackathon force-pushed the fix/symlink-immutable-after-validation branch from 99ef53b to 5ab4386 Compare March 31, 2026 13:14

13ernkastel closed this Apr 5, 2026

13ernkastel wants to merge 3 commits into latenighthackathon:fix/symlink-immutable-after-validation from 13ernkastel:codex/pr-1137-hardening-observability
