
fix(security): surface immutable symlink hardening status#1

Closed
13ernkastel wants to merge 3 commits into latenighthackathon:fix/symlink-immutable-after-validation from 13ernkastel:codex/pr-1137-hardening-observability

Conversation

@13ernkastel


Summary

This PR builds on NVIDIA#1137 and tightens the immutable symlink hardening path without changing its core approach.

What Changed

  • factors symlink validation into a reusable helper so both startup paths use the same validation logic
  • adds explicit security logging when immutable hardening succeeds, is partial, or is skipped
  • extends the gateway isolation E2E to fail if chattr is missing from the image, so the mitigation cannot silently disappear
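For reviewers, a minimal sketch of what a shared validation helper could look like. This is not the code in `scripts/nemoclaw-start.sh`: the function name `validate_symlinks`, the log prefix, and the assumption that each symlink maps to a same-named entry in the data directory are all hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not the PR's actual code): check that every symlink
# directly under config_dir resolves to the same-named entry under data_dir.
validate_symlinks() {
  local config_dir="$1" data_dir="$2"
  local entry name target expected
  for entry in "$config_dir"/* "$config_dir"/.[!.]*; do
    # Skip regular files, directories, and unmatched glob patterns.
    [ -L "$entry" ] || continue
    name="$(basename "$entry")"
    target="$(readlink -f "$entry" 2>/dev/null || true)"
    expected="$(readlink -f "$data_dir/$name" 2>/dev/null || true)"
    if [ -z "$target" ] || [ "$target" != "$expected" ]; then
      echo "[SECURITY] symlink $entry resolves to '$target', expected '$expected'" >&2
      return 1
    fi
  done
  return 0
}
```

Both startup paths could then call something like `validate_symlinks /sandbox/.openclaw /sandbox/.openclaw-data || exit 1`, so the check cannot drift between them.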

Why

The original fix is directionally strong, but today the chattr path is intentionally best-effort and silent. That makes review and operations harder because:

  • a missing chattr binary looks the same as successful hardening
  • partial chattr +i failures are suppressed with no visibility
  • the image can regress and stop shipping chattr without CI catching it

These changes make the mitigation easier to trust and easier to debug while staying compatible with the current defense-in-depth model.
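The three outcomes described above (applied, partial, skipped) can be sketched roughly as follows. The function name `harden_immutable` and the exact log wording are assumptions for illustration; only the `chattr +i` call itself comes from the original fix.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: apply chattr +i best-effort, but always say what happened.
harden_immutable() {
  local dir="$1" entry total=0 failed=0
  if ! command -v chattr >/dev/null 2>&1; then
    # Previously indistinguishable from success; now explicit.
    echo "[SECURITY] immutable hardening skipped: chattr not available" >&2
    return 0
  fi
  for entry in "$dir" "$dir"/*; do
    [ -e "$entry" ] || [ -L "$entry" ] || continue
    total=$((total + 1))
    # Count failures instead of discarding them.
    chattr +i "$entry" 2>/dev/null || failed=$((failed + 1))
  done
  if [ "$failed" -eq 0 ]; then
    echo "[SECURITY] immutable hardening applied to $total path(s)" >&2
  else
    echo "[SECURITY] immutable hardening partial: $failed of $total path(s) failed" >&2
  fi
  # Still best-effort: DAC and Landlock remain the primary controls.
  return 0
}
```

The function never fails startup, which keeps the original graceful-degradation behavior; the only change in this sketch is that each outcome leaves a distinct log line.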

Validation

  • bash -n scripts/nemoclaw-start.sh
  • bash -n test/e2e-gateway-isolation.sh

Relationship To NVIDIA#1137

This is a follow-up hardening PR intended to sit on top of NVIDIA#1137 rather than replace it.

latenighthackathon and others added 3 commits March 30, 2026 22:07
The symlink validation loop in nemoclaw-start.sh verifies that all
symlinks in /sandbox/.openclaw/ point to their expected
/sandbox/.openclaw-data/ targets, but this check runs only once at
boot. After validation, the symlinks could theoretically be swapped
before the gateway starts on the next line (TOCTOU).

While DAC already prevents the sandbox user from modifying the
root-owned /sandbox/.openclaw directory, this adds defense-in-depth
by setting the immutable flag (chattr +i) on both the directory and
its symlinks after validation passes. The immutable flag cannot be
removed by the sandbox user, closing the TOCTOU window even if DAC
or Landlock are bypassed.

The fix degrades gracefully: if chattr is not available or the
filesystem does not support immutable flags, the existing DAC
protections remain in effect.

Closes NVIDIA#1019

Signed-off-by: latenighthackathon <latenighthackathon@users.noreply.github.com>
Refactor symlink validation into reusable helpers, log when immutable hardening is unavailable or partial, and fail the gateway-isolation E2E when chattr is missing from the image.
@cv

cv commented Mar 31, 2026


@13ernkastel nice follow-up! Mind posting it to NVIDIA/NemoClaw? Happy to merge it there.

@13ernkastel

Author

Reposted upstream as NVIDIA#1467.

That PR carries the same follow-up hardening-observability changes against current main and notes the targeted validation I could run in this environment.

@cv

cv commented Apr 4, 2026


Looks like NVIDIA#1467 was closed shortly after, though?

@13ernkastel 13ernkastel closed this Apr 5, 2026
@13ernkastel

Author

@cv upstream is live now at NVIDIA#1499.

#1467 was auto-closed by the contributor open-PR limit and would not reopen cleanly, so I recreated it from the same branch once a slot was free.

cv added a commit to NVIDIA/NemoClaw that referenced this pull request Apr 6, 2026
## Summary

This follow-up builds on #1137 and improves the observability around
immutable symlink hardening without changing the underlying
defense-in-depth approach.

## What Changed

- factors `.openclaw` symlink validation into a reusable helper so both
startup paths use the same validation logic
- adds explicit security logging when immutable hardening succeeds, is
partial, or is skipped because `chattr` is unavailable
- extends the gateway-isolation E2E to fail if `chattr` is missing from
the image, so the mitigation cannot silently disappear

## Why

The original immutable-hardening fix is directionally strong, but the
`chattr` path is intentionally best-effort and currently silent. That
makes the mitigation harder to trust and harder to debug because:

- a missing `chattr` binary looks the same as successful hardening
- partial `chattr +i` failures are suppressed with no visibility
- the image can regress and stop shipping `chattr` without CI catching
it

These changes make the mitigation easier to audit while staying
compatible with the current layered hardening model.

## Validation

- `bash -n scripts/nemoclaw-start.sh`
- `bash -n test/e2e-gateway-isolation.sh`
- `git diff --check`
- not run: `test/e2e-gateway-isolation.sh` (`docker` is not installed in
this environment)

## Relationship To #1137

This is a repost of the follow-up originally opened as
`latenighthackathon#1`, now targeted at `NVIDIA/NemoClaw` as
requested.

## Note

This replaces `#1467`, which GitHub auto-closed because the repository's
contributor open-PR limit was hit at the time.

Signed-off-by: 13ernkastel <LennonCMJ@live.com>


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **Chores**
* Enhanced startup process validation to ensure system integrity and
correct configuration
* Improved security hardening mechanisms with comprehensive logging and
graceful fallback handling when system features are unavailable

* **Tests**
* Updated end-to-end integration tests to verify system hardening
capabilities and feature availability

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

Co-authored-by: Carlos Villela <cvillela@nvidia.com>
tranzmatt pushed a commit to tranzmatt/NemoClaw that referenced this pull request Apr 6, 2026
latenighthackathon pushed a commit that referenced this pull request Apr 8, 2026
…IA#1114) (NVIDIA#1305)

## Summary

Fixes the four issues reported in NVIDIA#1114 — EACCES permission errors and
missing gateway token when running inside the NemoClaw sandbox.

### Issue mapping

| # | Reported error | Fix |
|---|----------------|-----|
| 1 | `EACCES: open '/sandbox/.openclaw/openclaw.json.*.tmp'` | `install_configure_guard`: intercepts `openclaw configure` with a clear error and directs users to `nemoclaw onboard --resume` on the host |
| 2 | Same as #1 (different PID) | Same fix |
| 3 | `EACCES: mkdir '/sandbox/.openclaw/credentials'` | Already resolved on main via NVIDIA#1519 (credentials symlink to `.openclaw-data/`) |
| 4 | No WhatsApp QR code | Consequence of #3, also resolved by NVIDIA#1519 |

### Root cause (issues 1 & 2)

OpenClaw's `configure` command performs atomic writes: it creates a temp
file (`openclaw.json.PID.UUID.tmp`) in the same directory as the config.
Since `/sandbox/.openclaw/` is Landlock read-only at the kernel level,
file creation is rejected with EACCES. This is by design: the sandbox
config is intentionally immutable at runtime.
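To make the failure mode concrete, here is a minimal sketch of the atomic-write pattern in shell. It is not OpenClaw's implementation; `atomic_write` and the temp-file naming are hypothetical. The point is that the temp file is created next to the destination, so a read-only config directory rejects the write before the rename is ever attempted.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of an atomic write: temp file in the same directory,
# then rename over the destination.
atomic_write() {
  local dest="$1" content="$2" dir tmp
  dir="$(dirname "$dest")"
  # mktemp needs write access to $dir; on a read-only directory it fails here.
  tmp="$(mktemp "$dir/.$(basename "$dest").tmp.XXXXXX")" || return 1
  # The rename is atomic because source and destination share a filesystem.
  printf '%s\n' "$content" > "$tmp" && mv -f "$tmp" "$dest"
}
```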

Rather than weakening Landlock (security regression), we intercept the
command in the sandbox shell and guide users to the correct host-side
workflow.

### Changes

**1. `install_configure_guard()`** — Writes a shell function wrapper to
`.bashrc`/`.profile` that intercepts `openclaw configure` and prints:
```
Error: 'openclaw configure' cannot modify config inside the sandbox.
The sandbox config is read-only (Landlock enforced) for security.

To change your configuration, exit the sandbox and run:
  nemoclaw onboard --resume

This rebuilds the sandbox with your updated settings.
```
All other `openclaw` subcommands pass through to the real binary.
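A rough sketch of what such a wrapper might look like once written to `.bashrc`. The message text is taken from the block above; the function body and the non-zero return code are assumptions, not the exact code this commit installs.

```bash
#!/usr/bin/env bash
# Hypothetical shape of the guard: a shell function that shadows the binary.
openclaw() {
  if [ "${1:-}" = "configure" ]; then
    cat >&2 <<'EOF'
Error: 'openclaw configure' cannot modify config inside the sandbox.
The sandbox config is read-only (Landlock enforced) for security.

To change your configuration, exit the sandbox and run:
  nemoclaw onboard --resume

This rebuilds the sandbox with your updated settings.
EOF
    return 1
  fi
  # `command` skips the function lookup and runs the real binary on PATH.
  command openclaw "$@"
}
```

Using `command openclaw` rather than a hard-coded path means the wrapper keeps working wherever the binary is installed, as long as it is on `PATH`.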

**2. `export_gateway_token()`** — Reads `gateway.auth.token` from
`openclaw.json` and exports it as `OPENCLAW_GATEWAY_TOKEN`, so
interactive sessions (`openshell sandbox connect`) can authenticate
with the gateway. Persists to `.bashrc`/`.profile` using idempotent
marker blocks and cleans stale tokens on revocation.
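The idempotent marker-block idea can be sketched as below. The marker strings and the function name `persist_gateway_token` are hypothetical, and reading the token out of `openclaw.json` is deliberately left out; this only shows how rewriting a delimited block avoids duplicates and how an empty token removes a stale entry.

```bash
#!/usr/bin/env bash
# Hypothetical marker strings; the real script may use different ones.
TOKEN_BEGIN='# >>> nemoclaw gateway token >>>'
TOKEN_END='# <<< nemoclaw gateway token <<<'

persist_gateway_token() {
  local rc_file="$1" token="${2:-}" tmp
  touch "$rc_file"
  tmp="$(mktemp)" || return 1
  # Drop any previous block so repeated runs never stack duplicates.
  sed "/^${TOKEN_BEGIN}\$/,/^${TOKEN_END}\$/d" "$rc_file" > "$tmp"
  if [ -n "$token" ]; then
    {
      echo "$TOKEN_BEGIN"
      # %q is bash-specific quoting; it keeps odd characters from being evaluated.
      printf 'export OPENCLAW_GATEWAY_TOKEN=%q\n' "$token"
      echo "$TOKEN_END"
    } >> "$tmp"
  fi
  # Overwrite in place so the rc file keeps its owner and mode.
  cat "$tmp" > "$rc_file"
  rm -f "$tmp"
}
```

Calling it with an empty token (for example after revocation) leaves the rest of the rc file intact and removes only the marked block.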

**3. `_read_gateway_token()` helper** — Shared Python snippet used by
both `export_gateway_token` and `print_dashboard_urls` (deduplication,
uses `with open()` context manager).

All three are called in both root and non-root startup paths.

## Security properties preserved

- `/sandbox/.openclaw` remains root-owned, Landlock read-only
- `openclaw.json` remains chmod 444 (immutable)
- No new attack surface — token is read-only from existing config
- `command openclaw` bypass preserves all non-configure functionality

Fixes NVIDIA#1114

Signed-off-by: Dongni Yang <dongniy@nvidia.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Signed-off-by: Dongni Yang <dongniy@nvidia.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
gemini2026 pushed a commit to gemini2026/NemoClaw that referenced this pull request Apr 14, 2026
latenighthackathon pushed a commit that referenced this pull request Apr 29, 2026
… (NVIDIA#2404)

## Summary

NemoClaw's sandbox create stream only recognized the legacy Docker
builder format, so BuildKit output would not be treated as active build
progress once OpenShell emits it.

This adds BuildKit progress markers to the same parser path as the
existing legacy builder output. It keeps the current legacy behavior and
makes `#1 [internal] ...`, `#2 CACHED`, and `#3 DONE ...` visible as
build progress.
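As an illustration of the matching only, here is a shell sketch. The actual change lives in TypeScript (`src/lib/sandbox-create-stream.ts`); the function name and both regular expressions are assumptions, including the `Step N/M :` pattern used here to stand in for the legacy builder format.

```bash
#!/usr/bin/env bash
# Hypothetical matcher: returns 0 when a line looks like build progress.
is_build_progress_line() {
  local line="$1"
  # BuildKit: "#1 [internal] ...", "#2 CACHED", "#3 DONE 0.1s"
  local buildkit_re='^#[0-9]+ (\[[^]]+\]|CACHED|DONE)'
  # Legacy builder (assumed shape): "Step 1/5 : FROM ..."
  local legacy_re='^Step [0-9]+/[0-9]+ :'
  [[ "$line" =~ $buildkit_re || "$line" =~ $legacy_re ]]
}
```

Keeping both patterns behind one predicate mirrors the stated goal of sending BuildKit output down the same parser path as the legacy output.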

## Changes

- `src/lib/sandbox-create-stream.ts`: recognize BuildKit step and
completion lines while tracking the build phase.
- `src/lib/sandbox-create-stream.test.ts`: cover BuildKit progress
output and verify it is streamed to the user.

## Testing

- `npm run build:cli` passed
- `npm run typecheck:cli` passed
- `npm test -- src/lib/sandbox-create-stream.test.ts` passed
- `npm test` was also attempted. The full suite is not green on current
main in this environment; failures are in existing
installer/onboard/legacy-guard tests outside this change.

## Evidence it works

The new focused test feeds BuildKit-style output into
`streamSandboxCreate` and verifies that the lines are logged, collected
in output, and mark sandbox creation as having seen progress.

Fixes NVIDIA#2311

Signed-off-by: Deepak Jain <deepujain@gmail.com>


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **Bug Fixes**
* Improved detection and display of BuildKit and upload progress so
progress markers and completion states are recognized reliably.

* **Refactor**
* Centralized progress-detection logic for more consistent handling of
build and upload output.

* **Tests**
* Added a test ensuring BuildKit-formatted progress lines are captured,
included in output, and reported to the log callback.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

Signed-off-by: Deepak Jain <deepujain@gmail.com>
Co-authored-by: Carlos Villela <cvillela@nvidia.com>
latenighthackathon pushed a commit that referenced this pull request May 15, 2026
…3456) (NVIDIA#3520)

> **Draft for visibility.** Issue-autopilot Stages 4-5 of NVIDIA#3456. Will
mark ready once batch self-review + CI complete.

## Summary

Closes the two remaining output threads in NVIDIA#3456 after the core
dead-loop fix already landed on `main` (via NVIDIA#3459, NVIDIA#3434, NVIDIA#3483). The full
sub-bug mapping is in the NVIDIA#3456 status comment.

- **Sub-bug #3**: the `nemoclaw <name> destroy --yes` recovery hint is
replaced with a registry-aware helper.
- **Sub-bug #4**: the self-contradictory `Destroyed gateway 'nemoclaw' skipped`
wording is replaced with `Gateway 'nemoclaw' already removed or unreachable`.

## Acceptance criteria mapping

| Sub-bug | Resolution | Evidence |
|---|---|---|
| #1 dead loop | Already fixed on main (NVIDIA#3459) | out of scope |
| #2 firewall diagnostic | Already fixed on main (NVIDIA#3459) | out of scope |
| **#3** literal `<name>` placeholder | **This PR** | `src/lib/onboard/gpu-recovery.ts` + `onboard.ts:10387-10405` |
| **#4** misleading "skipped" wording | **This PR** | `src/lib/actions/uninstall/run-plan.ts:210-228, 407-414` |
| #5 uninstall residuals | Already fixed on main (NVIDIA#3483) | out of scope |

## Behavior matrix

`gpuPassthroughRecoveryLines(names)`:

| Input | Suggestion |
|---|---|
| `null` / `[]` | `nemoclaw uninstall && nemoclaw onboard --gpu` |
| one sandbox | `nemoclaw <name> destroy --yes --cleanup-gateway && nemoclaw onboard --gpu` |
| many sandboxes | each `destroy --yes`, only the last gets `--cleanup-gateway` |
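The matrix above can be sketched in shell as follows. The real helper is TypeScript (`gpuPassthroughRecoveryLines`); the function name here is hypothetical, and appending `&& nemoclaw onboard --gpu` to the last line in the many-sandbox case is an assumption that the table does not spell out.

```bash
#!/usr/bin/env bash
# Hypothetical shell rendering of the behavior matrix; one suggestion per line.
gpu_recovery_lines() {
  if [ "$#" -eq 0 ]; then
    # No registered sandboxes: fall back to a full uninstall.
    echo "nemoclaw uninstall && nemoclaw onboard --gpu"
    return 0
  fi
  local i=0 name
  for name in "$@"; do
    i=$((i + 1))
    if [ "$i" -eq "$#" ]; then
      # Only the last destroy also cleans up the gateway.
      echo "nemoclaw $name destroy --yes --cleanup-gateway && nemoclaw onboard --gpu"
    else
      echo "nemoclaw $name destroy --yes"
    fi
  done
}
```

Because real sandbox names are substituted in, the literal `<name>` placeholder never reaches the user, which is the behavior sub-bug #3 asks for.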

## Test plan

```
npm run typecheck:cli
npx vitest run src/lib/onboard/gpu-recovery.test.ts src/lib/actions/uninstall/run-plan.test.ts
```

22 tests pass (6 new + 16 existing).

## Notes for reviewers

- This is the work NVIDIA#3464 attempted; that PR was
closed without merging after CodeRabbit asked for the `<name>`
placeholder to be forbidden in tests via negative assertion. This PR
adopts that refinement.
- `runOptional` extension is backwards-compatible — existing callers
without `onSkip` get the original wording.

Closes NVIDIA#3456 once merged.

---------

Signed-off-by: Charan Jagwani <charjags100@gmail.com>
Co-authored-by: Charan Jagwani <charjags100@gmail.com>
Co-authored-by: Carlos Villela <cvillela@nvidia.com>
3 participants