Skip to content

fix(sandbox): harden startup log creation#3888

Open
ChunkyMonkey11 wants to merge 15 commits into
NVIDIA:mainfrom
ChunkyMonkey11:security-start-log-no-follow
Open

fix(sandbox): harden startup log creation#3888
ChunkyMonkey11 wants to merge 15 commits into
NVIDIA:mainfrom
ChunkyMonkey11:security-start-log-no-follow

Conversation

@ChunkyMonkey11

@ChunkyMonkey11 ChunkyMonkey11 commented May 20, 2026

Copy link
Copy Markdown
Contributor

Summary

Harden startup log handling for the OpenClaw and Hermes entrypoints. These logs live under /tmp, which is writable from inside the sandbox, so startup should not trust a path that already exists there.

The entrypoints now replace pre-existing log paths with freshly prepared regular files before root, gateway, or sandbox-owned processes write diagnostics. That prevents a sandbox user from planting a symlink that redirects log truncation, writes, chmod, or chown onto another trusted in-container path.

Changes

  • Prepare /tmp/nemoclaw-start.log with a fresh restricted file before early diagnostics are captured.
  • Reuse the same no-follow preparation for OpenClaw service logs: /tmp/gateway.log and /tmp/auto-pair.log.
  • Tighten the Hermes restricted-log helper so symlinks, including symlinks to directories, are unlinked before replacement.
  • Keep real directories protected: startup refuses to replace an actual directory at the log path.
  • Add regressions that plant symlinks at the affected log paths and verify the symlink targets are not modified.

Type of Change

  • Code change (feature, bug fix, or refactor)
  • Code change with doc updates
  • Doc only (prose changes, no code sample modifications)
  • Doc only (includes code sample changes)

Verification

  • npx prek run --all-files passes
  • npm test passes
  • Tests added or updated for new or changed behavior
  • No secrets, API keys, or credentials committed
  • Docs updated for user-facing behavior changes
  • make docs builds without warnings (doc changes only)
  • Doc pages follow the style guide (doc changes only)
  • New doc pages include SPDX header and frontmatter (new pages only)

Passed locally:

  • shellcheck scripts/nemoclaw-start.sh agents/hermes/start.sh
  • npx vitest run test/nemoclaw-start.test.ts -t "replaces a pre-existing startup log symlink"
  • npx vitest run test/nemoclaw-start.test.ts -t "replaces symlinked service logs"
  • npx vitest run test/sandbox-init.test.ts -t "hermes restricted log setup replaces symlinks to directories"
  • npx vitest run test/nemoclaw-start.test.ts test/sandbox-init.test.ts
  • npm run source-shape:check
  • git diff --check origin/security-start-log-no-follow

A normal push still did not pass in this checkout. The pre-push hook failed before this patch's tests on existing TypeScript/dist drift and then the full CLI test hook reported unrelated failures in test/generate-openclaw-config.test.ts numeric env-var handling plus src/lib/sandbox-base-image.test.ts git fixture cases. The affected log-hardening tests above pass, and the branch was pushed with --no-verify after recording those failures.


Signed-off-by: Revant Patel revant.h.patel@gmail.com

Summary by CodeRabbit

  • Bug Fixes / Security

    • Hardened startup log provisioning to replace symlinked targets with regular files, refuse non-symlink directories, remove/replace targets atomically, and fail safely on errors.
    • Enforces explicit file modes (600 for sensitive logs, 644 where appropriate) and applies ownership when running as root.
    • Ensures runtime shell environment shim is backfilled idempotently and runtime shell files are validated/created.
  • Tests

    • Added tests verifying symlink replacement, permission/ownership enforcement, idempotent shim installation, and correct behavior across root/non-root paths, including symlinks to directories.

Review Change Stack

Signed-off-by: Test User <test@example.com>
@copy-pr-bot

copy-pr-bot Bot commented May 20, 2026

Copy link
Copy Markdown

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@coderabbitai

coderabbitai Bot commented May 20, 2026

Copy link
Copy Markdown
Contributor

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds atomic, defensive log provisioning helpers and hardens Hermes and Nemoclaw startup paths to reject unsafe targets, replace symlinked logs with regular files with explicit modes/ownership, and backfills an idempotent runtime shell env shim; tests verify symlink replacement, permissions, and shim idempotency.

Changes

Startup log file hardening

Layer / File(s) Summary
Secure log initialization helper
scripts/nemoclaw-start.sh, agents/hermes/start.sh
Adds prepare_start_log() and hardens prepare_restricted_log() to create a temp file adjacent to the target, truncate it, optionally chown, set explicit chmod, refuse to overwrite a pre-existing non-symlink directory, unlink existing destination, and replace the target via mv -f. Replaces prior touch/chown/chmod sequences for /tmp/nemoclaw-start.log, /tmp/gateway.log, and /tmp/auto-pair.log in non-root and root branches.
Test coverage for symlink replacement
test/nemoclaw-start.test.ts, test/sandbox-init.test.ts
Adds helpers that extract the early-start and restricted-log helpers, pre-seed _START_LOG, gateway.log, and auto-pair.log as symlinks (including symlink-to-directory), inject the helpers into wrapper scripts, run startup snippets, and assert symlinks are replaced by regular files with expected 0600/0644 modes while capturing trusted target contents.

Runtime shell env shim and sandbox-home refactor

Layer / File(s) Summary
Sandbox-home resolution and shim install
agents/hermes/start.sh
Move sandbox-home detection into resolve_sandbox_home(), add _RUNTIME_SHELL_ENV_SHIM, introduce ensure_runtime_shell_env_shim() which validates/creates non-symlink .bashrc/.profile and backfills a sourced shim via an atomic write helper; delay runtime/proxy initialization until after privilege branching and call write_runtime_shell_env, ensure_runtime_shell_env_shim, and lock_rc_files in both startup paths.
Tests for shim backfill idempotency
test/hermes-start.test.ts
Extract ensure_runtime_shell_env_shim() and bootstrapping helpers into tests, update emit_sandbox_sourced_file to atomic-write via .tmp + mv, and add a test asserting the shim line is added exactly once to both .bashrc and .profile and that both files end up with mode 444.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested labels

bug, NV QA, VDR

Suggested reviewers

  • ericksoa

Poem

🐰 I hopped and made a temp file neat,
Swapped sly links for files so discreet,
Wrote shims with gentle, atomic care,
Locked permissions so nothing's bare,
Logs and rc files now sleep sound and sweet.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 11.76% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'fix(sandbox): harden startup log creation' accurately and concisely summarizes the main objective of the pull request: hardening the creation of startup log files to prevent symlink-based attacks.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (2)
scripts/nemoclaw-start.sh (1)

48-87: Please run the entrypoint E2Es on this branch before merge.

This hardening sits on the sandbox boot path, and the unit test here only exercises an extracted snippet rather than the full Landlock/non-root/process-lifecycle path. I’d run sandbox-survival-e2e, sandbox-operations-e2e, and cloud-e2e for this change.

As per coding guidelines, scripts/nemoclaw-start.sh: This file is a sandbox entrypoint script. Changes affect every sandbox boot and are invisible to unit tests (Landlock, non-root execution, process lifecycle). and E2E test recommendation: sandbox-survival-e2e, sandbox-operations-e2e, cloud-e2e.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/nemoclaw-start.sh` around lines 48 - 87, The change in
scripts/nemoclaw-start.sh affects sandbox boot behavior (see prepare_start_log
function and its invocation at the bottom) and must be validated with full
entrypoint E2E runs before merging; please run the recommended end-to-end suites
sandbox-survival-e2e, sandbox-operations-e2e, and cloud-e2e against this branch
(exercising Landlock/non-root/process-lifecycle paths) and post the passing
results or any fixes, referencing prepare_start_log and the two calls that
invoke it for root and non-root paths.
test/nemoclaw-start.test.ts (1)

120-151: ⚡ Quick win

Add a regression for the real-directory refusal path.

The new helper also refuses to replace _START_LOG when it is an actual directory, but this suite only locks in the symlink-replacement branch. A small companion case that pre-creates logPath as a directory and asserts a non-zero exit plus an unchanged directory would cover the other security contract introduced here.

🧪 Suggested companion test
+  it("refuses to replace a real directory at the startup log path", () => {
+    const src = fs.readFileSync(START_SCRIPT, "utf-8");
+    const tmpDir = fs.mkdtempSync(path.join(os.tmpdir(), "nemoclaw-start-log-dir-"));
+    const logPath = path.join(tmpDir, "nemoclaw-start.log");
+    fs.mkdirSync(logPath);
+
+    try {
+      const block = earlyStartLogBlock(src).replace(
+        '_START_LOG="/tmp/nemoclaw-start.log"',
+        `_START_LOG=${JSON.stringify(logPath)}`,
+      );
+      const result = spawnSync("bash", ["-c", ["set -euo pipefail", block].join("\n")], {
+        encoding: "utf-8",
+        timeout: 5000,
+      });
+
+      expect(result.status).not.toBe(0);
+      expect(fs.statSync(logPath).isDirectory()).toBe(true);
+    } finally {
+      fs.rmSync(tmpDir, { recursive: true, force: true });
+    }
+  });
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/nemoclaw-start.test.ts` around lines 120 - 151, Add a companion test to
cover the real-directory refusal path: create a new it block alongside "replaces
a pre-existing startup log symlink..." that pre-creates logPath as a directory
(instead of a symlink), writes trustedPath, builds the script using
earlyStartLogBlock with _START_LOG set to JSON.stringify(logPath), spawns bash
as in the existing test, and assert the process exits non-zero and that the
pre-existing directory at logPath still exists and is unchanged (e.g.,
isDirectory and trustedPath contents unchanged). Ensure you reference
earlyStartLogBlock, logPath and trustedPath when locating the relevant setup and
assertions.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@scripts/nemoclaw-start.sh`:
- Around line 48-87: The change in scripts/nemoclaw-start.sh affects sandbox
boot behavior (see prepare_start_log function and its invocation at the bottom)
and must be validated with full entrypoint E2E runs before merging; please run
the recommended end-to-end suites sandbox-survival-e2e, sandbox-operations-e2e,
and cloud-e2e against this branch (exercising
Landlock/non-root/process-lifecycle paths) and post the passing results or any
fixes, referencing prepare_start_log and the two calls that invoke it for root
and non-root paths.

In `@test/nemoclaw-start.test.ts`:
- Around line 120-151: Add a companion test to cover the real-directory refusal
path: create a new it block alongside "replaces a pre-existing startup log
symlink..." that pre-creates logPath as a directory (instead of a symlink),
writes trustedPath, builds the script using earlyStartLogBlock with _START_LOG
set to JSON.stringify(logPath), spawns bash as in the existing test, and assert
the process exits non-zero and that the pre-existing directory at logPath still
exists and is unchanged (e.g., isDirectory and trustedPath contents unchanged).
Ensure you reference earlyStartLogBlock, logPath and trustedPath when locating
the relevant setup and assertions.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: a3b5c423-f61d-4003-8f70-846d54f14892

📥 Commits

Reviewing files that changed from the base of the PR and between 11b1937 and 86787ac.

📒 Files selected for processing (2)
  • scripts/nemoclaw-start.sh
  • test/nemoclaw-start.test.ts

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@test/nemoclaw-start.test.ts`:
- Around line 1909-1915: The test currently writes to gatewayLog and autoPairLog
before asserting symlink behavior which can mask failures; after the spawnSync()
that produces result, immediately capture and assert the pre-write state of
gatewayLog and autoPairLog (existence and whether they are symlinks and their
mode) using fs.existsSync and fs.lstatSync or fs.statSync on gatewayLog and
autoPairLog (and verify they point to gatewayTarget/autoPairTarget) before
performing the fs.writeFileSync() calls, then perform the writes only to verify
the trusted targets (gatewayTarget, autoPairTarget) remain unchanged and finally
re-check gatewayLogIsSymlink/autoPairLogIsSymlink to ensure the harness didn’t
inadvertently replace symlinks.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 3a12a31f-c307-4985-9d16-e6fbfec30398

📥 Commits

Reviewing files that changed from the base of the PR and between daf1222 and 1c37ef4.

📒 Files selected for processing (4)
  • agents/hermes/start.sh
  • scripts/nemoclaw-start.sh
  • test/nemoclaw-start.test.ts
  • test/sandbox-init.test.ts

Comment thread test/nemoclaw-start.test.ts
@wscurran wscurran added fix security Potential vulnerability, unsafe behavior, or access risk labels May 20, 2026
@wscurran

Copy link
Copy Markdown
Contributor

✨ Thanks for submitting this detailed PR about hardening startup log handling for the OpenClaw and Hermes entrypoints. This PR proposes a way to prevent a sandbox user from planting a symlink that redirects log truncation, writes, chmod, or chown onto another trusted in-container path.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
agents/hermes/start.sh (1)

230-287: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Backfill the rc shim before locking Hermes rc files.

This now writes /tmp/nemoclaw-proxy-env.sh and immediately locks $_SANDBOX_HOME, but it never does the OpenClaw-style shim backfill that makes /sandbox/.bashrc / .profile source that file. On stale Hermes images, connect shells will miss the proxy/guard exports entirely. _SANDBOX_HOME is also resolved from the pre-normalized HOME, even though the non-root path later forces HOME=/sandbox, so the wrong rc directory can get locked in that posture.

♻️ Proposed fix
 if [ "$(id -u)" -eq 0 ]; then
   _SANDBOX_HOME=$(getent passwd sandbox 2>/dev/null | cut -d: -f6)
   _SANDBOX_HOME="${_SANDBOX_HOME:-/sandbox}"
 else
-  _SANDBOX_HOME="${HOME:-/sandbox}"
+  _SANDBOX_HOME="/sandbox"
 fi
@@
 _PROXY_ENV_FILE="/tmp/nemoclaw-proxy-env.sh"
+_RUNTIME_SHELL_ENV_SHIM="[ -f ${_PROXY_ENV_FILE} ] && . ${_PROXY_ENV_FILE}"
+
+ensure_runtime_shell_env_shim() {
+  local rc_file
+  for rc_file in "${_SANDBOX_HOME}/.bashrc" "${_SANDBOX_HOME}/.profile"; do
+    [ -L "$rc_file" ] && continue
+    if [ -f "$rc_file" ] && grep -qxF "$_RUNTIME_SHELL_ENV_SHIM" "$rc_file" 2>/dev/null; then
+      continue
+    fi
+    if [ -e "$rc_file" ]; then
+      printf '\n%s\n%s\n' '# Source runtime proxy config' "$_RUNTIME_SHELL_ENV_SHIM" >>"$rc_file"
+    else
+      printf '%s\n%s\n' '# Source runtime proxy config' "$_RUNTIME_SHELL_ENV_SHIM" >"$rc_file"
+    fi
+  done
+}
+
 write_runtime_shell_env
+ensure_runtime_shell_env_shim
 lock_rc_files "$_SANDBOX_HOME"

As per coding guidelines, agents/hermes/** recommends rebuild-hermes-stale-base-e2e for stale cached base image paths.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@agents/hermes/start.sh` around lines 230 - 287, Resolve _SANDBOX_HOME after
the non-root HOME normalization and backfill the sandbox rc shim before locking
files: move the _SANDBOX_HOME calculation so it uses the final HOME value (so
non-root paths get /sandbox), then after write_runtime_shell_env() add a step
that injects a small shim into the sandbox rc files (e.g., create or ensure a
line sourcing "$_PROXY_ENV_FILE" in the target .bashrc/.profile using the same
emit_sandbox_sourced_file mechanism) and only then call lock_rc_files
"$_SANDBOX_HOME"; reference write_runtime_shell_env, _PROXY_ENV_FILE,
emit_sandbox_sourced_file and lock_rc_files to locate where to add/move these
changes.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@agents/hermes/start.sh`:
- Around line 230-287: Resolve _SANDBOX_HOME after the non-root HOME
normalization and backfill the sandbox rc shim before locking files: move the
_SANDBOX_HOME calculation so it uses the final HOME value (so non-root paths get
/sandbox), then after write_runtime_shell_env() add a step that injects a small
shim into the sandbox rc files (e.g., create or ensure a line sourcing
"$_PROXY_ENV_FILE" in the target .bashrc/.profile using the same
emit_sandbox_sourced_file mechanism) and only then call lock_rc_files
"$_SANDBOX_HOME"; reference write_runtime_shell_env, _PROXY_ENV_FILE,
emit_sandbox_sourced_file and lock_rc_files to locate where to add/move these
changes.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: f3c4946d-4d09-4376-aaef-32873ea6dc1c

📥 Commits

Reviewing files that changed from the base of the PR and between 1c37ef4 and 1556f01.

📒 Files selected for processing (3)
  • agents/hermes/start.sh
  • scripts/nemoclaw-start.sh
  • test/nemoclaw-start.test.ts

@ChunkyMonkey11

Copy link
Copy Markdown
Contributor Author

@coderabbitai review

@coderabbitai

coderabbitai Bot commented May 22, 2026

Copy link
Copy Markdown
Contributor
✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@ChunkyMonkey11

Copy link
Copy Markdown
Contributor Author

@cv I checked current main, and this still looks like a live issue.

/tmp is sandbox-writable, and scripts/nemoclaw-start.sh still initializes /tmp/nemoclaw-start.log, /tmp/gateway.log, and /tmp/auto-pair.log with direct truncate/touch/chown/chmod flows that can act through a pre-planted symlink. agents/hermes/start.sh also still lacks the explicit no-follow replacement guard for the restricted log helper, especially for symlinks that resolve to directories.

This PR's current head replaces those paths with freshly prepared regular files and adds regression coverage for the symlink cases. Could you take a look and provide feedback/review as well?

@wscurran wscurran added area: sandbox OpenShell sandbox lifecycle, runtime, config, or recovery bug-fix PR fixes a bug or regression and removed priority: high labels Jun 3, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area: sandbox OpenShell sandbox lifecycle, runtime, config, or recovery bug-fix PR fixes a bug or regression security Potential vulnerability, unsafe behavior, or access risk

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants