fix(security): enforce owner-only permissions on ~/.nemoclaw directory and config files#4054
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughThis PR hardens the security of NemoClaw configuration and adapter state directories by enforcing owner-only permissions (0o700 for directories, 0o600 for config files) across two separate initialization paths: adapter lifecycle and sandbox config sync. Implementations use best-effort tightening for existing directories and explicit permission modes for new creations, with comprehensive test coverage for both paths. ChangesPermission Security Hardening
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/lib/onboard.ts`:
- Line 997: The current fs.mkdirSync(stateDir, { recursive: true, mode: 0o700 })
call only sets permissions for newly created directories; if stateDir already
exists its mode won't be tightened. After the mkdirSync call, check the existing
directory (stateDir) and explicitly set its mode to 0o700 (e.g., via
fs.chmodSync or fs.promises.chmod) so permissions are enforced for both
newly-created and pre-existing ~/.nemoclaw/state; ensure errors are handled or
propagated consistent with the surrounding function.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 74808ab8-feb9-461a-9231-edf673131fff
📒 Files selected for processing (5)
src/lib/inference/local-adapter-lifecycle.test.tssrc/lib/inference/local-adapter-lifecycle.tssrc/lib/onboard.tssrc/lib/onboard/config-sync.test.tssrc/lib/onboard/config-sync.ts
b76a1ef to
cfb5302
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
src/lib/inference/local-adapter-lifecycle.test.ts (1)
134-153: Consider running the recommended E2E tests for onboard.ts changes.Since this PR modifies
onboard.ts(which contains core onboarding logic), the coding guidelines recommend running a suite of E2E tests to verify the full sandbox creation and configuration flow with the permission changes. As per coding guidelines, the recommended tests include:cloud-e2e,sandbox-operations-e2e,rebuild-openclaw-e2e, and others listed in the guidelines.To run selectively:
gh workflow run nightly-e2e.yaml --ref fix/nemoclaw-dir-permissions \ -f jobs=cloud-e2e,sandbox-operations-e2e,rebuild-openclaw-e2e,channels-stop-start-e2e,messaging-compatible-endpoint-e2e,bedrock-runtime-compatible-anthropic-e2e,hermes-discord-e2e,hermes-slack-e2e,openshell-gateway-upgrade-e2eThis would provide additional confidence that the permission enforcement doesn't break existing onboarding flows or sandbox operations.
As per coding guidelines, changes to
src/lib/onboard.tsshould be validated with E2E tests including cloud-e2e, sandbox-operations-e2e, rebuild-openclaw-e2e, channels-stop-start-e2e, messaging-compatible-endpoint-e2e, bedrock-runtime-compatible-anthropic-e2e, hermes-discord-e2e, hermes-slack-e2e, and openshell-gateway-upgrade-e2e.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/lib/inference/local-adapter-lifecycle.test.ts` around lines 134 - 153, Run the recommended E2E suites to validate the onboarding changes in src/lib/onboard.ts: trigger the nightly-e2e.yaml workflow against your branch (e.g., ref fix/nemoclaw-dir-permissions) with the jobs cloud-e2e, sandbox-operations-e2e, rebuild-openclaw-e2e, channels-stop-start-e2e, messaging-compatible-endpoint-e2e, bedrock-runtime-compatible-anthropic-e2e, hermes-discord-e2e, hermes-slack-e2e, and openshell-gateway-upgrade-e2e (for example via gh workflow run nightly-e2e.yaml -f jobs=... --ref <branch>), inspect failures related to directory permission enforcement introduced around onboard.ts, and fix any failing flows (sandbox creation, configuration, or permission tightening) until all listed E2E jobs pass.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/lib/onboard.ts`:
- Around line 997-1006: The added permission-tightening block increases file
size; extract it into a small helper like tightenDirPermissions(stateDir:
string) (or ensureSecureDir) in a new module under src/lib/onboard/ and replace
the inline block in the current file with a single call to that helper; the
helper should perform the mkdirSync(stateDir, { recursive: true, mode: 0o700 })
and the try { const stat = fs.statSync(stateDir); if ((stat.mode & 0o077) !== 0)
fs.chmodSync(stateDir, 0o700); } catch {} logic so the original onboard
entrypoint no longer grows in lines while preserving behavior (call sites: where
mkdirSync/statSync/chmodSync were used around stateDir).
---
Nitpick comments:
In `@src/lib/inference/local-adapter-lifecycle.test.ts`:
- Around line 134-153: Run the recommended E2E suites to validate the onboarding
changes in src/lib/onboard.ts: trigger the nightly-e2e.yaml workflow against
your branch (e.g., ref fix/nemoclaw-dir-permissions) with the jobs cloud-e2e,
sandbox-operations-e2e, rebuild-openclaw-e2e, channels-stop-start-e2e,
messaging-compatible-endpoint-e2e, bedrock-runtime-compatible-anthropic-e2e,
hermes-discord-e2e, hermes-slack-e2e, and openshell-gateway-upgrade-e2e (for
example via gh workflow run nightly-e2e.yaml -f jobs=... --ref <branch>),
inspect failures related to directory permission enforcement introduced around
onboard.ts, and fix any failing flows (sandbox creation, configuration, or
permission tightening) until all listed E2E jobs pass.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 6eb90d3e-2d65-4ebc-95ce-1299881c21ea
📒 Files selected for processing (5)
src/lib/inference/local-adapter-lifecycle.test.tssrc/lib/inference/local-adapter-lifecycle.tssrc/lib/onboard.tssrc/lib/onboard/config-sync.test.tssrc/lib/onboard/config-sync.ts
✅ Files skipped from review due to trivial changes (1)
- src/lib/onboard/config-sync.test.ts
There was a problem hiding this comment.
♻️ Duplicate comments (1)
src/lib/onboard.ts (1)
36-36:⚠️ Potential issue | 🟠 Major | ⚡ Quick winOnboard entrypoint budget is still blocking merge.
CI is failing on
onboard-entrypoint-budgetdue to net +1 line insrc/lib/onboard.ts. Please make this change budget-neutral (same behavior, -1 net line) so the PR can merge.✂️ Minimal budget-neutral alternative
-const { ensureLocalAdapterStateDir }: typeof import("./inference/local-adapter-lifecycle") = require("./inference/local-adapter-lifecycle"); ... - ensureLocalAdapterStateDir(stateDir); + require("./inference/local-adapter-lifecycle").ensureLocalAdapterStateDir(stateDir);Also applies to: 998-998
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/lib/onboard.ts` at line 36, The extra type-annotation form of the require added a line; replace the typed require with a plain destructuring require to keep identical runtime behavior but remove one line: change the current line that declares ensureLocalAdapterStateDir (the const { ensureLocalAdapterStateDir }: typeof import("./inference/local-adapter-lifecycle") = require(...)) to a single-line plain require (const { ensureLocalAdapterStateDir } = require("./inference/local-adapter-lifecycle");) so behavior is unchanged and the file is one line shorter.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Duplicate comments:
In `@src/lib/onboard.ts`:
- Line 36: The extra type-annotation form of the require added a line; replace
the typed require with a plain destructuring require to keep identical runtime
behavior but remove one line: change the current line that declares
ensureLocalAdapterStateDir (the const { ensureLocalAdapterStateDir }: typeof
import("./inference/local-adapter-lifecycle") = require(...)) to a single-line
plain require (const { ensureLocalAdapterStateDir } =
require("./inference/local-adapter-lifecycle");) so behavior is unchanged and
the file is one line shorter.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 02d07edc-4523-46c5-9963-4cb13f1b710f
📒 Files selected for processing (1)
src/lib/onboard.ts
|
✨ Thanks for submitting this detailed PR about enforcing owner-only permissions on ~/.nemoclaw directory and config files. This proposes a way to improve security by avoiding world-readable credentials and config, and adds test cases to verify the new permissions. Related open issues: |
…y and config files Three code paths created ~/.nemoclaw/ or files within it without specifying restrictive permissions, falling back to umask-derived defaults (typically 755 for directories, 644 for files): 1. config-sync.ts — sandbox sync script used bare mkdir -p and cat without chmod 2. local-adapter-lifecycle.ts — ensureLocalAdapterStateDir() called mkdirSync without mode 3. onboard.ts — startModelRouter() created state dir without mode Fix: add mode 0o700 to all directory-creation calls and chmod 600 to config.json in the sandbox sync script. Also tighten existing directories retroactively in ensureLocalAdapterStateDir() (matching the pattern already used by config-io.ts ensureConfigDir()). Closes NVIDIA#4009 Signed-off-by: kagura-agent <kagura.agent.ai@gmail.com>
df6c395 to
6719542
Compare
## Summary - keep `~/.nemoclaw` directory hardening for user-owned config dirs - skip directory `chmod` when the config dir is owned by another UID, which preserves OpenClaw's root-owned `/sandbox/.nemoclaw` layout - keep `config.json` restricted to `0600` - add shell-level regression coverage for owned and non-owned config directories ## Root Cause PR #4054 changed the step-7 sandbox config sync script from writing `~/.nemoclaw/config.json` to unconditionally running `chmod 700 ~/.nemoclaw` first. In the OpenClaw sandbox image, `/sandbox/.nemoclaw` is intentionally root-owned with mode `1755`, while `/sandbox/.nemoclaw/config.json` is sandbox-owned and writable. The directory chmod fails under `set -euo pipefail`, so `openshell sandbox connect <sandbox>` exits 1 during: `[7/8] Setting up OpenClaw inside sandbox` This restores compatibility with that image contract without reverting the file-permission hardening from #4054. ## Evidence - Last known good public install: run `26413471038`, installed ref `8ed9a2c04c80627deed519a7c5ffe8f95ebb5ddd`, `cloud-onboard-e2e` reached `✓ OpenClaw gateway launched inside sandbox`. - First observed failing public install: run `26414889786`, installed ref `94fefa6fd08e33d48eede651e33770a0712da854`, `cloud-onboard-e2e` failed at `openshell sandbox connect e2e-cloud-onboard`. - The commit window contains #4054, #3980, and #3675; #4054 is the only PR that changed the config-sync script executed at the failing boundary. ## Validation - `npm ci --include=dev --ignore-scripts` - `npx vitest run src/lib/onboard/config-sync.test.ts --reporter=verbose` - `npx @biomejs/biome check src/lib/onboard/config-sync.ts src/lib/onboard/config-sync.test.ts` - `git diff --check` - `npm run build:cli` - `npm run typecheck:cli` <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **Refactor** * Improved NemoClaw configuration directory permission handling to be more robust in different user ownership scenarios. * Enhanced sandbox configuration script to conditionally apply permissions based on directory ownership. * **Tests** * Added comprehensive integration tests for configuration directory permission management and ownership verification. <!-- review_stack_entry_start --> [](https://app.coderabbit.ai/change-stack/NVIDIA/NemoClaw/pull/4199?utm_source=github_walkthrough&utm_medium=github&utm_campaign=change_stack) <!-- review_stack_entry_end --> <!-- end of auto-generated comment: release notes by coderabbit.ai -->
Summary
Three code paths created
~/.nemoclaw/or files within it without specifying restrictive permissions, falling back to umask-derived defaults (typically1755for directories,644for files). This leaves credentials and config world-readable.Closes #4009
Changes
config-sync.tsmkdir -p -m 700,chmod 700dir,chmod 600config.jsonlocal-adapter-lifecycle.tsensureLocalAdapterStateDir(): addmode: 0o700+ retroactive chmod for existing lax directoriesonboard.tsstartModelRouter(): addmode: 0o700to state dir creationThe retroactive chmod in
ensureLocalAdapterStateDir()follows the same pattern already used byconfig-io.tsensureConfigDir()(line 153).Testing
local-adapter-lifecycle.test.ts: verifies new directory gets0o700and existing0o755directory gets tightened to0o700config-sync.test.ts: verifies shell script containsmkdir -p -m 700,chmod 700, andchmod 600npx vitest run)Signed-off-by: kagura-agent kagura-agent@users.noreply.github.com
Summary by CodeRabbit
Bug Fixes
Tests