
[All Platforms][Sandbox] NemoClaw mutable sandbox breaks group-writable openclaw.json contract after openclaw doctor --fix (gateway cannot persist config) #4538

@PrachiShevate-nv


Summary

In a newly onboarded NemoClaw sandbox in default mutable mode, the OpenClaw doctor command (openclaw doctor --fix) tightens $OPENCLAW_HOME/.openclaw and openclaw.json to 700 / 600. This breaks the documented group-writable contract for mutable sandboxes and prevents the gateway user from writing /sandbox/.openclaw/openclaw.json.

As a result, gateway-initiated config updates (e.g., control-UI toggles like "Enable Dreaming" and account toggles) would fail to persist, even though the sandbox is explicitly in the default mutable state.

Additionally, su -s /bin/sh gateway ... fails inside the sandbox with su: System error, so it is not possible to validate the gateway's write permissions as described in NemoClaw/OpenClaw guidance.


Environment

  • Host OS: Ubuntu (DGX Spark; user nvidia@spark-dadc)
  • NemoClaw CLI: nemoclaw (host) – version bundled with NemoClaw v0.0.44
  • NemoClaw version: v0.0.44 (per nemoclaw mutable-test status)
  • OpenClaw version inside sandbox: OpenClaw 2026.5.22 (a374c3a)
  • OpenShell: 0.0.44 (docker) from nemoclaw mutable-test status
  • Model: nvidia/nemotron-3-super-120b-a12b
  • Provider: nvidia-prod
  • Sandbox name: mutable-test

Reproduction steps

Preconditions

  1. NemoClaw CLI installed and working.
  2. Supported NVIDIA inference credential configured.
  3. No existing sandbox named mutable-test.

Steps

  1. On host, onboard a new sandbox:

    nemoclaw onboard --name mutable-test
    • Complete the wizard using the default permission mode / default mutable state.
    • Onboarding completes successfully; no "Permission denied" errors.
  2. Confirm shields / permissions state:

    nemoclaw mutable-test shields status
    nemoclaw mutable-test status

    Actual output (relevant parts):

    Shields: NOT CONFIGURED (default mutable state)
    Config is mutable. Run `nemoclaw <sandbox> shields up` to opt into lockdown.
    
    Permissions: not configured (default mutable state)
    
    • No mention of a "permission-bypass" mode.
    • This is interpreted as the default mutable / shields-down state.
  3. Connect to the sandbox:

    nemoclaw mutable-test connect

    Inside the sandbox shell:

  4. Check initial permissions on .openclaw and openclaw.json:

    stat -c '%a %U:%G' /sandbox/.openclaw/openclaw.json
    stat -c '%a %U:%G' /sandbox/.openclaw

    Actual:

    660 sandbox:sandbox
    2770 sandbox:sandbox
    

    (The test expectation was 664 and 2775 under the NemoClaw mutable contract.)

  5. Run OpenClaw doctor with automatic fixes:

    openclaw doctor --fix; echo "doctor_exit:$?"

    Actual key output:

    • Doctor reports state directory and config file permissions as "too open" and then tightens them:

      ◇  State integrity
      ...
      - State directory permissions are too open ($OPENCLAW_HOME/.openclaw).
        Recommend chmod 700.
      - Config file is group/world readable
        ($OPENCLAW_HOME/.openclaw/openclaw.json). Recommend chmod 600.
      ...
      ◇  Doctor changes
      - Tightened permissions on $OPENCLAW_HOME/.openclaw to 700
      - Tightened permissions on $OPENCLAW_HOME/.openclaw/openclaw.json to 600
      
    • Exit code is 0:

      doctor_exit:0
      
  6. Verify sandbox user can still write to openclaw.json:

    echo 'smoke' >> /sandbox/.openclaw/openclaw.json; echo "write_exit:$?"

    Actual:

    write_exit:0
    

    (Sandbox user can still write; this part is OK.)

  7. Verify gateway user write (group-writable contract):

    su -s /bin/sh gateway -c "printf 'gateway-probe\n' >> /sandbox/.openclaw/openclaw.json"; echo "gateway_write_exit:$?"

    Actual:

    Password:
    su: System error
    gateway_write_exit:1
    
    • su itself fails with System error.
    • Even if su worked, the directory is now 700 and openclaw.json is now 600, so gateway (same group, different UID) could neither traverse /sandbox/.openclaw nor write the file.
  8. Exit sandbox and verify skill install still works (for completeness):

    On host:

    rm -rf /tmp/nemoclaw-test-skill && mkdir -p /tmp/nemoclaw-test-skill
    printf '%s\n' '---' 'name: nemoclaw-test-skill' 'description: smoke test skill' '---' 'Smoke test skill.' > /tmp/nemoclaw-test-skill/SKILL.md
    
    nemoclaw mutable-test skill install /tmp/nemoclaw-test-skill; echo "skill_exit:$?"

    Actual:

    ✓ Skill 'nemoclaw-test-skill' installed
    skill_exit:0
    

    Connect to sandbox again:

    nemoclaw mutable-test connect
    test -f /sandbox/.openclaw/skills/nemoclaw-test-skill/SKILL.md; echo "installed_skill:$?"

    Actual:

    installed_skill:0
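
Note that the append in step 6 leaves a stray `smoke` line in openclaw.json, so the file presumably no longer parses as valid config afterwards. A non-destructive variant of the write check is sketched below, assuming a POSIX shell. `can_append` is a hypothetical helper name, not part of NemoClaw or OpenClaw. It opens the file for appending without writing any bytes, so the exit status reflects write permission while the contents stay unchanged.

```shell
# Hypothetical helper: succeeds if the current user can open "$1" for append.
# The ':' builtin writes nothing, so the file contents are left untouched.
can_append() {
    # Refuse missing paths so that '>>' does not create a new file.
    [ -f "$1" ] || return 1
    # Run in a subshell: a failed redirection on a special builtin
    # would otherwise abort a non-interactive POSIX shell.
    ( : >> "$1" ) 2>/dev/null
}
```

Inside the sandbox this could be run as `can_append /sandbox/.openclaw/openclaw.json; echo "write_exit:$?"`.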
    

Expected behavior

For a NemoClaw sandbox in default mutable state:

  • /sandbox/.openclaw and /sandbox/.openclaw/openclaw.json should remain group-writable so that both the sandbox user and the gateway process (running as a gateway user in the sandbox group) can write config.

  • A typical contract for mutable mode (per documentation and test expectations) is:

    • /sandbox/.openclaw/openclaw.json: 664 sandbox:sandbox
    • /sandbox/.openclaw: 2775 sandbox:sandbox
      • g+w plus setgid so new files inherit group sandbox.
  • Running openclaw doctor --fix should not tighten permissions to 700/600 in a NemoClaw-managed mutable sandbox, because that prevents the gateway user from modifying openclaw.json and breaks UI-driven config persistence.

  • The gateway user inside the sandbox should be properly configured so that a probe like:

    su -s /bin/sh gateway -c "printf 'gateway-probe\n' >> /sandbox/.openclaw/openclaw.json"

    either:

    • succeeds (gateway_write_exit:0) in mutable mode, or
    • is replaced by a documented alternative way to verify gateway write capabilities.
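
The group-writable expectation above can be checked without comparing exact mode strings. The following is a minimal sketch, assuming GNU `stat` (as used in the reproduction steps) and a POSIX shell; `is_group_writable` is a hypothetical helper name. It only tests the g+w bit, so it accepts both the observed 660 / 2770 and the expected 664 / 2775, and rejects 600 / 700.

```shell
# Hypothetical helper: succeeds if "$1" has the group-write bit set.
# Relies on GNU stat printing the octal mode with '%a'.
is_group_writable() {
    mode=$(stat -c '%a' "$1") || return 2
    # Prefix a 0 so the shell parses the mode as octal, then test g+w (0020).
    [ $(( 0$mode & 0020 )) -ne 0 ]
}
```

For example, `is_group_writable /sandbox/.openclaw/openclaw.json; echo "group_writable:$?"` would be expected to print a non-zero status after `openclaw doctor --fix` has tightened the file to 600.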

Actual behavior

  • openclaw doctor --fix reports .openclaw and openclaw.json as "too open" and tightens them to 700 / 600 even though:

    • The sandbox is in NemoClaw's default mutable state (Permissions: not configured (default mutable state)), and
    • NemoClaw policy explicitly treats /sandbox/.openclaw as a read-write area for sandbox + gateway.
  • openclaw.json ends up with permissions that prevent the gateway user from writing via the sandbox group.

  • su to the gateway user fails with su: System error, making it impossible to validate the gateway's write access using the expected probe.

  • Sandbox user writes still succeed, and skill installation under /sandbox/.openclaw/skills/... works as expected (skill_exit:0, installed_skill:0).


Why this is a problem

NemoClaw's mutable-mode story relies on the gateway being able to mutate openclaw.json and related state on behalf of the user, while the sandbox user can also write state under /sandbox/.openclaw.

When openclaw doctor --fix tightens the directory and config file to 700/600:

  • The gateway, running as a different UID (but same sandbox group), can no longer write openclaw.json.
  • Control-UI toggles and other gateway-driven config updates would fail to persist.
  • The behavior contradicts the intended group-writable contract for the NemoClaw mutable default state.

The su: System error on gateway is an additional symptom suggesting that the gateway user entry / auth setup inside the sandbox image is not in a healthy or expected state.
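
While `su` is unusable, part of the gateway setup can still be inspected without switching users. The sketch below assumes a POSIX shell with the standard `id` utility; `user_in_group` is a hypothetical helper name. It only confirms that the account exists and belongs to the given group. It does not prove that a write would succeed, and it does not explain the `su: System error` itself.

```shell
# Hypothetical helper: succeeds if user "$1" exists and belongs to group "$2".
user_in_group() {
    # 'id -Gn' prints the user's group names separated by spaces;
    # it fails if the user has no account entry.
    memberships=$(id -Gn "$1" 2>/dev/null) || return 1
    case " $memberships " in
        *" $2 "*) return 0 ;;
        *) return 1 ;;
    esac
}
```

Inside the sandbox, `user_in_group gateway sandbox; echo "gateway_in_group:$?"` would show whether the group half of the contract is in place.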


Suggested fix / questions

  1. Adjust openclaw doctor behavior when running inside a NemoClaw sandbox in mutable mode so that it does not tighten $OPENCLAW_HOME/.openclaw and openclaw.json to 700/600, and instead preserves or restores group-writable permissions compatible with NemoClaw's gateway-write contract.

  2. Ensure the gateway user in the NemoClaw sandbox image is configured in a way that:

    • Allows probing gateway writeability (either via su or a documented alternative), and
    • Actually has write access to /sandbox/.openclaw/openclaw.json in mutable mode.
  3. Clarify in docs what the expected permissions are for /sandbox/.openclaw and openclaw.json under:

    • Mutable default state (shields down / not configured), and
    • Locked-down state (shields up), where tighter 700/600 may be intended.
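
Until the doctor behavior changes, a manual workaround might look like the sketch below. It is an assumption, not a documented NemoClaw or OpenClaw procedure: `restore_group_write` is a hypothetical helper, and it presumes the sandbox user owns the files and is allowed to `chmod` them. The defaults are the modes observed right after onboarding in this report (2770 / 660); pass 2775 and 664 to match the test expectation instead.

```shell
# Hypothetical workaround: put group-writable modes back on a state
# directory and the openclaw.json inside it.
#   $1  state directory (for example /sandbox/.openclaw)
#   $2  directory mode, default 2770 (setgid + rwx for owner and group)
#   $3  config file mode, default 660
restore_group_write() {
    state_dir=$1
    dir_mode=${2:-2770}
    file_mode=${3:-660}
    chmod "$dir_mode" "$state_dir" &&
        chmod "$file_mode" "$state_dir/openclaw.json"
}
```

Usage inside the sandbox would be `restore_group_write /sandbox/.openclaw`, followed by the `stat` commands from step 4 to confirm the result. Running `openclaw doctor --fix` again would presumably tighten the modes once more.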

Metadata

Labels

  • NV QA: Bugs found by the NVIDIA QA Team
  • area: sandbox: OpenShell sandbox lifecycle, runtime, config, or recovery
