Description
`nemoclaw <name> rebuild` aborts with:
Failed to back up sandbox state.
Failed: agents, extensions, workspace, skills, hooks, identity, devices, canvas, cron, memory, telegram, credentials
Aborting rebuild to prevent data loss.
This happens even though the `--name pre-…` snapshot path (via `nemoclaw <name> snapshot create`) succeeds and lists the same "Failed directories" as a non-fatal warning. The fatal-vs-non-fatal divergence between the two code paths is itself a bug, but the underlying cause is shared: the SSH-as-sandbox-user backup tar fails with `Cannot open: Permission denied` on individual files inside `${writableDir}/<state-dir>` that are owned by `root` and have mode `0600`.
Verbatim verbose log (NEMOCLAW_REBUILD_VERBOSE=1):
[sandbox-state ...] Downloading via SSH+tar: tar -cf - -C /sandbox/.openclaw-data agents extensions workspace skills hooks identity devices canvas cron memory telegram credentials
[sandbox-state ...] SSH+tar download: exit=2, stdout=4546560 bytes,
stderr=tar: agents/main/sessions/sessions.json: Cannot open: Permission denied
tar: agents/main/agent/models.json: Cannot open: Permission denied
tar: Exiting with failure status due to previous errors
[rebuild ...] Backup result: success=false, backed=, failed=agents,extensions,workspace,...
Tar exited with status 2: it encountered errors but still wrote 4.5 MB of data to stdout. The code at `src/lib/sandbox-state.ts:702` treats any non-zero tar exit as a complete backup failure and marks all existing state dirs as failed, not just the ones containing the offending files. The rebuild guard at `src/nemoclaw.ts:2810` then aborts.
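To make the distinction concrete, here is a minimal sketch of how a non-zero tar exit could be attributed to individual state dirs instead of failing all of them. It is not the code in `src/lib/sandbox-state.ts`; `classifyTarResult` and `TarOutcome` are hypothetical names, and the stderr patterns are taken from the GNU tar 1.35 output quoted above. GNU tar also uses status 2 for errors that really do make the archive unusable, so the sketch only downgrades to "partial" when every stderr line is a per-file permission error.

```typescript
// Hypothetical helper for illustration; names and shapes are assumptions.
interface TarOutcome {
  backedUp: string[];     // state dirs archived with no skipped members
  partial: string[];      // state dirs archived with some members skipped
  skippedFiles: string[]; // paths tar reported as unreadable
  fatal: boolean;         // true when the archive should not be trusted
}

function classifyTarResult(
  exitCode: number,
  stderr: string,
  stateDirs: string[],
): TarOutcome {
  const skippedFiles: string[] = [];
  let unexplained = false;

  for (const line of stderr.split("\n")) {
    const text = line.trim();
    if (text === "") continue;
    // Per-file error, as seen in the verbose log above.
    const m = /^tar: (.+): Cannot open: Permission denied$/.exec(text);
    if (m) {
      skippedFiles.push(m[1]);
      continue;
    }
    // Summary line GNU tar prints after any earlier error.
    if (text === "tar: Exiting with failure status due to previous errors") continue;
    unexplained = true; // anything else is treated as a real failure
  }

  // Attribute each skipped file to its top-level state dir.
  const affected = new Set(skippedFiles.map((p) => p.split("/")[0]));
  const fatal =
    exitCode !== 0 &&
    (exitCode !== 2 || unexplained || skippedFiles.length === 0);

  return {
    backedUp: fatal ? [] : stateDirs.filter((d) => !affected.has(d)),
    partial: fatal ? [] : stateDirs.filter((d) => affected.has(d)),
    skippedFiles,
    fatal,
  };
}
```

With the stderr from the log above, this would report `agents` as partial and leave the other state dirs as backed up, while any unrecognised stderr line still fails the whole backup.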
How the files came to be root-owned in our case: yesterday's diagnostic session used `kubectl exec rtfm` (which defaults to root in the `agent` container) to invoke `openclaw memory index`, `openclaw agent --message`, and a few file writes. Everything those commands created while running as root inside the sandbox pod landed at `root:root 0600`. The sandbox user could still read its own files afterwards, but not those.
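A preflight scan could surface such files before tar runs. The following is a rough sketch under assumptions, not part of NemoClaw: `findUnreadable` is a hypothetical helper that walks a directory with Node's `fs` module and reports every path the current process cannot read. It would need to run as the sandbox user to be meaningful, since root can read everything.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Hypothetical preflight helper; not part of NemoClaw.
// Returns the paths under `root` that the current process cannot read.
function findUnreadable(root: string): string[] {
  const unreadable: string[] = [];
  const pending: string[] = [root];

  while (pending.length > 0) {
    const dir = pending.pop() as string;
    let entries: fs.Dirent[];
    try {
      entries = fs.readdirSync(dir, { withFileTypes: true });
    } catch {
      unreadable.push(dir); // directory itself cannot be listed
      continue;
    }
    for (const entry of entries) {
      const full = path.join(dir, entry.name);
      // tar stores a symlink as the link itself, so its target is not checked.
      if (entry.isSymbolicLink()) continue;
      if (entry.isDirectory()) {
        pending.push(full);
        continue;
      }
      try {
        fs.accessSync(full, fs.constants.R_OK);
      } catch {
        unreadable.push(full);
      }
    }
  }
  return unreadable.sort();
}
```

Run as the sandbox user against `/sandbox/.openclaw-data`, a non-empty result would identify the files to fix or skip before the backup starts.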
Reproduction Steps
- Onboard a sandbox: `nemoclaw onboard` with any provider.
- From the host, exec into the running sandbox pod as root and have a NemoClaw-aware command write into the writable dir:
docker exec openshell-cluster-nemoclaw kubectl -n openshell exec <sandbox> -- \
sh -c 'echo "{}" > /sandbox/.openclaw-data/agents/main/sessions/sessions.json'
The file ends up root:root 0644 (or 0600 depending on umask). For a more realistic repro, run any openclaw subcommand via kubectl-exec — e.g. openclaw memory index — which produces multiple root-owned files in agents/main/, memory/, and workspace/.
- Run `nemoclaw <name> rebuild --yes`. Expected: rebuild proceeds, partial backup succeeds with a warning. Actual: rebuild aborts before the destroy step with the message above.
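The expected behaviour in the last step can be sketched as a guard that separates "some files skipped" from "nothing backed up". This is one possible policy, not the guard at `src/nemoclaw.ts:2810`; the `BackupResult` shape (in particular the `partial` field) and the `decideRebuild` name are assumptions made for illustration.

```typescript
// Hypothetical shapes for illustration; the real backup result may differ.
interface BackupResult {
  backed: string[];  // state dirs fully archived
  partial: string[]; // state dirs archived with some files skipped
  failed: string[];  // state dirs with nothing archived
}

type RebuildDecision =
  | { action: "proceed"; warnings: string[] }
  | { action: "abort"; reason: string };

function decideRebuild(result: BackupResult): RebuildDecision {
  // Abort only when a state dir could not be archived at all.
  if (result.failed.length > 0) {
    return {
      action: "abort",
      reason: `No data backed up for: ${result.failed.join(", ")}`,
    };
  }
  // Partial dirs are reported but do not block the rebuild.
  const warnings = result.partial.map(
    (dir) => `Partial backup of ${dir}: some files were unreadable and skipped`,
  );
  return { action: "proceed", warnings };
}
```

Under this policy the repro above would proceed with a warning for the affected state dirs, and the same decision could be shared with the snapshot path so the two code paths no longer diverge.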
Environment
- OS: Ubuntu 24.04 (Linux 6.17.0-1014-nvidia aarch64)
- Hardware: NVIDIA GB10 (DGX Spark)
- Docker: Engine 27.x
- Node.js: v22.22.2
- NemoClaw: v0.0.29
- OpenShell (cluster): 0.0.36
- Sandbox image: `openshell/sandbox-from:1777485515` (built locally)
- Tar inside sandbox: GNU tar 1.35
Debug Output
Output of `nemoclaw debug --quick --sandbox <sandbox>` captured 2026-04-29 18:47 UTC. Full 836-line capture archived at [debug-output-2026-04-29-1847.txt](https://github.com/user-attachments/files/27226637/debug-output-2026-04-29-1847.txt). Focused excerpt (the post-recovery healthy state of the sandbox; the *failed* rebuild's tar errors are reproduced under "Logs" below):
$ nemoclaw --version
nemoclaw v0.0.29
═══ System ═══
Linux <host> 6.17.0-1014-nvidia #14-Ubuntu SMP PREEMPT_DYNAMIC Tue Mar 17 19:01:40 UTC 2026 aarch64 aarch64 aarch64 GNU/Linux
═══ OpenShell ═══
Server: https://127.0.0.1:8080 Status: Connected Version: 0.0.36
Sandbox: <sandbox> Namespace: openshell Phase: Ready Revision: 7
═══ Sandbox Filesystem Policy (excerpt) ═══
filesystem_policy:
read_only: [/usr, /lib, /proc, /dev/urandom, /app, /etc, /var/log, /sandbox, /sandbox/.openclaw]
read_write: [/tmp, /dev/null, /sandbox/.openclaw-data, /sandbox/.nemoclaw]
process:
run_as_user: sandbox
run_as_group: sandbox
═══ Onboard Session ═══
"provider": "ollama-local",
"model": "hermes3:8b",
"endpointUrl": "http://host.openshell.internal:11435/v1",
"policyPresets": ["npm","pypi","huggingface","brew","brave","local-inference"],
"failure": null
The failure mode (this bug) occurs *during* `nemoclaw <sandbox> rebuild`, not at quiescent state. `nemoclaw debug --quick` shows a healthy sandbox because the rebuild that first triggered the bug, on a separate occasion, was interrupted before the destroy step. The verbatim verbose-mode trace referenced under "Logs" reproduces the actual failure.
Logs
Verbose mode (`NEMOCLAW_REBUILD_VERBOSE=1`) trace excerpt — see Description.
Checklist