`nemoclaw rebuild` aborts when files in `.openclaw-data` are root-owned #2727

### Description

`nemoclaw <name> rebuild` aborts with:

```
Failed to back up sandbox state.
Failed: agents, extensions, workspace, skills, hooks, identity, devices, canvas, cron, memory, telegram, credentials
Aborting rebuild to prevent data loss.
```

This happens even though the `--name pre-…` snapshot path (via `nemoclaw <name> snapshot create`) succeeds and reports the same "Failed directories" only as a non-fatal warning. The fatal-vs-non-fatal divergence between the two code paths is itself a bug, but the underlying cause is shared: the backup `tar`, run over SSH as the `sandbox` user, fails with `Cannot open: Permission denied` on individual files inside `${writableDir}/<state-dir>` that are owned by `root` with mode `0600`.

Verbatim verbose log (`NEMOCLAW_REBUILD_VERBOSE=1`):

```
[sandbox-state ...] Downloading via SSH+tar: tar -cf - -C /sandbox/.openclaw-data agents extensions workspace skills hooks identity devices canvas cron memory telegram credentials
[sandbox-state ...] SSH+tar download: exit=2, stdout=4546560 bytes,
  stderr=tar: agents/main/sessions/sessions.json: Cannot open: Permission denied
         tar: agents/main/agent/models.json: Cannot open: Permission denied
         tar: Exiting with failure status due to previous errors
[rebuild ...] Backup result: success=false, backed=, failed=agents,extensions,workspace,...
```

Tar exited 2 (errors-encountered, but tar still wrote 4.5 MB of data to stdout). The code at `src/lib/sandbox-state.ts:702` treats any non-zero tar exit as a complete backup failure and marks **all** existing state dirs as failed (not just the offending files). The rebuild guard at `src/nemoclaw.ts:2810` then aborts.
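
One way to narrow the blast radius is to attribute each tar error to the state dir that contains the offending file, and only mark those dirs as failed. The sketch below is illustrative and is not the code in `src/lib/sandbox-state.ts`: `classifyTarResult` and `BackupClassification` are hypothetical names, and the stderr patterns are assumed from the GNU tar messages shown in the log above. Anything it cannot attribute falls back to the current all-failed behavior.

```typescript
// Hypothetical helper (not NemoClaw's actual code): attribute GNU tar member
// errors to the top-level state directories that contain them.
interface BackupClassification {
  backed: string[];
  failed: string[];
}

function classifyTarResult(
  exitCode: number,
  stderr: string,
  stateDirs: string[],
): BackupClassification {
  if (exitCode === 0) {
    return { backed: stateDirs.slice(), failed: [] };
  }

  // Matches lines such as
  // "tar: agents/main/agent/models.json: Cannot open: Permission denied".
  const memberError = /^\s*tar: (.+?): Cannot open: /;
  // GNU tar's closing summary line; it carries no path information.
  const summary = /^\s*tar: Exiting with failure status/;

  const failed: string[] = [];
  let unexplained = false;

  for (const line of stderr.split("\n")) {
    if (line.trim() === "" || summary.test(line)) continue;
    const match = memberError.exec(line);
    const topLevel = match ? match[1].replace(/^\.\//, "").split("/")[0] : "";
    if (stateDirs.indexOf(topLevel) === -1) {
      unexplained = true; // unknown error, or a path outside the known dirs
    } else if (failed.indexOf(topLevel) === -1) {
      failed.push(topLevel);
    }
  }

  // Stay conservative when the failure cannot be pinned on specific dirs.
  if (unexplained || failed.length === 0) {
    return { backed: [], failed: stateDirs.slice() };
  }
  return {
    backed: stateDirs.filter((dir) => failed.indexOf(dir) === -1),
    failed: stateDirs.filter((dir) => failed.indexOf(dir) !== -1),
  };
}
```

With the stderr from the log above, this would report only `agents` as failed. The whole `agents` dir is still treated as failed even though most of it was archived, which errs on the side of not claiming a complete backup.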

How the files came to be root-owned in our case: yesterday's diagnostic session used `kubectl exec rtfm` (which defaults to root in the `agent` container) to invoke `openclaw memory index`, `openclaw agent --message`, and a few file writes. Anything those commands created while running as root inside the sandbox pod landed as `root:root 0600`. The `sandbox` user could later read its own files but not those.
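
To find which files are affected before a rebuild, a small ownership scan can help. This is a diagnostic sketch only: `findForeignOwnedFiles` is a hypothetical name, it uses Node's standard `fs` and `path` modules, and it assumes it runs as a user that can list every directory under the root it is given (an unreadable directory would make `readdirSync` throw).

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Hypothetical diagnostic: list every entry under `root` whose owner uid
// differs from `expectedUid` (for example, the sandbox user's uid).
function findForeignOwnedFiles(root: string, expectedUid: number): string[] {
  const found: string[] = [];
  const stack: string[] = [root];

  while (stack.length > 0) {
    const dir = stack.pop() as string;
    for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
      const full = path.join(dir, entry.name);
      // lstat so that symlinks are reported by their own owner, not the target's.
      if (fs.lstatSync(full).uid !== expectedUid) {
        found.push(full);
      }
      // Dirent.isDirectory() is false for symlinks, so links are not followed.
      if (entry.isDirectory()) {
        stack.push(full);
      }
    }
  }
  return found.sort();
}
```

Called with the writable dir (here `/sandbox/.openclaw-data`) and the `sandbox` user's uid, it would return the paths that the backup tar is likely to trip over.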


### Reproduction Steps

1. Onboard a sandbox: `nemoclaw onboard` with any provider.
2. From the *host*, exec into the running sandbox pod as root and have a NemoClaw-aware command write into the writable dir:
   ```
   docker exec openshell-cluster-nemoclaw kubectl -n openshell exec <sandbox> -- \
     sh -c 'echo "{}" > /sandbox/.openclaw-data/agents/main/sessions/sessions.json'
   ```
   The file ends up `root:root` with mode 0644 or 0600, depending on umask. Only a file the `sandbox` user cannot read (0600) triggers the failure. For a more realistic repro, run any `openclaw` subcommand via kubectl-exec, e.g. `openclaw memory index`, which produces multiple root-owned files in `agents/main/`, `memory/`, and `workspace/`.
3. Run `nemoclaw <name> rebuild --yes`. Expected: rebuild proceeds, partial backup succeeds with a warning. Actual: rebuild aborts before the destroy step with the message above.
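
The expected behavior in step 3 can be sketched as a guard that aborts only when nothing at all was backed up, and otherwise proceeds with a warning. This is one reading of the expectation stated above, not the logic at `src/nemoclaw.ts:2810`; `decideRebuild` and `RebuildDecision` are hypothetical names.

```typescript
// Hypothetical policy sketch for the rebuild guard; names are illustrative.
type RebuildDecision =
  | { action: "proceed" }
  | { action: "proceed-with-warning"; warning: string }
  | { action: "abort"; reason: string };

function decideRebuild(backed: string[], failed: string[]): RebuildDecision {
  if (failed.length === 0) {
    return { action: "proceed" };
  }
  if (backed.length === 0) {
    // Nothing was saved, so destroying the sandbox would lose all state.
    return {
      action: "abort",
      reason: "Failed to back up sandbox state: " + failed.join(", "),
    };
  }
  // Partial backup: continue, but tell the user which dirs were not saved.
  return {
    action: "proceed-with-warning",
    warning: "Failed directories: " + failed.join(", "),
  };
}
```

Whether a partial backup should proceed by default or require an explicit opt-in is a design choice for the maintainers; the sketch only shows where the two outcomes diverge.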


### Environment

- OS: Ubuntu 24.04 (Linux <host> 6.17.0-1014-nvidia aarch64)
- Hardware: NVIDIA GB10 (DGX Spark)
- Docker: Engine 27.x
- Node.js: v22.22.2
- NemoClaw: v0.0.29
- OpenShell (cluster): 0.0.36
- Sandbox image: `openshell/sandbox-from:1777485515` (built locally)
- Tar inside sandbox: GNU tar 1.35



### Debug Output

Output of `nemoclaw debug --quick --sandbox <sandbox>`, captured 2026-04-29 18:47 UTC. The full 836-line capture is archived at [debug-output-2026-04-29-1847.txt](https://github.com/user-attachments/files/27226637/debug-output-2026-04-29-1847.txt). Focused excerpt (the post-recovery healthy state of the sandbox; the *failed* rebuild's tar errors are reproduced in the verbose log under "Description"):

```shell
$ nemoclaw --version
nemoclaw v0.0.29

═══ System ═══

Linux <host> 6.17.0-1014-nvidia #14-Ubuntu SMP PREEMPT_DYNAMIC Tue Mar 17 19:01:40 UTC 2026 aarch64 aarch64 aarch64 GNU/Linux

═══ OpenShell ═══

Server:  https://127.0.0.1:8080  Status: Connected  Version: 0.0.36
Sandbox: <sandbox>  Namespace: openshell  Phase: Ready  Revision: 7

═══ Sandbox Filesystem Policy (excerpt) ═══

filesystem_policy:
  read_only:  [/usr, /lib, /proc, /dev/urandom, /app, /etc, /var/log, /sandbox, /sandbox/.openclaw]
  read_write: [/tmp, /dev/null, /sandbox/.openclaw-data, /sandbox/.nemoclaw]
  process:
    run_as_user: sandbox
    run_as_group: sandbox

═══ Onboard Session ═══

  "provider": "ollama-local",
  "model": "hermes3:8b",
  "endpointUrl": "http://host.openshell.internal:11435/v1",
  "policyPresets": ["npm","pypi","huggingface","brew","brave","local-inference"],
  "failure": null
```

The failure in this bug occurs *during* `nemoclaw <sandbox> rebuild`, not at rest. `nemoclaw debug --quick` shows a healthy sandbox because, on the earlier occasion when the bug was first triggered, the rebuild was interrupted before the destroy step. The verbatim verbose-mode trace referenced under "Logs" reproduces the actual failure.

### Logs

Verbose-mode (`NEMOCLAW_REBUILD_VERBOSE=1`) trace excerpt: see the verbatim log under "Description".

### Checklist

- [x] I confirmed this bug is reproducible
- [x] I searched existing issues and this is not a duplicate
