[Ubuntu 24.04][Security] shields-down timer process death leaves shields permanently DOWN (fail-open)

## Description

<pre>nemoclaw <name> shields down --timeout Nm spawns a detached node process
(scripts/lib/shields-timer.js) that is supposed to restore shields UP at the
deadline. If that process is killed before the deadline (host reboot, OOM, manual
SIGKILL), shields stay permanently DOWN: the deadline silently passes, no
shields_auto_restore audit entry is written, file perms remain mutable
(660 sandbox:sandbox), and shields status keeps reporting "DOWN
(temporarily unlocked)" with "Auto-lockdown in: 0m 0s". This violates fail-secure:
shields should snap back UP on any unexpected timer disruption, not stay open.
The stale ~/.nemoclaw/state/shields-timer-<name>.json continues to reference
the dead PID. shields status does NOT self-heal — sandbox stays in permissive
policy until an operator manually runs shields up.
</pre>

Environment

<pre>Device: x86_64 server (clean Ubuntu host, no k3s/gpu-operator competing)
OS: Ubuntu 24.04 (Linux 6.17.0-19-generic)
Architecture: x86_64
Node.js: v22.22.2
npm: 10.9.7
Docker: 29.1.3 (29.1.3-0ubuntu3~24.04.2)
OpenShell CLI: 0.0.36
NemoClaw: v0.0.35
OpenClaw: 2026.4.24
</pre>

Steps to Reproduce

<pre>1. nemoclaw <name> shields up # start from a known-locked baseline
2. nemoclaw <name> shields down --timeout 2m --reason "test"
3. cat ~/.nemoclaw/state/shields-timer-<name>.json
 # note the recorded "pid": <PID>, e.g.:
 # {"pid":441175,"sandboxName":"<name>","snapshotPath":"...","restoreAt":"..."}
4. ps -p <PID> -o pid,cmd # confirm shields-timer.js is the child
5. kill -9 <PID> # simulate host reboot / OOM / crash
6. Wait past the 2-minute deadline (e.g. sleep 130)
7. nemoclaw <name> shields status
8. From inside the sandbox (or via kubectl exec), check actual file perms:
 stat -c "%a %U:%G %n" /sandbox/.openclaw /sandbox/.openclaw/openclaw.json /sandbox/.openclaw/.config-hash
9. tail -1 ~/.nemoclaw/state/shields-audit.jsonl
</pre>

Expected Result

<pre>Shields auto-restore at the deadline regardless of timer-process liveness:
7. shields status shows "Shields: UP (lockdown active)"
8. /sandbox/.openclaw is 755 root:root; openclaw.json + .config-hash are 444 root:root
9. Audit log contains a new {"action":"shields_auto_restore",...} entry

shields status should self-heal on read: detect the deadline-passed condition
and either trigger restore inline OR report a distinct "expired/recovery pending"
state.
</pre>

Actual Result

<pre>Shields stay permanently DOWN — fail-OPEN:
7. shields status reports
 Shields: DOWN (temporarily unlocked)
 Since: 2026-05-06T07:56:48.927Z
 Auto-lockdown in: 0m 0s ← deadline already passed
 Reason: test
 Policy: permissive
8. Filesystem still in mutable-default state:
 2770 sandbox:sandbox /sandbox/.openclaw
 660 sandbox:sandbox /sandbox/.openclaw/openclaw.json
 660 sandbox:sandbox /sandbox/.openclaw/.config-hash
9. Audit log: NO shields_auto_restore entry. Last entry is the original
 shields_down event:
 {"action":"shields_down","sandbox":"<name>","timestamp":"...",
 "timeout_seconds":120,"reason":"test","policy_applied":"permissive",...}

Stale ~/.nemoclaw/state/shields-timer-<name>.json still records the dead PID.
ps -p <old-PID> returns no process. Subsequent shields status calls do not
self-heal. Sandbox stays in permissive policy until operator manually runs
nemoclaw <name> shields up.

Real-world impact: any host reboot, runaway-process OOM kill, supervisor
restart, or accidental SIGKILL on the detached shields-timer.js process leaves
the sandbox unprotected indefinitely. Operators reasonably expect
"shields down --timeout Nm" to fail-secure (snap back UP) on host restart.
</pre>

Suggested fixes

<pre>1. Make shields status validate the recorded timer PID. If shields-timer-<name>.json
 references a dead PID and the deadline has passed, trigger restore inline AND
 re-report status, OR surface a distinct state (e.g. "DOWN (timer expired,
 recovery pending)") so operators see it.
2. Add a host-side watchdog (systemd user unit / launchd job) that periodically
 scans ~/.nemoclaw/state/shields-*.json for expired deadlines and runs
 shields up.
3. shields up should clear stale shields-timer-<name>.json on completion (and
 verify PID liveness before treating the recorded timer as live).

</pre>
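
The check in suggested fix 1 could look roughly like the sketch below. It assumes the timer record carries the `pid` and `restoreAt` fields shown in the reproduction steps; the function names `isPidAlive` and `classifyTimer` and the returned state labels are hypothetical and not part of NemoClaw. An unparseable `restoreAt` is treated as expired so that the check fails secure.

```javascript
'use strict';

// Hypothetical helper (not a NemoClaw API): true if a process with this PID
// exists. Signal 0 performs only the existence/permission check.
function isPidAlive(pid) {
  if (!Number.isInteger(pid) || pid <= 0) return false;
  try {
    process.kill(pid, 0);
    return true;
  } catch (err) {
    // EPERM means the process exists but is owned by another user.
    return err.code === 'EPERM';
  }
}

// Classify a record shaped like shields-timer-<name>.json.
// The state labels are illustrative assumptions.
function classifyTimer(record, now = Date.now(), pidAlive = isPidAlive) {
  const deadline = Date.parse(record.restoreAt);
  // A missing or invalid deadline counts as expired (fail secure).
  const expired = Number.isNaN(deadline) || now >= deadline;
  const alive = pidAlive(record.pid);

  if (alive && !expired) return 'active';    // timer running, deadline ahead
  if (alive && expired) return 'overdue';    // timer running but late
  if (!alive && !expired) return 'orphaned'; // timer died before the deadline
  return 'expired';                          // timer dead and deadline passed
}
```

Under this sketch, `shields status` would restore inline (or report a distinct recovery-pending state) for `expired`, and could do the same for `orphaned`, since a dead timer will never fire. A bare PID check cannot detect PID reuse, so a real implementation would probably also want to confirm that the live process is actually `shields-timer.js`.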

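The scan step of the watchdog in suggested fix 2 might be sketched as follows. It assumes timer records are named `shields-timer-<name>.json` and contain an ISO `restoreAt` field, as in the reproduction steps; `findExpiredTimers` is a hypothetical name, and the scheduling (systemd user unit or launchd job) is left out. Unreadable or corrupt records are reported as expired so that the watchdog errs toward restoring shields.

```javascript
'use strict';
const fs = require('node:fs');
const path = require('node:path');

// Hypothetical helper (not a NemoClaw API): return the sandbox names whose
// recorded restore deadline has passed, sorted alphabetically.
function findExpiredTimers(stateDir, now = Date.now()) {
  const expired = [];
  for (const file of fs.readdirSync(stateDir)) {
    const match = /^shields-timer-(.+)\.json$/.exec(file);
    if (!match) continue; // skips the audit log and unrelated files

    let restoreAt = NaN;
    try {
      const raw = fs.readFileSync(path.join(stateDir, file), 'utf8');
      restoreAt = Date.parse(JSON.parse(raw).restoreAt);
    } catch {
      // Unreadable or corrupt record: leave restoreAt as NaN (expired).
    }
    if (Number.isNaN(restoreAt) || now >= restoreAt) expired.push(match[1]);
  }
  return expired.sort();
}
```

A periodic job could call this on `~/.nemoclaw/state` and run `nemoclaw <name> shields up` for each returned name.
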
## Bug Details

| Field | Value |
|-------|-------|
| Priority | Unprioritized |
| Action | Dev - Open - To fix |
| Disposition | Open issue |
| Module | Machine Learning - NemoClaw |
| Keyword | NemoClaw, NemoClaw_CLI&UX, NEMOCLAW_GH_SYNC_APPROVAL, NemoClaw_Security, NemoClaw-SWQA-RelBlckr-Recommended |

---
[NVB#6150114]


[Ubuntu 24.04][Security] shields-down timer process death leaves shields permanently DOWN (fail-open) #3112
