Description
nemoclaw shields down --timeout Nm spawns a detached node process
(scripts/lib/shields-timer.js) that is supposed to restore shields UP at the
deadline. If that process is killed before the deadline (host reboot, OOM, manual
SIGKILL), shields stay permanently DOWN: the deadline silently passes, no
shields_auto_restore audit entry is written, file perms remain mutable
(660 sandbox:sandbox), and shields status keeps reporting "DOWN
(temporarily unlocked)" with "Auto-lockdown in: 0m 0s". This violates fail-secure:
shields should snap back UP on any unexpected timer disruption, not stay open.
The stale ~/.nemoclaw/state/shields-timer-.json continues to reference
the dead PID. shields status does NOT self-heal: the sandbox stays in permissive
policy until an operator manually runs nemoclaw shields up.
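The dead-PID condition described above can be detected from Node.js without any extra dependencies. The following is a minimal sketch, not the project's actual code: `isPidAlive` is a hypothetical helper name, and it relies only on the documented behavior of `process.kill(pid, 0)`, which tests for a process without delivering a signal.

```javascript
// Hypothetical helper: report whether a PID still refers to a running process.
function isPidAlive(pid) {
  // Reject non-numeric input and PIDs <= 0 (0 and negatives address process groups).
  if (!Number.isInteger(pid) || pid <= 0) return false;
  try {
    // Signal 0 performs the existence/permission check without sending a signal.
    process.kill(pid, 0);
    return true;
  } catch (err) {
    // EPERM: the process exists but belongs to another user, so it is alive.
    // ESRCH (or anything else): treat the process as gone.
    return err.code === "EPERM";
  }
}
```

A live PID alone does not prove that the process is still the shields timer, because PIDs can be reused (for example after a reboot), so a check like this would need to be combined with the deadline comparison rather than replace it.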
Environment
Device: x86_64 server (clean Ubuntu host, no k3s/gpu-operator competing)
OS: Ubuntu 24.04 (Linux 6.17.0-19-generic)
Architecture: x86_64
Node.js: v22.22.2
npm: 10.9.7
Docker: 29.1.3 (29.1.3-0ubuntu3~24.04.2)
OpenShell CLI: 0.0.36
NemoClaw: v0.0.35
OpenClaw: 2026.4.24
Steps to Reproduce
1. nemoclaw shields up # start from a known-locked baseline
2. nemoclaw shields down --timeout 2m --reason "test"
3. cat ~/.nemoclaw/state/shields-timer-.json
# note the recorded "pid": , e.g.:
# {"pid":441175,"sandboxName":"","snapshotPath":"...","restoreAt":"..."}
4. ps -p -o pid,cmd # confirm shields-timer.js is the child
5. kill -9 # simulate host reboot / OOM / crash
6. Wait past the 2-minute deadline (e.g. sleep 130)
7. nemoclaw shields status
8. From inside the sandbox (or via kubectl exec), check actual file perms:
stat -c "%a %U:%G %n" /sandbox/.openclaw /sandbox/.openclaw/openclaw.json /sandbox/.openclaw/.config-hash
9. tail -1 ~/.nemoclaw/state/shields-audit.jsonl
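Step 9 could be automated in a regression test. The sketch below assumes only what the report shows: the audit log holds one JSON object per line and each object has an "action" field. `lastAuditAction` is a hypothetical name, not an existing NemoClaw function.

```javascript
// Hypothetical check for step 9: return the "action" of the last audit entry.
// `jsonl` is the full text of the audit log, one JSON object per line.
function lastAuditAction(jsonl) {
  // Ignore blank lines, including the trailing newline at end of file.
  const lines = jsonl.split("\n").filter((line) => line.trim() !== "");
  if (lines.length === 0) return null;
  // Equivalent of `tail -1`: parse only the final entry.
  return JSON.parse(lines[lines.length - 1]).action ?? null;
}
```

A test would read ~/.nemoclaw/state/shields-audit.jsonl after the deadline and expect "shields_auto_restore"; with the bug present it would see "shields_down" instead.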
Expected Result
Shields auto-restore at the deadline regardless of timer-process liveness:
7. shields status shows "Shields: UP (lockdown active)"
8. /sandbox/.openclaw is 755 root:root; openclaw.json + .config-hash are 444 root:root
9. Audit log contains a new {"action":"shields_auto_restore",...} entry
shields status should self-heal on read: detect the deadline-passed condition
and either trigger restore inline OR report a distinct "expired/recovery pending"
state.
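The self-heal-on-read behavior could be driven by a small pure classification step. This is a sketch under stated assumptions: the state names ("active", "orphaned", "expired") and the function name `classifyTimer` are illustrative, and restoreAt is assumed to be an ISO-8601 timestamp, which the report elides.

```javascript
// Hypothetical status classifier; the state names are illustrative, not NemoClaw's.
// `timer` mirrors the fields shown in the timer state file: { pid, restoreAt }.
// Assumption: restoreAt is an ISO-8601 timestamp string.
function classifyTimer(timer, pidAlive, nowMs = Date.now()) {
  const deadlineMs = Date.parse(timer.restoreAt);
  // An unparseable deadline cannot be trusted, so treat it as expired (fail-secure).
  if (Number.isNaN(deadlineMs)) return "expired";
  // Deadline passed: a restore is due whether or not the timer survived.
  if (nowMs >= deadlineMs) return "expired";
  // Deadline still ahead but the timer process is gone: nobody will restore.
  if (!pidAlive) return "orphaned";
  // Timer alive and deadline in the future: normal temporary unlock.
  return "active";
}
```

With a result of "expired" or "orphaned", shields status could either run the restore inline or print the distinct recovery-pending state. A real implementation would also need to avoid racing a timer that is alive and in the middle of its own restore.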
Actual Result
Shields stay permanently DOWN — fail-OPEN:
7. shields status reports
Shields: DOWN (temporarily unlocked)
Since: 2026-05-06T07:56:48.927Z
Auto-lockdown in: 0m 0s ← deadline already passed
Reason: test
Policy: permissive
8. Filesystem still in mutable-default state:
2770 sandbox:sandbox /sandbox/.openclaw
660 sandbox:sandbox /sandbox/.openclaw/openclaw.json
660 sandbox:sandbox /sandbox/.openclaw/.config-hash
9. Audit log: NO shields_auto_restore entry. Last entry is the original
shields_down event:
{"action":"shields_down","sandbox":"","timestamp":"...",
"timeout_seconds":120,"reason":"test","policy_applied":"permissive",...}
Stale ~/.nemoclaw/state/shields-timer-.json still records the dead PID.
ps -p returns no process. Subsequent shields status calls do not
self-heal. Sandbox stays in permissive policy until operator manually runs
nemoclaw shields up.
Real-world impact: any host reboot, runaway-process OOM kill, supervisor
restart, or accidental SIGKILL on the detached shields-timer.js process leaves
the sandbox unprotected indefinitely. Operators reasonably expect
"shields down --timeout Nm" to fail-secure (snap back UP) on host restart.
Suggested fixes
1. Make shields status validate the recorded timer PID. If shields-timer-.json
references a dead PID and the deadline has passed, trigger restore inline AND
re-report status, OR surface a distinct state (e.g. "DOWN (timer expired,
recovery pending)") so operators see it.
2. Add a host-side watchdog (systemd user unit / launchd job) that periodically
scans ~/.nemoclaw/state/shields-*.json for expired deadlines and runs
shields up.
3. shields up should clear stale shields-timer-.json on completion (and
verify pid_ liveness before treating it as live).
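The scan in fix 2 could look roughly like the following. This is a sketch, not a definitive implementation: `findExpiredTimers` is a hypothetical name, the file naming pattern is inferred from the state file shown in the reproduction steps, and restoreAt is assumed to be an ISO-8601 timestamp. Corrupt or unreadable state files are reported as expired so that the watchdog errs toward restoring shields.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Hypothetical watchdog scan: list timer state files whose deadline has passed.
// Assumptions: files are named shields-timer-*.json and hold an ISO-8601 restoreAt.
function findExpiredTimers(stateDir, nowMs = Date.now()) {
  const expired = [];
  for (const name of fs.readdirSync(stateDir)) {
    if (!name.startsWith("shields-timer-") || !name.endsWith(".json")) continue;
    const file = path.join(stateDir, name);
    let deadlineMs = NaN;
    try {
      deadlineMs = Date.parse(JSON.parse(fs.readFileSync(file, "utf8")).restoreAt);
    } catch {
      // Unreadable or corrupt state: leave NaN so the file is reported (fail-secure).
    }
    if (Number.isNaN(deadlineMs) || nowMs >= deadlineMs) expired.push(file);
  }
  return expired;
}
```

A systemd user timer or launchd job could call this periodically against ~/.nemoclaw/state and run nemoclaw shields up for each hit. How the sandbox is selected on that command line is not covered by this report, so that part is left out of the sketch.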
Bug Details
| Field | Value |
| --- | --- |
| Priority | Unprioritized |
| Action | Dev - Open - To fix |
| Disposition | Open issue |
| Module | Machine Learning - NemoClaw |
| Keyword | NemoClaw, NemoClaw_CLI&UX, NEMOCLAW_GH_SYNC_APPROVAL, NemoClaw_Security, NemoClaw-SWQA-RelBlckr-Recommended |
[NVB#6150114]