[Ubuntu 24.04][Security] shields-down timer process death leaves shields permanently DOWN (fail-open) #3112

@zNeill

Description


nemoclaw  shields down --timeout Nm spawns a detached node process
(scripts/lib/shields-timer.js) that is supposed to restore shields UP at the
deadline. If that process is killed before the deadline (host reboot, OOM, manual
SIGKILL), shields stay permanently DOWN: the deadline silently passes, no
shields_auto_restore audit entry is written, file perms remain mutable
(660 sandbox:sandbox), and shields status keeps reporting "DOWN
(temporarily unlocked)" with "Auto-lockdown in: 0m 0s". This violates fail-secure:
shields should snap back UP on any unexpected timer disruption, not stay open.
The stale ~/.nemoclaw/state/shields-timer-.json continues to reference
the dead PID, and shields status does NOT self-heal: the sandbox stays in the
permissive policy until an operator manually runs shields up.
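
One way to tell a live timer from a stale one is to probe the recorded PID. The following is a minimal Node.js sketch, not NemoClaw's actual code: `isProcessAlive` is a hypothetical helper name, and it relies only on `process.kill` with signal 0, which tests for a process without delivering a signal.

```javascript
// Hypothetical helper: returns true if a process with this PID currently exists.
function isProcessAlive(pid) {
  if (!Number.isInteger(pid) || pid <= 0) return false;
  try {
    // Signal 0 only performs the existence/permission check.
    process.kill(pid, 0);
    return true;
  } catch (err) {
    // EPERM: the process exists but is owned by another user.
    // ESRCH (or any other error): treat the process as gone.
    return err.code === "EPERM";
  }
}
```

A live PID alone does not prove the process is still shields-timer.js, because PIDs can be reused after a reboot; the `ps -o pid,cmd` check in step 4 of the reproduction shows the kind of extra verification that would be needed.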

Environment
Device:        x86_64 server (clean Ubuntu host, no k3s/gpu-operator competing)
OS:            Ubuntu 24.04 (Linux 6.17.0-19-generic)
Architecture:  x86_64
Node.js:       v22.22.2
npm:           10.9.7
Docker:        29.1.3 (29.1.3-0ubuntu3~24.04.2)
OpenShell CLI: 0.0.36
NemoClaw:      v0.0.35
OpenClaw:      2026.4.24

Steps to Reproduce
1. nemoclaw  shields up                # start from a known-locked baseline
2. nemoclaw  shields down --timeout 2m --reason "test"
3. cat ~/.nemoclaw/state/shields-timer-.json
   # note the recorded "pid" value, e.g.:
   # {"pid":441175,"sandboxName":"","snapshotPath":"...","restoreAt":"..."}
4. ps -p <pid> -o pid,cmd               # confirm shields-timer.js is the child
5. kill -9 <pid>                        # simulate host reboot / OOM / crash
6. Wait past the 2-minute deadline (e.g. sleep 130)
7. nemoclaw  shields status
8. From inside the sandbox (or via kubectl exec), check actual file perms:
   stat -c "%a %U:%G %n" /sandbox/.openclaw /sandbox/.openclaw/openclaw.json /sandbox/.openclaw/.config-hash
9. tail -1 ~/.nemoclaw/state/shields-audit.jsonl

Expected Result
Shields auto-restore at the deadline regardless of timer-process liveness
(item numbers refer to the reproduction steps):
7. shields status shows "Shields: UP (lockdown active)"
8. /sandbox/.openclaw is 755 root:root; openclaw.json + .config-hash are 444 root:root
9. Audit log contains a new {"action":"shields_auto_restore",...} entry

shields status should self-heal on read: detect the deadline-passed condition
and either trigger restore inline OR report a distinct "expired/recovery pending"
state.
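
The self-heal-on-read behaviour could be expressed as a small decision function. This is a sketch under assumptions: `classifyShieldsDown` and its state names are hypothetical, `restoreAt` is assumed to be an ISO-8601 timestamp (its real value is elided in this report), and `timerAlive` is supplied by the caller from whatever liveness check is used.

```javascript
// Hypothetical classifier for a sandbox whose shields are currently DOWN.
// timerState is the parsed shields-timer state file; nowMs is epoch milliseconds.
function classifyShieldsDown(timerState, nowMs, timerAlive) {
  const restoreAtMs = Date.parse(timerState.restoreAt); // assumes ISO-8601
  if (Number.isNaN(restoreAtMs)) return "DOWN_UNKNOWN_DEADLINE";
  if (nowMs >= restoreAtMs) return "DOWN_EXPIRED"; // deadline passed: restore now
  // Deadline still ahead: a dead timer means nothing will restore shields later.
  return timerAlive ? "DOWN_TIMER_RUNNING" : "DOWN_TIMER_LOST";
}
```

Under a fail-secure policy, every result except `DOWN_TIMER_RUNNING` would either trigger an inline restore or be reported as a distinct state instead of "Auto-lockdown in: 0m 0s".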

Actual Result
Shields stay permanently DOWN — fail-OPEN:
7. shields status reports
     Shields: DOWN (temporarily unlocked)
     Since:   2026-05-06T07:56:48.927Z
     Auto-lockdown in: 0m 0s          ← deadline already passed
     Reason:  test
     Policy:  permissive
8. Filesystem still in mutable-default state:
     2770 sandbox:sandbox /sandbox/.openclaw
     660 sandbox:sandbox  /sandbox/.openclaw/openclaw.json
     660 sandbox:sandbox  /sandbox/.openclaw/.config-hash
9. Audit log: NO shields_auto_restore entry. Last entry is the original
   shields_down event:
     {"action":"shields_down","sandbox":"","timestamp":"...",
      "timeout_seconds":120,"reason":"test","policy_applied":"permissive",...}

The stale ~/.nemoclaw/state/shields-timer-.json still records the dead PID, and
ps -p on that PID returns no process. Subsequent shields status calls do not
self-heal: the sandbox stays in the permissive policy until an operator manually
runs nemoclaw  shields up.
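
The audit-log symptom from item 9 can be checked mechanically. The sketch below is illustrative only: `findUnrestoredShieldsDown` is a hypothetical helper, it uses just the `action`, `timestamp`, and `timeout_seconds` fields shown in the log excerpt, and it assumes `timestamp` is ISO-8601. The only restoring action named in this report is `shields_auto_restore`; any other action names would have to be passed in by the caller.

```javascript
// Scans shields-audit.jsonl content and returns the most recent shields_down
// entry whose timeout has elapsed with no restoring entry after it, else null.
function findUnrestoredShieldsDown(
  jsonlText,
  nowMs,
  restoreActions = new Set(["shields_auto_restore"])
) {
  let pending = null;
  for (const line of jsonlText.split("\n")) {
    if (line.trim() === "") continue;
    let entry;
    try {
      entry = JSON.parse(line);
    } catch {
      continue; // skip corrupt lines rather than abort the scan
    }
    if (entry.action === "shields_down") pending = entry;
    else if (restoreActions.has(entry.action)) pending = null;
  }
  if (pending === null) return null;
  const deadlineMs =
    Date.parse(pending.timestamp) + pending.timeout_seconds * 1000;
  // An unparsable timestamp yields NaN, which compares false and returns null.
  return nowMs > deadlineMs ? pending : null;
}
```

Run against the log from this reproduction, a non-null result after the deadline corresponds to the fail-open state described above.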

Real-world impact: any host reboot, runaway-process OOM kill, supervisor
restart, or accidental SIGKILL on the detached shields-timer.js process leaves
the sandbox unprotected indefinitely. Operators reasonably expect
"shields down --timeout Nm" to fail-secure (snap back UP) on host restart.

Suggested Fixes
1. Make shields status validate the recorded timer PID. If shields-timer-.json
   references a dead PID and the deadline has passed, trigger restore inline AND
   re-report status, OR surface a distinct state (e.g. "DOWN (timer expired,
   recovery pending)") so operators see it.
2. Add a host-side watchdog (systemd user unit / launchd job) that periodically
   scans ~/.nemoclaw/state/shields-*.json for expired deadlines and runs
   shields up.
3. shields up should clear stale shields-timer-.json on completion (and
   verify PID liveness before treating the timer as live).
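
The scan in fix 2 might look roughly like the following. This is a sketch, not a proposed patch: `findExpiredTimers` is a hypothetical name, the `shields-timer-` file prefix and the `restoreAt`, `sandboxName`, and `pid` fields are taken from this report, and `restoreAt` is assumed to be ISO-8601. The restore action itself is left to the caller, since the exact command line is not something this sketch should guess.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Returns one record per timer state file in stateDir whose deadline has passed.
function findExpiredTimers(stateDir, nowMs) {
  const expired = [];
  for (const name of fs.readdirSync(stateDir)) {
    if (!name.startsWith("shields-timer-") || !name.endsWith(".json")) continue;
    const file = path.join(stateDir, name);
    let state;
    try {
      state = JSON.parse(fs.readFileSync(file, "utf8"));
    } catch {
      continue; // unreadable or corrupt file is skipped in this sketch
    }
    const restoreAtMs = Date.parse(state.restoreAt);
    if (!Number.isNaN(restoreAtMs) && nowMs >= restoreAtMs) {
      expired.push({ file, sandboxName: state.sandboxName, pid: state.pid });
    }
  }
  return expired;
}
```

A watchdog would call this periodically with `Date.now()` and run the restore for each returned record. A stricter fail-secure variant could also treat corrupt state files as expired instead of skipping them.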

Bug Details

Field        Value
Priority     Unprioritized
Action       Dev - Open - To fix
Disposition  Open issue
Module       Machine Learning - NemoClaw
Keyword      NemoClaw, NemoClaw_CLI&UX, NEMOCLAW_GH_SYNC_APPROVAL, NemoClaw_Security, NemoClaw-SWQA-RelBlckr-Recommended

[NVB#6150114]

Metadata

Labels

NV QA (Bugs found by the NVIDIA QA Team)
UAT (Issues flagged for User Acceptance Testing)
area: cli (Command line interface, flags, terminal UX, or output)
platform: ubuntu (Affects Ubuntu Linux environments)
security (Potential vulnerability, unsafe behavior, or access risk)
