Skip to content

fix(shields): prevent destroy from leaving stale timer/state artifacts #3219

@jyaunches

Description

@jyaunches

Summary

Follow-up from the review of merged PR #3162 (fix(shields): clean up stale shields state on sandbox destroy). #3162 correctly removes ~/.nemoclaw/state/shields-<sandbox>.json and shields-timer-<sandbox>.json during nemoclaw destroy, fixing the normal stale-state path from #3114.

There are a few remaining cleanup gaps worth addressing so destroy cannot leave or recreate stale shields artifacts.

Problem

1. Destroy can leave a detached auto-restore timer running

If a sandbox is currently in shields down --timeout ... mode when nemoclaw destroy runs, the detached auto-restore timer process is not explicitly killed.

The destroy path now deletes:

  • ~/.nemoclaw/state/shields-<sandbox>.json
  • ~/.nemoclaw/state/shields-timer-<sandbox>.json

…but if the detached timer process is still alive, it may wake later and recreate shields-<sandbox>.json via the timer restore path. If the user re-onboards the same sandbox name inside that window, deriveShieldsMode() can again observe stale state and report the fresh sandbox as locked/unlocked incorrectly.

This is narrow, but it is the same class of stale-state bug #3114 was intended to eliminate.

Related code:

2. Shields state directory path is duplicated

path.join(process.env.HOME ?? "/tmp", ".nemoclaw", "state") exists in multiple shields-related places. #3162 added another copy in the destroy module.

This should become a shared exported helper/constant so future state-dir changes cannot drift across:

  • src/lib/shields/index.ts
  • src/lib/shields/timer.ts
  • src/lib/actions/sandbox/destroy.ts

3. Policy snapshots accumulate

shields down writes timestamped files like policy-snapshot-<ts>.yaml under the state directory. These do not currently affect deriveShieldsMode() because they are not per-sandbox state files, but they can accumulate indefinitely.

Destroy could either remove snapshots that are known to belong to the sandbox, or we should add enough metadata/naming to make safe cleanup possible later.

4. Cleanup is silent in audit trail

Destroy-side shields cleanup currently removes files without writing to shields-audit.jsonl. Adding an audit entry such as shields_state_cleared_on_destroy would make the cleanup observable for troubleshooting and forensics.

5. Error branch test coverage

#3162 added good behavioral tests for:

  • removing both state files
  • no-op when files are absent
  • preserving other sandboxes' files
  • rejecting path traversal

The warning path for real removal failures (for example EPERM) is not currently covered. A small mocked fs.rmSync test should assert that the warning is emitted and the function continues.

Acceptance Criteria

  • nemoclaw destroy <sandbox> kills or otherwise neutralizes any live shields auto-restore timer for that sandbox before deleting the timer marker.
  • A detached timer cannot recreate shields-<sandbox>.json after the sandbox has been destroyed.
  • Shields state directory resolution is centralized in one shared helper/constant used by destroy, shields commands, and the timer.
  • Decide and document policy-snapshot cleanup behavior; implement if safe in this issue, otherwise file/link a follow-up.
  • Destroy-side shields cleanup writes an audit entry when it removes/attempts to remove shields state.
  • Tests cover the active-timer destroy case and the warning/error branch for failed file removal.

Suggested implementation notes

  • Prefer reusing/exporting the existing killTimer() behavior from src/lib/shields/index.ts rather than duplicating marker parsing in destroy.
  • Coordinate with fix(shields): self-heal expired auto-restore timers #3204 because it also touches timer lifecycle/recovery semantics.
  • Keep cleanup best-effort for file-removal failures, but avoid silently allowing stale state to be recreated after destroy.

Metadata

Metadata

Assignees

No one assigned

    Labels

    area: e2eEnd-to-end tests, nightly failures, or validation infrastructure
    No fields configured for Enhancement.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions