feat(plugin-sdk): add persistent keyed store helper#70626
feat(plugin-sdk): add persistent keyed store helper#70626
Conversation
a2ea03c to
ce9d5c3
Compare
|
Migration note after checking the prior issues/PRs: I dug into Slack first. This PR looks like the first generic Slack is the cleanest first dogfood candidate:
That shape maps well onto this helper: namespace like Other good migration candidates:
Suggested order after #70626 lands: migrate Slack thread participation first, then Discord component registry, then MS Teams sent-message cache / Matrix approval reaction targets, then decide whether |
d10e286 to
6b62abf
Compare
5a120a3 to
ebe44e6
Compare
Three concrete fixes for clawsweeper bot's review on PR openclaw#33845: 1. **Stop leaking test-only helpers through the public API barrel.** extensions/slack/api.ts previously did 'export * from ./src/sent-thread-cache.js', which re-exported the underscored test-only helpers (_flushPersist, _resetForTests) as part of the slack extension's public API. Same-process consumers could call _resetForTests('/some/path') to redirect the persist path and trigger arbitrary writes/renames/unlinks. Replace the wildcard with an explicit list of public exports (recordSlackThreadParticipation, hasSlackThreadParticipation, clearSlackThreadParticipationCache). 2. **Reinsert on update so Map insertion order reflects recency.** recordSlackThreadParticipation called Map.set on existing keys, which updates the value but keeps the key in its original insertion-order position. The serialise-time MAX_ENTRIES cap sorts by timestamp so this didn't affect on-disk output, but any future iteration-order-based eviction would silently use the wrong recency. Switch to delete-then-set so insertion order matches actual recency. 3. **Cap in-memory map size on insert, not just on serialise.** The pre-existing cap was applied only at serialisation time so busy workspaces accumulating thousands of unique threads between debounced persists could grow the in-memory Map unboundedly. Add insertion-time eviction: once persistState.entries.size exceeds MAX_ENTRIES, evict the oldest entry by Map insertion order (which now correctly reflects recency thanks to fix #2). New tests: - 'reinserts on update so Map insertion order reflects recency': records A, then B, then re-records A. Verifies on-disk insertion order is [B, A] (re-recorded A moves to most-recent position). - 'caps in-memory map size on insert (not just on serialise)': records 5005 unique threads, verifies the persisted file has exactly 5000 entries with the oldest 5 evicted. Tests: 17/17 pass (was 15; +2 new). Not addressed in this commit (acknowledged in the PR-comment follow-up): clawsweeper also recommended migrating from the hand-rolled persistence to src/plugin-sdk/persistent-dedupe.ts (or a future persistent-keyed-store from PR openclaw#70626) to inherit file locking, atomic writes, bounded/quarantined parsing, and the shared storage contract. That refactor is non-trivial because the SDK helper is async while the current public API is sync; maintainer guidance is needed on whether to make the API async or wrap with sync semantics. Tracked as a follow-up.
opus-4-7 independent review found three issues missed by earlier review rounds: P1 — _persistPathOverride was still module-local Despite the Codex thread reply claiming the override moved into the shared persistState global, it was still a module-local 'let'. Tests passed by accident: in-memory dedupe shared via persistState.entries, but disk side effects from re-imported module copies leaked into STATE_DIR instead of the per-test temp dir. Moved into PersistState.persistPathOverride on the global singleton. P1 — CWE-59 symlink-follow on tmpPath (Aisle medium #2, unaddressed) tmpPath was ${filePath}.${process.pid}.tmp + default 'w' flag. A predictable name combined with a truncating O_CREAT would let an attacker who can write to the state dir pre-create a symlink at the tmp path pointing to e.g. /etc/passwd, and our write would follow it. Now uses randomUUID() for the name AND flag 'wx' (O_CREAT | O_EXCL) so any existing entry — file or symlink — fails the write safely. P2 — hydration ignored MAX_ENTRIES (Aisle medium #1) The 4 MB file-size cap permits ~250K entries through, blowing past the 5K in-memory MAX_ENTRIES on first load. Now collects valid candidates first, sorts DESC by ts, and slices to MAX_ENTRIES before inserting into the dedupe cache. Bounds memory regardless of file size up to MAX_PERSIST_FILE_BYTES. Tests: 26/26 pass (3 new tests covering each fix). Architectural note (per opus review): Land standalone. Don't migrate to plugin-sdk/persistent-dedupe — its hasRecent reads JSON on every miss, adding ~2-10ms to every Slack message in a thread the bot hasn't replied to (the dominant path for hasSlackThreadParticipation on the message-dispatch hot path). Warm-cache + debounced-write is correct here. Don't block on openclaw#70626.
gpt-5.5 independent review (running in parallel, no shared context with
opus) confirmed both opus P1s and opus P2-A. It also surfaced three
additional findings worth landing in this PR:
P2-B — Duplicated persist logic between schedulePersist callback and
_flushPersist. Maintenance trap: any future hardening had to be
mirrored in two places. Extracted into a shared persistToDisk()
function called from both. Single source of truth for prune/cap/
serialize/atomic-write.
P2-C / P3-C — Future timestamps from a crafted persist file (or a
backwards system-clock jump) survive forever. The previous guard
was 'now - ts < TTL_MS' which trivially holds for ts > now.
Added 'ts > 0 && ts <= now' to the hydration filter so:
* future-near (now + 60s) -> rejected
* future-far (MAX_SAFE_INTEGER) -> rejected
* zero / negative -> rejected
* monotonic-clock-skew tolerance -> small backwards jumps still
reject the affected entries on next hydrate; corrected on next
write cycle. Trade-off vs. monotonic-clock complexity is
intentional.
P2-D — mkdirSync called with no mode flag -> directory inherits umask
(often 0o755 — group/other readable). Now passes mode 0o700 to
match the SDK helper and complement the 0o600 file mode. Multi-tenant
hosts can no longer enumerate or symlink-squat the persist directory.
P3-B — Iteration-order inversion after over-capacity slice. When >
MAX_ENTRIES candidates were sorted DESC + sliced, the resulting
Map insertion order was ts-DESC, but the eviction loop in
recordSlackThreadParticipation assumes Map insertion order = ts-ASC
(oldest first). Without this fix, the first post-hydration eviction
drops the NEWEST entry instead of the oldest. Added an ASC re-sort
on both hydration and serialize paths so insertion order always
matches recency.
Tests: 29/29 pass (4 new tests covering each finding).
P3-A (TOCTOU stat->read race) deferred to follow-up: low-impact given
the 4 MB file cap already bounds the read, and a fix requires
fs.openSync + fstatSync + readSync changes that don't align with the
current stat-then-read structure. Worth its own PR.
Per gpt-5.5 architectural call: keep this PR standalone; converge
with plugin-sdk/persistent-keyed-store (openclaw#70626) as a follow-up after
this lands. The sync->async migration is straightforward (3 call sites,
all in async contexts) but should be coordinated with maintainers.
ebe44e6 to
36281ff
Compare
|
Codex review: keeping this open for maintainer follow-up; there is still a little grit to resolve. Keep this PR open. Current main does not yet expose Best possible solution: Keep this PR open for maintainer review. If accepted, land the narrow generic SDK subpath with package export, entrypoint metadata, API baseline, docs/changelog, focused storage/locking tests, and a storage-security review; then migrate the linked channel-specific registries in separate owner-scoped PRs. What I checked:
Likely related people:
Remaining risk / open question:
Codex review notes: model gpt-5.5, reasoning high; reviewed against 7ddd815e469e. |
Summary
openclaw/plugin-sdk/persistent-keyed-storewithregister/lookup/consume/entries/clear, TTL pruning,maxEntrieseviction, deterministic enumeration, corrupt/higher-version quarantine, and same-process + cross-process serialization.Change Type (select all)
Scope (select all touched areas)
User-visible / Behavior Changes
openclaw/plugin-sdk/persistent-keyed-store.Diagram (if applicable)
Security Impact (required)
Yes, explain risk + mitigation:Repro + Verification
Environment
Human Verification (required)
pnpm test src/plugin-sdk/persistent-keyed-store.test.tspnpm tsgopnpm lint:corepnpm plugin-sdk:check-exportspnpm plugin-sdk:api:checkpnpm buildpnpm check:architecturemaxEntriespruningReview Conversations
Compatibility / Migration
Risks and Mitigations