fix(slack): persist thread participation cache to survive gateway restarts [claude, human developer oversight]#33845
Conversation
Greptile SummaryThis PR persists the Slack thread participation cache to Confidence Score: 5/5This PR is safe to merge — the persistence path is non-critical (fall-through to in-memory behavior on any failure) and the security hardening is comprehensive. No P0 or P1 findings. The logic is carefully guarded at every layer: bounded file size before parsing, schema validation, future-timestamp rejection, atomic write semantics (CWE-59 mitigation), and correct insertion-order maintenance for the eviction loop. All previous review concerns have been addressed and regression-tested. Formatting changes in other files are purely cosmetic. No files require special attention. Reviews (6): Last reviewed commit: "Merge remote-tracking branch 'origin/mai..." | Re-trigger Greptile |
Additional Comments (1)
The existing test suite's If someone runs these tests on a machine that has live Slack thread data in its state directory, the first test picks up that data, making the suite environment-dependent. The persistence suite correctly uses Prompt To Fix With AIThis is a comment left during a code review.
Path: src/slack/sent-thread-cache.test.ts
Line: 14-17
Comment:
**`afterEach` leaves `loaded` as `true`, risking real state-dir reads in tests**
The existing test suite's `afterEach` calls `clearSlackThreadParticipationCache()`, which clears the in-memory map but does **not** reset the `loaded` flag. As a result, the very first test to run in this suite will invoke `loadFromDisk()` and attempt to read from the real `$STATE_DIR/slack-thread-participation.json` (the default path, because `persistPathOverride` is `undefined`). Subsequent tests in the suite are unaffected because `loaded` stays `true`.
If someone runs these tests on a machine that has live Slack thread data in its state directory, the first test picks up that data, making the suite environment-dependent. The persistence suite correctly uses `_resetForTests(undefined)` which resets `loaded` and `persistPathOverride`. Using the same helper here would keep things consistent and safe:
```suggestion
afterEach(() => {
_resetForTests(undefined);
vi.restoreAllMocks();
});
```
How can I resolve this? If you propose a fix, please make it concise. |
Addresses greptile review on PR openclaw#33845.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 0a210d0aa0
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
- clearSlackThreadParticipationCache now always persists empty state, even before the cache has been loaded (wipes stale persist files) - Added 2 tests: clear-then-restart survival, clear-before-load wipes Addresses greptile + codex-connector review on PR openclaw#33845.
|
Confidence Score: 5/5 ✅ The only gap from the previous review has been fully addressed. You've added two excellent tests for the
The test suite is now comprehensive (6 persistence tests total), the implementation is solid, and there's complete regression protection for the clear functionality. No changes needed — this is ready to merge. |
@greptile-apps: if you could update your main score comment with this change, it would be somewhat easier for reviewers. Thank you in advance! |
Additional Comments (1)
On a developer's machine running The other tests in this block are safe — The straightforward fix is to add path isolation to this describe block's describe("slack sent-thread-cache", () => {
let tempDir: string;
beforeEach(() => {
tempDir = fs.mkdtempSync(path.join(os.tmpdir(), "slack-thread-cache-test-"));
_resetForTests(path.join(tempDir, "slack-thread-participation.json"));
});
afterEach(() => {
_resetForTests(undefined);
vi.restoreAllMocks();
try { fs.rmSync(tempDir, { recursive: true, force: true }); } catch { /* ignore */ }
});
...Alternatively, if keeping the two suites cleanly separated is preferred, the "clears all entries" test could be moved into the persistence suite (where |
|
Just a note to reviewers that greptile added it's 5/5 score as a secondary comment in this PR. |
|
CI/secrets failing due to pnpm-audit-prod on transitive deps (@buape/carbon -> @hono/node-server -> hono); PR does not modify package.json/lockfile. Happy to send a separate dependency bump/override PR if that’s the preferred fix |
6084bfb to
78bcd3e
Compare
Addresses greptile review on PR openclaw#33845.
- clearSlackThreadParticipationCache now always persists empty state, even before the cache has been loaded (wipes stale persist files) - Added 2 tests: clear-then-restart survival, clear-before-load wipes Addresses greptile + codex-connector review on PR openclaw#33845.
78bcd3e to
6619dc6
Compare
|
@greptile-apps: I'd apprecate you renewing your greptile summary as the 4/5 score there seems to be superceded by your 5/5 score in comments. |
|
@joshavant — small channel/gateway fix, would appreciate your eyes on when you have a moment! thanks in advance! |
6619dc6 to
f313a70
Compare
CI for the previous commit (ecf48d4) flagged two real issues: 1. oxfmt: extensions/slack/src/sent-thread-cache.test.ts had a trailing-comma / line-break inconsistency. Fixed by running `pnpm format`. 2. oxlint: 7 `if-without-braces` errors introduced by my defense-in-depth commit. Six in the test file (env-var save/restore blocks for the test-helper-guard tests), one in the source (insertion-cap eviction loop). Wrapped each single-line `if` body in braces. Note on remaining 7 lint errors visible in CI: those are PRE-EXISTING on origin/main (qr-runtime missing module, provider-catalog/tools type narrowing, and OutputRuntimeEnv 'error' type). Unrelated to this PR — confirmed by stashing my changes and running the same command on a clean main checkout, where the same 7 errors appear. Tests: 23/23 still pass.
opus-4-7 independent review found three issues missed by earlier review rounds: P1 — _persistPathOverride was still module-local Despite the Codex thread reply claiming the override moved into the shared persistState global, it was still a module-local 'let'. Tests passed by accident: in-memory dedupe shared via persistState.entries, but disk side effects from re-imported module copies leaked into STATE_DIR instead of the per-test temp dir. Moved into PersistState.persistPathOverride on the global singleton. P1 — CWE-59 symlink-follow on tmpPath (Aisle medium #2, unaddressed) tmpPath was ${filePath}.${process.pid}.tmp + default 'w' flag. A predictable name combined with a truncating O_CREAT would let an attacker who can write to the state dir pre-create a symlink at the tmp path pointing to e.g. /etc/passwd, and our write would follow it. Now uses randomUUID() for the name AND flag 'wx' (O_CREAT | O_EXCL) so any existing entry — file or symlink — fails the write safely. P2 — hydration ignored MAX_ENTRIES (Aisle medium #1) The 4 MB file-size cap permits ~250K entries through, blowing past the 5K in-memory MAX_ENTRIES on first load. Now collects valid candidates first, sorts DESC by ts, and slices to MAX_ENTRIES before inserting into the dedupe cache. Bounds memory regardless of file size up to MAX_PERSIST_FILE_BYTES. Tests: 26/26 pass (3 new tests covering each fix). Architectural note (per opus review): Land standalone. Don't migrate to plugin-sdk/persistent-dedupe — its hasRecent reads JSON on every miss, adding ~2-10ms to every Slack message in a thread the bot hasn't replied to (the dominant path for hasSlackThreadParticipation on the message-dispatch hot path). Warm-cache + debounced-write is correct here. Don't block on openclaw#70626.
gpt-5.5 independent review (running in parallel, no shared context with
opus) confirmed both opus P1s and opus P2-A. It also surfaced three
additional findings worth landing in this PR:
P2-B — Duplicated persist logic between schedulePersist callback and
_flushPersist. Maintenance trap: any future hardening had to be
mirrored in two places. Extracted into a shared persistToDisk()
function called from both. Single source of truth for prune/cap/
serialize/atomic-write.
P2-C / P3-C — Future timestamps from a crafted persist file (or a
backwards system-clock jump) survive forever. The previous guard
was 'now - ts < TTL_MS' which trivially holds for ts > now.
Added 'ts > 0 && ts <= now' to the hydration filter so:
* future-near (now + 60s) -> rejected
* future-far (MAX_SAFE_INTEGER) -> rejected
* zero / negative -> rejected
* monotonic-clock-skew tolerance -> small backwards jumps still
reject the affected entries on next hydrate; corrected on next
write cycle. Trade-off vs. monotonic-clock complexity is
intentional.
P2-D — mkdirSync called with no mode flag -> directory inherits umask
(often 0o755 — group/other readable). Now passes mode 0o700 to
match the SDK helper and complement the 0o600 file mode. Multi-tenant
hosts can no longer enumerate or symlink-squat the persist directory.
P3-B — Iteration-order inversion after over-capacity slice. When >
MAX_ENTRIES candidates were sorted DESC + sliced, the resulting
Map insertion order was ts-DESC, but the eviction loop in
recordSlackThreadParticipation assumes Map insertion order = ts-ASC
(oldest first). Without this fix, the first post-hydration eviction
drops the NEWEST entry instead of the oldest. Added an ASC re-sort
on both hydration and serialize paths so insertion order always
matches recency.
Tests: 29/29 pass (4 new tests covering each finding).
P3-A (TOCTOU stat->read race) deferred to follow-up: low-impact given
the 4 MB file cap already bounds the read, and a fix requires
fs.openSync + fstatSync + readSync changes that don't align with the
current stat-then-read structure. Worth its own PR.
Per gpt-5.5 architectural call: keep this PR standalone; converge
with plugin-sdk/persistent-keyed-store (openclaw#70626) as a follow-up after
this lands. The sync->async migration is straightforward (3 call sites,
all in async contexts) but should be coordinated with maintainers.
…icipation-persist # Conflicts: # extensions/slack/api.ts # extensions/slack/src/sent-thread-cache.ts
|
@greptile-apps: can I get a review? |
…icipation-persist # Conflicts: # extensions/slack/src/sent-thread-cache.test.ts
c322244 to
1c99594
Compare
Revert files unrelated to Slack thread-participation persistence back to
origin/main:
- extensions/feishu/src/outbound.{ts,test.ts}: prettier quote/wrap noise
- extensions/slack/src/monitor/context.ts: import order only
- src/commands/agent/session.test.ts: prettier line wrap
- src/gateway/server-chat.stream-text-merge.test.ts: prettier line wrap
- src/gateway/server/health-state.test.ts: prettier line wrap
- ui/src/ui/app-channels.test.ts: prettier line wrap
In-scope diff vs origin/main is now:
- extensions/slack/src/sent-thread-cache.{ts,test.ts}: the persistence fix
- extensions/slack/api.ts: 4-line security comment about explicit named
exports keeping _flushPersist/_resetForTests test-only
1c99594 to
b76b013
Compare
…icipation-persist # Conflicts: # extensions/slack/src/sent-thread-cache.test.ts # extensions/slack/src/sent-thread-cache.ts
Summary
Slack thread participation is tracked in-memory so the bot can keep replying in a thread after its first response (without requiring repeated
@mentions). That cache was lost on every gateway restart, so the bot would silently stop responding to non-mention thread messages until someone re-@mentionedit. This PR persists the Slack thread participation cache to disk so participation survives restarts.Review focus:
src/slack/sent-thread-cache.tsonly; file format is JSON{key→timestamp}.Before / After
Before: restart gateway → thread cache cleared → bot ignores non-mention messages in previously-participated threads
After: restart gateway → cache restored → bot continues participating normally (within TTL)
What changed
src/slack/sent-thread-cache.ts$STATE_DIR/slack-thread-participation.jsonunref()so it won't hold the process open)clearSlackThreadParticipationCache()now persists empty state so "clear" survives restarts (also wipes stale persist files)src/slack/sent-thread-cache.test.ts$STATE_DIRRisk
Low. This cache is non-critical: worst case is the current behavior (one extra
@mentionto re-establish participation). Persistence is best-effort; failures are swallowed and the bot falls back to in-memory behavior.Testing
vitestunit coverage for persistence + clear semantics, plus manual restart verification.AI-Assisted Development 🤖
This PR was developed with AI assistance (Claude/OpenClaw agent "Tank").
pnpm checkclean. Deployed locally and in daily use for ~4 weeks.