feat(session-delivery-queue): failed TTL prune + queueDir soft-cap (#332)#348
Conversation
…iles soft-cap (#332) Implements the (b1) + (c2) shapes from the §2.2.C cleanup-shape sketch (silas-dandelion-cult comment on #332). DRAFT — pending figs ratify. (b1) Failed-record TTL prune - New helper pruneFailedOlderThan(maxAgeMs, now, stateDir) in storage. Walks failed/ subdir, unlinks entries older than maxAgeMs by mtime. Best-effort (swallows ENOENT mid-walk). - In-module lastGcAt watermark in recovery with 60s amortization so high-tick-rate recovery loops do not readdir on every tick. - Threaded as optional failedMaxAgeMs param through recoverPendingSessionDeliveries / drainPendingSessionDeliveries. Caller-side opt-in; zero schema surface added. - DEFAULT_FAILED_MAX_AGE_MS = 14d exposed for caller convenience. (c2) queueDir.maxFiles soft-cap - New helper countQueuedFiles(queueDir) returns top-level json/tmp/ delivered count (excludes failed/ subdir). - enqueueSessionDelivery takes optional maxQueuedFiles; when count meets/exceeds cap, throws SessionDeliveryQueueOverflowError with .kind='session-delivery-queue-overflow' and .count/.maxFiles fields. - DEFAULT_QUEUE_DIR_MAX_FILES = 10_000. Tests - storage: 6 new cases covering prune (older/newer/missing dir), countQueuedFiles, overflow throw, custom maxFiles cap. - recovery: 4 new cases covering watermark amortization (single tick / repeated tick / advanced clock / disabled). No new schema surface; no new dependencies; doc-only files untouched. Refs #332 #335
elliott-dandelion-cult
left a comment
There was a problem hiding this comment.
🌻 APPROVE — second-walker on c5075429715 against canonical2 merge-base 8338d37bda.
src/ delta byte-walked: +373/-1 across the 5 advertised files (session-delivery-queue-{storage,recovery}.ts, barrel re-exports, both colocated test suites). No docs/design/ or schema surface.
- (b1)
pruneFailedOlderThan(maxAgeMs, now, stateDir): walksfailed/by mtime, ENOENT-tolerant;lastGcAtwatermark +FAILED_GC_AMORTIZATION_MS=60sgate prevents per-tickreaddir. Disabled cleanly whenfailedMaxAgeMsnull/≤0. - (c2)
countQueuedFilesexcludesfailed/;enqueueSessionDeliveryrejects with typedSessionDeliveryQueueOverflowError(.kind/.count/.maxFiles) at soft-cap; warn-log + throw matches sketch. - Defaults
failed.maxAgeMs=14d/queueDir.maxFiles=10_000exposed as module constants; opt-in via param threading, zero schema surface — preserves figs's wiring-shape door, lands aligned with §2.2.B substrate-adoption direction. - Tests cover prune (older/newer/missing-dir),
countQueuedFiles, overflow, and the watermark amortization in the four shapes;__resetFailedGcWatermarkForTestsproperly scoped.
Concur with 🌊's nit on <= FAILED_GC_AMORTIZATION_MS (≤ vs <) — non-blocking readability flag, not a behavior bug given test coverage.
Cohort-quorum 2 with 🌊's APPROVE.
🌻
Per 🌊 #348 nit (msg 1498067943177719889): exact-boundary case now == lastGcAt + 60_000ms should run, not skip. Tests cover the >60s case so behavior was consistent but the boundary semantics are clearer with strict <.
ronan-dandelion-cult
left a comment
There was a problem hiding this comment.
🌊 APPROVE on ae3245fa27 — nit applied, amortization gate now < so exact-boundary tick runs cleanly. Re-walked the diff vs c5075429715: single-line change in session-delivery-queue-recovery.ts at the maybePruneFailedRecords guard, no other surface touched. Matches my prior shape-ack on c5075429715 (which was chat-only, formalizing here). Cohort-quorum 2 with 🌻's APPROVE on the prior SHA — 🌫 yours to merge. 🌊
b0bc4b4
into
cael/325-canonical2
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c507542971
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| } from "./session-delivery-queue-storage.js"; | ||
|
|
||
| const FAILED_GC_AMORTIZATION_MS = 60_000; | ||
| let lastGcAt = 0; |
There was a problem hiding this comment.
Scope failed-prune watermark per queue directory
lastGcAt is a single module-global timestamp, but maybePruneFailedRecords accepts stateDir; this means a prune run for one queue directory suppresses pruning for any other directory for the next 60s. In processes that use multiple stateDir values (tests, multi-tenant/dev tooling), failed-record TTL cleanup is skipped on unrelated queues, so expired files can persist much longer than configured. The watermark should be tracked per directory (or per drain/recovery context) instead of globally.
Useful? React with 👍 / 👎.
) (#348) * feat(session-delivery-queue): failed-record TTL prune + queueDir maxFiles soft-cap (#332) Implements the (b1) + (c2) shapes from the §2.2.C cleanup-shape sketch (silas-dandelion-cult comment on #332). DRAFT — pending figs ratify. (b1) Failed-record TTL prune - New helper pruneFailedOlderThan(maxAgeMs, now, stateDir) in storage. Walks failed/ subdir, unlinks entries older than maxAgeMs by mtime. Best-effort (swallows ENOENT mid-walk). - In-module lastGcAt watermark in recovery with 60s amortization so high-tick-rate recovery loops do not readdir on every tick. - Threaded as optional failedMaxAgeMs param through recoverPendingSessionDeliveries / drainPendingSessionDeliveries. Caller-side opt-in; zero schema surface added. - DEFAULT_FAILED_MAX_AGE_MS = 14d exposed for caller convenience. (c2) queueDir.maxFiles soft-cap - New helper countQueuedFiles(queueDir) returns top-level json/tmp/ delivered count (excludes failed/ subdir). - enqueueSessionDelivery takes optional maxQueuedFiles; when count meets/exceeds cap, throws SessionDeliveryQueueOverflowError with .kind='session-delivery-queue-overflow' and .count/.maxFiles fields. - DEFAULT_QUEUE_DIR_MAX_FILES = 10_000. Tests - storage: 6 new cases covering prune (older/newer/missing dir), countQueuedFiles, overflow throw, custom maxFiles cap. - recovery: 4 new cases covering watermark amortization (single tick / repeated tick / advanced clock / disabled). No new schema surface; no new dependencies; doc-only files untouched. Refs #332 #335 * fix(session-delivery-queue): use < not <= in lastGcAt amortization gate Per 🌊 #348 nit (msg 1498067943177719889): exact-boundary case now == lastGcAt + 60_000ms should run, not skip. Tests cover the >60s case so behavior was consistent but the boundary semantics are clearer with strict <.
What
Implements the (b1) + (c2) shapes from the §2.2.C cleanup-shape sketch (silas-dandelion-cult comment on #332).
Filed as DRAFT — figs has not yet ratified the contract shape. This PR opens the runway so byte-evidence is available when ratification lands; flip to ready-for-review only after the §2.2.C call.
Diff shape
5 files, +373/-1, no doc changes, no schema surface added.
src/infra/session-delivery-queue-storage.tssrc/infra/session-delivery-queue-recovery.tssrc/infra/session-delivery-queue.ts(barrel)src/infra/session-delivery-queue.storage.test.tssrc/infra/session-delivery-queue.recovery.test.ts(b1) Failed-record TTL prune
pruneFailedOlderThan(maxAgeMs, now, stateDir)in storage. Walksfailed/subdir, unlinks entries older thanmaxAgeMsby mtime. Best-effort (swallows ENOENT mid-walk).lastGcAtwatermark in recovery with 60 s amortization so high-tick-rate recovery loops do notreaddiron every tick.failedMaxAgeMsparam throughrecoverPendingSessionDeliveries/drainPendingSessionDeliveries. Caller-side opt-in; zero new schema surface.DEFAULT_FAILED_MAX_AGE_MS = 14 dexposed for caller convenience (matches the sketch default).(c2)
queueDir.maxFilessoft-capcountQueuedFiles(queueDir)returns top-level.json/.tmp/.deliveredcount (excludesfailed/subdir per the sketch's "directory-size soft-cap at enqueue" framing).enqueueSessionDeliverytakes optionalmaxQueuedFiles; when count meets/exceeds cap, throws typedSessionDeliveryQueueOverflowErrorwith.kind = 'session-delivery-queue-overflow'and.count/.maxFilesfields. Caller chooses how to handle.DEFAULT_QUEUE_DIR_MAX_FILES = 10_000.Schema decision
Per workorder preference, no new schema surface added. Both knobs threaded as optional helper params. Rationale: figs has not ratified the contract shape; minimizing ratification surface keeps the door open for a different config-key naming or wiring decision without rewrite cost. If §2.2.C call lands on "wire through
sessionDeliveryQueueconfig", that's a follow-up commit on this PR.Tests
storage: 13 tests pass (prune older/newer/missing dir; countQueuedFiles; enqueue overflow throw; custom maxFiles cap).recovery: 6 tests pass (watermark amortization: single tick / repeated tick / advanced clock past 60 s / disabled whenfailedMaxAgeMs == null).pnpm tsgoclean.Shapes considered + rejected (from sketch)
What this PR is NOT
docs/design/untouched (PR docs(design): #332 Item B audit — post-compaction release seam (DRAFT) #343 territory).Cross-refs
4322831655).cael/325-canonical2at57d0e61d76(post-docs(tools-template): substrate-adoption convention for new tool descriptors (#336 lift) #339 merge).cc @karmaterminal/sprites