test(worker): extract signal dedupe to a testable module + cover eviction#93
Conversation
…tion Closes the worker test gap from the session-start audit. The worker held its dedupe state in module-level Maps with no exported entry points, so TTL expiry and capacity eviction had zero coverage. Both behaviors matter for production memory bounds. Refactor: - Extract SignalDeduper class to apps/worker/src/signal-dedupe.ts with constructor-injected TTL, capacity, and clock. Pure module — no globals, no side effects on import. - Replace module-level seenSignalIdsByUser map + getSignalDedupeMap / hasForwardedSignal / markSignalForwarded helpers in index.ts with delegations to a single SignalDeduper instance. - Add SignalDeduper.pruneUsers(activeUserIds) so the worker's user-discovery loop can release dedupe memory for users no longer tracked. Replaces the two-line ad-hoc loop in index.ts. Behavior tightening: - Eviction now triggers on size >= maxPerUser (was strict >). The cap is now a hard ceiling — callers can't push the size above maxPerUser even by one. Previous logic allowed +1 overshoot. evict() drops expired entries first, then falls back to oldest-first removal until size < maxPerUser. Tests: +11 (apps/worker had zero tests). - Per-user isolation (state, source-namespacing) - TTL expiry boundary - mark() idempotency - reset() per-user clear - Eviction: expired-first sweep, then oldest-insertion-order - Eviction does not run while size is at or below the cap with no new insert (read-path TTL still respected) - Default constants exposed Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Refactors the worker’s in-memory signal deduplication into a standalone, testable SignalDeduper module with configurable TTL/capacity/clock, and adds the worker’s first unit tests to cover TTL and eviction behavior.
Changes:
- Added
SignalDeduper(plus exported defaults) in a newapps/worker/src/signal-dedupe.tsmodule. - Updated the worker loop to delegate dedupe checks/marks and user pruning to a single
SignalDeduperinstance. - Added
vitestunit tests covering TTL boundaries, per-user/source isolation, reset, and eviction ordering.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| apps/worker/src/signal-dedupe.ts | Introduces the new SignalDeduper class implementing TTL + bounded per-user capacity with eviction. |
| apps/worker/src/index.ts | Replaces module-level dedupe Maps/helpers with a single SignalDeduper instance and uses pruneUsers(). |
| apps/worker/src/tests/signal-dedupe.test.ts | Adds unit tests for dedupe behavior (TTL, isolation, reset, and eviction). |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| const map = this.seenByUser.get(userId); | ||
| if (!map) return false; | ||
| const seenAt = map.get(this.key(signal)); | ||
| return Boolean(seenAt && this.now() - seenAt < this.ttlMs); |
There was a problem hiding this comment.
has() treats a stored timestamp of 0 as “not present” because it uses Boolean(seenAt && …). When now() is injected for tests or alternative clocks that can return 0, a freshly-marked signal may incorrectly return false. Compare against undefined instead (e.g., early-return if seenAt === undefined, otherwise compute TTL).
| return Boolean(seenAt && this.now() - seenAt < this.ttlMs); | |
| if (seenAt === undefined) return false; | |
| return this.now() - seenAt < this.ttlMs; |
| private getOrCreate(userId: string): Map<string, number> { | ||
| let map = this.seenByUser.get(userId); | ||
| if (!map) { | ||
| map = new Map(); | ||
| this.seenByUser.set(userId, map); | ||
| } | ||
|
|
||
| // Pre-emptively evict when about to exceed the cap. Using >= here means | ||
| // the cap is a hard ceiling: callers can't push the size above maxPerUser | ||
| // even by one. Walking the map on every set would be O(n) per signal — | ||
| // too costly under load — so the sweep only runs when the size is at the | ||
| // cap and a new entry is about to land. | ||
| if (map.size >= this.maxPerUser) { | ||
| this.evict(map); | ||
| } |
There was a problem hiding this comment.
Eviction currently runs whenever map.size >= maxPerUser, even if mark() is updating an existing key (which doesn’t increase size). At the cap, refreshing an existing signal can unnecessarily evict unrelated entries and reduce dedupe coverage. Consider only evicting when inserting a new key would exceed the cap (e.g., check map.has(key) in mark() before calling evict(), or pass the computed key into getOrCreate).
| } | ||
| now = 1000; // far past TTL | ||
|
|
||
| // Insert one more — pushes size to 6, triggers eviction |
There was a problem hiding this comment.
This comment says the insert “pushes size to 6”, but SignalDeduper evicts before inserting when the map is already at capacity (size >= maxPerUser). Consider rewording to avoid implying the map ever reaches 6 entries.
| // Insert one more — pushes size to 6, triggers eviction | |
| // Insert one more while already at capacity; expired entries are evicted before insert |
| // Insert one more — exceeds cap, no expirations to sweep, oldest is dropped | ||
| dedup.mark(makeSignal('newest'), 'u'); | ||
|
|
||
| // We had s0, s1, s2; insertion of newest pushes size to 4, evict drops s0 |
There was a problem hiding this comment.
This comment assumes insertion temporarily “pushes size to 4”, but eviction happens before the insert when the map is at capacity. Consider rewording to describe eviction triggering on the next mark() while already at the cap.
| // We had s0, s1, s2; insertion of newest pushes size to 4, evict drops s0 | |
| // We had s0, s1, s2; the next mark() arrives while already at the cap, | |
| // so eviction drops s0 and newest is retained, keeping size at 3. |
| // Size is at the cap (5), so the eviction sweep does NOT run on insert. | ||
| // Test the read path: has() still respects the TTL. | ||
| expect(dedup.has(makeSignal('s0'), 'u')).toBe(false); | ||
| // But the entries are still in the map until eviction triggers. |
There was a problem hiding this comment.
The comment mentions eviction “does NOT run on insert”, but this test doesn’t perform an insert after advancing time. Consider adjusting the wording to clarify that expired entries remain in memory until a future mark() triggers eviction.
| // Size is at the cap (5), so the eviction sweep does NOT run on insert. | |
| // Test the read path: has() still respects the TTL. | |
| expect(dedup.has(makeSignal('s0'), 'u')).toBe(false); | |
| // But the entries are still in the map until eviction triggers. | |
| // After time advances, the read path still respects TTL. | |
| // Expired entries remain in memory until a future mark() triggers eviction. | |
| expect(dedup.has(makeSignal('s0'), 'u')).toBe(false); |
Captures the second half of the audit cleanup work that landed after #87: - #88 config/core test coverage (was 0/missing helpers) - #89 connectors gmail + calendar pure-logic tests (8 → 43) - #90 llm-client URL validation hardening (zone IDs, trailing dots, CGNAT) - #91 adapter-discovery factory shape check + manifest defaultConfig - #92 preference-archaeologist coverage (action keys, multi-group, expiry) - #93 worker SignalDeduper extraction + 11 tests Pure docs. Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Summary
Closes the worker test gap from the session-start audit. The worker held its dedupe state in module-level Maps with no exported entry points, so TTL expiry and capacity eviction had zero coverage. Both matter for production memory bounds.
Refactor
Behavior tightening
Tests — apps/worker had zero tests before this PR. Now has 11.
Test plan
🤖 Generated with Claude Code