Skip to content

test(worker): extract signal dedupe to a testable module + cover eviction#93

Merged
jayzalowitz merged 1 commit into
mainfrom
jayzalowitz/worker-cron-tests
Apr 27, 2026
Merged

test(worker): extract signal dedupe to a testable module + cover eviction#93
jayzalowitz merged 1 commit into
mainfrom
jayzalowitz/worker-cron-tests

Conversation

@jayzalowitz

Copy link
Copy Markdown
Owner

Summary

Closes the worker test gap from the session-start audit. The worker held its dedupe state in module-level Maps with no exported entry points, so TTL expiry and capacity eviction had zero coverage. Both matter for production memory bounds.

Refactor

  • Extract `SignalDeduper` class to `apps/worker/src/signal-dedupe.ts` with constructor-injected TTL, capacity, and clock. Pure module — no globals, no side effects on import.
  • Replace module-level `seenSignalIdsByUser` Map + `getSignalDedupeMap` / `hasForwardedSignal` / `markSignalForwarded` helpers with delegations to a single `SignalDeduper` instance.
  • Add `SignalDeduper.pruneUsers(activeUserIds)` so the worker's user-discovery loop can release dedupe memory for users no longer tracked.

Behavior tightening

  • Eviction now triggers on `size >= maxPerUser` (was strict `>`). The cap is now a hard ceiling — callers can't push the size above `maxPerUser` even by one. Previous logic allowed +1 overshoot. `evict()` drops expired entries first, then falls back to oldest-first removal until `size < maxPerUser`.

Tests — apps/worker had zero tests before this PR. Now has 11.

  • Per-user isolation (state, source-namespacing for cross-connector ids)
  • TTL expiry boundary
  • `mark()` idempotency
  • `reset()` per-user clear
  • Eviction: expired-first sweep, then oldest-insertion-order
  • Eviction stays inert while size is at or below the cap with no new insert
  • Default `DEFAULT_TTL_MS` and `DEFAULT_MAX_PER_USER` constants exposed

Test plan

  • `pnpm --filter @skytwin/worker test` — 11/11 (was 0)
  • `pnpm --filter @skytwin/worker lint` — clean
  • `pnpm test` — 38/38 turbo tasks
  • `pnpm lint` — 30/30 turbo tasks

🤖 Generated with Claude Code

…tion

Closes the worker test gap from the session-start audit. The worker
held its dedupe state in module-level Maps with no exported entry
points, so TTL expiry and capacity eviction had zero coverage. Both
behaviors matter for production memory bounds.

Refactor:
- Extract SignalDeduper class to apps/worker/src/signal-dedupe.ts
  with constructor-injected TTL, capacity, and clock. Pure module —
  no globals, no side effects on import.
- Replace module-level seenSignalIdsByUser map + getSignalDedupeMap /
  hasForwardedSignal / markSignalForwarded helpers in index.ts with
  delegations to a single SignalDeduper instance.
- Add SignalDeduper.pruneUsers(activeUserIds) so the worker's
  user-discovery loop can release dedupe memory for users no longer
  tracked. Replaces the two-line ad-hoc loop in index.ts.

Behavior tightening:
- Eviction now triggers on size >= maxPerUser (was strict >). The cap
  is now a hard ceiling — callers can't push the size above maxPerUser
  even by one. Previous logic allowed +1 overshoot. evict() drops
  expired entries first, then falls back to oldest-first removal until
  size < maxPerUser.

Tests: +11 (apps/worker had zero tests).
- Per-user isolation (state, source-namespacing)
- TTL expiry boundary
- mark() idempotency
- reset() per-user clear
- Eviction: expired-first sweep, then oldest-insertion-order
- Eviction does not run while size is at or below the cap with no new
  insert (read-path TTL still respected)
- Default constants exposed

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 27, 2026 03:35

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Refactors the worker’s in-memory signal deduplication into a standalone, testable SignalDeduper module with configurable TTL/capacity/clock, and adds the worker’s first unit tests to cover TTL and eviction behavior.

Changes:

  • Added SignalDeduper (plus exported defaults) in a new apps/worker/src/signal-dedupe.ts module.
  • Updated the worker loop to delegate dedupe checks/marks and user pruning to a single SignalDeduper instance.
  • Added vitest unit tests covering TTL boundaries, per-user/source isolation, reset, and eviction ordering.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.

File Description
apps/worker/src/signal-dedupe.ts Introduces the new SignalDeduper class implementing TTL + bounded per-user capacity with eviction.
apps/worker/src/index.ts Replaces module-level dedupe Maps/helpers with a single SignalDeduper instance and uses pruneUsers().
apps/worker/src/tests/signal-dedupe.test.ts Adds unit tests for dedupe behavior (TTL, isolation, reset, and eviction).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

const map = this.seenByUser.get(userId);
if (!map) return false;
const seenAt = map.get(this.key(signal));
return Boolean(seenAt && this.now() - seenAt < this.ttlMs);

Copilot AI Apr 27, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

has() treats a stored timestamp of 0 as “not present” because it uses Boolean(seenAt && …). When now() is injected for tests or alternative clocks that can return 0, a freshly-marked signal may incorrectly return false. Compare against undefined instead (e.g., early-return if seenAt === undefined, otherwise compute TTL).

Suggested change
return Boolean(seenAt && this.now() - seenAt < this.ttlMs);
if (seenAt === undefined) return false;
return this.now() - seenAt < this.ttlMs;

Copilot uses AI. Check for mistakes.
Comment on lines +84 to +98
private getOrCreate(userId: string): Map<string, number> {
let map = this.seenByUser.get(userId);
if (!map) {
map = new Map();
this.seenByUser.set(userId, map);
}

// Pre-emptively evict when about to exceed the cap. Using >= here means
// the cap is a hard ceiling: callers can't push the size above maxPerUser
// even by one. Walking the map on every set would be O(n) per signal —
// too costly under load — so the sweep only runs when the size is at the
// cap and a new entry is about to land.
if (map.size >= this.maxPerUser) {
this.evict(map);
}

Copilot AI Apr 27, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Eviction currently runs whenever map.size >= maxPerUser, even if mark() is updating an existing key (which doesn’t increase size). At the cap, refreshing an existing signal can unnecessarily evict unrelated entries and reduce dedupe coverage. Consider only evicting when inserting a new key would exceed the cap (e.g., check map.has(key) in mark() before calling evict(), or pass the computed key into getOrCreate).

Copilot uses AI. Check for mistakes.
}
now = 1000; // far past TTL

// Insert one more — pushes size to 6, triggers eviction

Copilot AI Apr 27, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment says the insert “pushes size to 6”, but SignalDeduper evicts before inserting when the map is already at capacity (size >= maxPerUser). Consider rewording to avoid implying the map ever reaches 6 entries.

Suggested change
// Insert one more — pushes size to 6, triggers eviction
// Insert one more while already at capacity; expired entries are evicted before insert

Copilot uses AI. Check for mistakes.
// Insert one more — exceeds cap, no expirations to sweep, oldest is dropped
dedup.mark(makeSignal('newest'), 'u');

// We had s0, s1, s2; insertion of newest pushes size to 4, evict drops s0

Copilot AI Apr 27, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment assumes insertion temporarily “pushes size to 4”, but eviction happens before the insert when the map is at capacity. Consider rewording to describe eviction triggering on the next mark() while already at the cap.

Suggested change
// We had s0, s1, s2; insertion of newest pushes size to 4, evict drops s0
// We had s0, s1, s2; the next mark() arrives while already at the cap,
// so eviction drops s0 and newest is retained, keeping size at 3.

Copilot uses AI. Check for mistakes.
Comment on lines +126 to +129
// Size is at the cap (5), so the eviction sweep does NOT run on insert.
// Test the read path: has() still respects the TTL.
expect(dedup.has(makeSignal('s0'), 'u')).toBe(false);
// But the entries are still in the map until eviction triggers.

Copilot AI Apr 27, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment mentions eviction “does NOT run on insert”, but this test doesn’t perform an insert after advancing time. Consider adjusting the wording to clarify that expired entries remain in memory until a future mark() triggers eviction.

Suggested change
// Size is at the cap (5), so the eviction sweep does NOT run on insert.
// Test the read path: has() still respects the TTL.
expect(dedup.has(makeSignal('s0'), 'u')).toBe(false);
// But the entries are still in the map until eviction triggers.
// After time advances, the read path still respects TTL.
// Expired entries remain in memory until a future mark() triggers eviction.
expect(dedup.has(makeSignal('s0'), 'u')).toBe(false);

Copilot uses AI. Check for mistakes.
@jayzalowitz jayzalowitz merged commit 88c8242 into main Apr 27, 2026
12 checks passed
@jayzalowitz jayzalowitz deleted the jayzalowitz/worker-cron-tests branch April 27, 2026 03:48
jayzalowitz added a commit that referenced this pull request Apr 27, 2026
Captures the second half of the audit cleanup work that landed after
#87:

- #88 config/core test coverage (was 0/missing helpers)
- #89 connectors gmail + calendar pure-logic tests (8 → 43)
- #90 llm-client URL validation hardening (zone IDs, trailing dots, CGNAT)
- #91 adapter-discovery factory shape check + manifest defaultConfig
- #92 preference-archaeologist coverage (action keys, multi-group, expiry)
- #93 worker SignalDeduper extraction + 11 tests

Pure docs.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants