fix: prevent duplicate hook fires on slow inbound deliveries by uzyn · Pull Request #225 · uzyn/aimx

uzyn · 2026-05-11T09:52:03Z

Summary

Fixes the bug where a single forwarded email triggered multiple hooks. Root cause: on_receive hooks ran synchronously inside ingest_email, holding the SMTP 250 OK reply hostage for the entire hook duration (~7 min in the reported case). Gmail's MTA timed out around 5–7 min, considered delivery failed, retried — and aimx had no Message-Id dedup, so each retry ingested the message again and re-fired the hook.
Layer 1 (root cause): new HookMode::{Sync, Async} parameter on ingest_email. The SMTP DATA handler passes Async, which detaches hooks onto std::thread::spawn so 250 OK returns in seconds. The aimx ingest CLI and all existing tests keep Sync.
Layer 2 (defense in depth): per-mailbox in-memory Message-Id cache (backed by sparkv with 24h TTL + auto-eviction). Lazily rebuilt from on-disk frontmatter on first ingest after daemon startup. Entries loaded from disk inherit 24h − (now − received_at) as their TTL, so a daemon restart cannot stretch dedup protection past its real lifetime. Filename date-prefix pre-filter avoids opening files outside the 24h window. Duplicate Message-Ids are silently absorbed (no second .md, no second hook fire) but ingest_email still returns Ok so the SMTP path emits 250 OK and the remote MTA stops retrying.

Files changed

File	Change
`Cargo.toml`	+ `sparkv` 0.1 dep
`src/mailbox_locks.rs`	`lock_for` now returns `Arc<MailboxState>` (lock + dedup cache)
`src/ingest.rs`	`HookMode` enum, dedup check + cache insert, async hook spawn branch, disk-scan helpers, 4 new tests
`src/smtp/session.rs`	DATA handler passes `HookMode::Async`
`src/state_handler.rs`, `src/hook_handler.rs`, `src/mailbox_handler.rs`	Updated lock-acquisition call sites for the new state shape
`src/smtp/tests.rs`	Multi-transaction test now generates per-call unique `Message-Id`s

Test plan

cargo build clean
cargo clippy -- -D warnings clean
cargo fmt -- --check clean
cargo test --bin aimx — 1145 pass; 3 macOS-only pre-existing uds_authz failures (Linux-only project per CLAUDE.md)
cargo test --test integration — 116/116 pass
4 new dedup tests cover: in-memory cache hit, distinct Message-Ids allowed, lazy disk rebuild, expired entries excluded
Existing tests that re-ingested the same fixture twice updated to generate per-call unique Message-Ids — matching real-world client behavior
Manual end-to-end on staging: forward a slow-hook email from Gmail, confirm single received email + single firing hook_name=... log line, and a single reply lands in inbox

Out of scope

TLS close_notify graceful shutdown in src/smtp/mod.rs — Gmail's "peer closed without close_notify" is a downstream symptom of the timeout. Once 250 OK lands within seconds (this PR), Gmail closes cleanly. If it persists after deploy, a follow-up to call tokio_rustls::shutdown().await before the TCP shutdown is straightforward.
Admin command to manually purge a Message-Id from the dedup cache (defer until requested).

A single forwarded email was producing multiple replies because slow on_receive hooks held the SMTP DATA "250 OK" response for their entire runtime. Remote MTAs (e.g. Gmail) treated the silent SMTP connection as a delivery failure, retried, and aimx re-ingested + re-fired the hook on every retry. Fix in two layers: 1. Async hook execution - New HookMode::{Sync, Async} on ingest_email. - SMTP DATA handler now passes HookMode::Async so on_receive hooks spawn into a detached std::thread and the "250 OK" goes back to the remote MTA within seconds. - aimx ingest CLI and all tests keep HookMode::Sync to preserve end-to-end semantics (process / test must outlive the hook). 2. Message-Id dedup with 24h TTL (defense in depth) - Per-mailbox SparKV-backed cache lives on a new MailboxState that replaces the bare lock returned by MailboxLocks::lock_for. - Lazily rebuilt from disk on the first ingest into a mailbox after daemon start; entries loaded from frontmatter expire 24h after their original received_at, so a daemon restart cannot stretch dedup protection beyond its real window. - Filename date-prefix pre-filter avoids opening files outside the 24h retention window. - Duplicate Message-Ids are silently absorbed: no second .md, no second hook fire, but ingest still returns Ok so the SMTP path emits 250 OK and the remote MTA stops retrying. Tests: - Four new ingest dedup tests cover the cache hit, miss, lazy disk rebuild, and TTL-expiry paths. - Existing tests that intentionally re-used a fixture's hard-coded Message-Id (file_naming_increments, rapid_same_subject_*, duplicate_attachment_filenames_*, concurrent_same_subject_*, thread_id_deterministic_for_same_message, and the SMTP multi-transaction test) now generate per-call unique Message-Ids via small helpers, matching real-world client behavior.

uzyn

Changes Overview

Two-layer fix for the "single forwarded email → multiple analyst replies" bug. Layer 1: a new HookMode::{Sync, Async} parameter on ingest_email lets the SMTP DATA path detach on_receive hooks onto std::thread::spawn so 250 OK returns in seconds rather than after the hook completes. Layer 2: a per-mailbox SparKV-backed Message-Id dedup cache with 24h TTL, lazily rebuilt from disk on first ingest per mailbox after daemon startup, silently absorbs duplicate inbound deliveries within the retention window. Lock map was refactored from Arc<AsyncMutex<()>> to Arc<MailboxState> so the dedup cache hangs off the same per-mailbox state that already serializes inbound writes.

Scope Alignment

The implementation matches the stated intent. The aimx ingest CLI path explicitly keeps HookMode::Sync (so a short-lived process doesn't exit before its hook), the SMTP DATA path uses HookMode::Async, and all 6 existing call sites of MailboxLocks::lock_for were correctly updated for the new return shape. TLS close_notify and an admin "purge dedup cache" command are explicitly deferred. Out-of-scope test churn (per-call unique Message-Id in fixtures that re-ingest the same email twice) is justified — without it, the new dedup would silently suppress the second ingest and the tests' assertions would no longer measure what they previously measured.

Potential Bugs

1. Auth verification still blocks `250 OK` (Non-blocker)

verify_auth (DKIM/SPF/DMARC DNS lookups, including a fresh tokio::runtime::Runtime::new() per ingest) runs at src/ingest.rs ~line 245 before the dedup check and before the hook fork point. A slow recursive resolver or a pathological DKIM record can stall the spawn_blocking worker for many seconds — re-creating the same "remote MTA times out, retries" failure mode the PR is trying to fix, just with a smaller blast radius. The PR's hook-detach addresses the largest synchronous cost (multi-minute hook subprocesses), so this is a meaningful improvement on its own; calling it out so a follow-up isn't surprised when the bug surfaces again at a smaller scale.

2. 24h dedup TTL is shorter than the SMTP retry window (Non-blocker)

DEDUP_TTL = 24h in src/ingest.rs:110. RFC 5321 §4.5.4.1 sets the minimum give-up time at 4-5 days, and Gmail/Postfix defaults are in that range too. A legitimate retry past the 24h mark (rare but possible during prolonged inbound outages) will be treated as a fresh delivery and re-fire on_receive. This is a deliberate tradeoff per the PR description, and worth a one-liner in the constant's doc-comment ("we trade off correctness past 24h for a bounded cache size" or similar) so a future reader understands it's a knob, not an absolute.

3. Empty `Message-Id` bypasses dedup entirely (Non-blocker)

src/ingest.rs:170 gates dedup on !message_id.is_empty(). Real MTAs always set a Message-Id, but a hand-crafted SMTP client (or a buggy retry path on the sending side) that omits it would slip past dedup every retry and re-fire the hook each time. Worth either (a) falling back to a digest of (date, from, subject, raw[..N]) when Message-Id is missing, or (b) at minimum, an inline comment acknowledging the gap so the next person debugging "why did this hook fire 5×?" finds the answer in the file.

4. `read_frontmatter_id_pair` only inspects the first 4 KiB (Non-blocker)

src/ingest.rs:357: the disk-rebuild path reads at most 4 KiB. Typical inbound frontmatter is well under 1 KiB, but a long References chain plus many cc recipients (mailing lists) can push past 4 KiB. In that case find("\n+++") fails and read_frontmatter_id_pair returns None, so the entry is silently skipped from the rebuilt cache. The window in which this hurts is narrow (daemon restart + retry of an oversize message within 24h), but worth either bumping to 16 KiB or streaming line-by-line until +++ is found.

5. `set_with_ttl` failures are silent at the call site, with a real failure mode at scale (Non-blocker)

src/ingest.rs:215: if SparKV returns Err(MaxItemsExceeded) because we hit DEDUP_MAX_ITEMS = 100_000 within the 24h window, the warn-log fires but the Message-Id is not recorded. The very next retry of that message hits a cache miss and re-fires the hook. 100k/24h is ~1.2 inbound messages/second sustained per mailbox — absurd for any realistic aimx deployment, but worth noting because the failure mode silently regresses to "hook re-fires on every retry," which is the exact bug this PR fixes.

6. Catchall mailbox cache rebuild scope (Non-blocker)

When the recipient routes via the *@domain catchall, the dedup cache is keyed on the "catchall" mailbox name (MailboxLocks::lock_for("catchall")) and the disk scan walks <data_dir>/inbox/catchall/. That's correct for catchall-only deployments. For deployments that have both explicit mailboxes and a catchall, a Message-Id legitimately addressed to (say) alice@ and later re-delivered as unknown-recipient@ (different RCPT, same Message-Id) would NOT dedup — the two ingests land in different per-mailbox caches. Likely intentional (different recipients, different mailboxes), but worth a doc comment so it's not a future surprise.

Test Coverage

Strong coverage on the dedup side: ingest_skips_duplicate_message_id, ingest_allows_distinct_message_ids, ingest_dedup_cache_loads_from_disk, and ingest_dedup_cache_excludes_expired exercise the cache hit, miss, disk-rebuild, and TTL-expiry paths respectively. The disk-rebuild test pre-stages a stub .md with hand-written frontmatter rather than going through format_frontmatter — that's a small fragility (if the on-disk frontmatter format ever changes shape, this test won't notice), but acceptable for the cache-load behavior it's actually testing.

Gap on the async-hook side: the most consequential code change (detach the hook into std::thread::spawn so 250 OK returns fast) has zero targeted tests. There's no test that:

registers a slow on_receive hook (sleep ≥1s),
ingests a message through the SMTP DATA path,
asserts 250 OK returns in well under the hook duration,
and asserts the hook eventually runs.

The existing SMTP multi-transaction test (src/smtp/tests.rs) verifies the response content but not the response timing, so it would pass equally well with HookMode::Sync. This is the bug the PR exists to fix — it deserves a regression test. Non-blocker because the change is small and visually obvious, but adding it would close the loop.

Code Quality

src/ingest.rs:243-279: the async-hook spawn block is ~40 lines inline. Extracting fn spawn_on_receive_async(config, mailbox, md_path, meta) would shorten the match arm and make the failure-log site easier to find. Pure style, non-blocker.
The Arc<MailboxState> refactor at src/mailbox_locks.rs is clean and the doc-comment update is on point. The new seen_message_ids_starts_uninitialized test is appropriate.
parse_toml_string_value and read_frontmatter_id_pair are sufficient for the fixed-shape frontmatter aimx writes, but they're a hand-rolled subset of TOML. If format_frontmatter ever emits a multi-line string for, say, a very long subject, the line-based parse will silently miss it. Not changed in this PR but worth flagging since this PR is the first reader of frontmatter that doesn't go through serde/toml.

Summary and Recommended Actions

Overall verdict: Ready to merge with non-blocker follow-ups.

The two-layer fix is well-targeted at the root cause and the implementation is correct. Lock-hierarchy invariants are preserved, the dedup cache state lives behind the same per-mailbox serialization as inbound writes, and the cache insert is correctly ordered inside the lock so a concurrent retry can never see "on disk but not in cache." Existing tests were updated correctly (per-call unique Message-Ids in places that re-ingest the same fixture), and the new dedup tests are meaningful.

Blockers: none.
Non-blockers:
- No targeted regression test for the async-hook detach behavior (the primary fix).
- verify_auth still synchronously holds the SMTP DATA path while doing DNS — same failure shape as the original bug, smaller window.
- DEDUP_TTL = 24h is shorter than the RFC-allowed retry window; document the tradeoff in the constant's doc-comment.
- Empty Message-Id bypasses dedup; document or fall back to a content digest.
- 4 KiB read cap in read_frontmatter_id_pair can silently miss long-References frontmatter.
- Silent MaxItemsExceeded failure path at >100k entries/24h regresses to the original bug under extreme load.
Nice-to-haves:
- Extract the async-hook spawn into a helper for readability.
- Add a doc comment on the catchall/explicit-mailbox dedup-isolation behavior.

Cycle 1B follow-ups on the slow-inbound dedup PR. No blocker fixes — the review verdict was "Ready to merge with non-blocker follow-ups." Code changes: - Bump read_frontmatter_id_pair scan window from 4 KiB to 16 KiB so a long References chain plus many cc recipients (common on mailing lists) cannot silently drop a rebuilt cache entry. - Extract the inline async-hook spawn block into spawn_on_receive_async() for readability and a shorter match arm. New tests: - src/smtp/tests.rs::test_smtp_data_returns_fast_with_slow_on_receive_hook: regression test for the primary fix. Registers a 3s on_receive hook, drives an SMTP DATA transaction, asserts 250 OK returns under 1.5s (well below the hook duration) and that the hook eventually fires. Verified to FAIL when HookMode is reverted to Sync. Doc-only changes (no behavior change): - DEDUP_TTL: spell out the RFC-retry-window tradeoff so future readers see the 24h value is a knob, not an absolute. - Dedup gate: inline note that an empty Message-Id bypasses dedup, with a pointer for the why-did-this-hook-fire-N-times debug path. - MaxItemsExceeded warn: explain the regression-to-original-bug shape and that DEDUP_MAX_ITEMS is the lever. - mailbox_locks: document that the dedup cache is per-mailbox by design (different RCPTs land in different caches even on the same Message-Id). - parse_toml_string_value / read_frontmatter_id_pair: flag the hand-rolled TOML-subset limits so a future schema change knows where to look. Deferred per review (each is its own non-blocker): - #1 (verify_auth still blocks 250 OK): out of scope for this PR; its own follow-up. Smaller-blast-radius failure mode than the multi-minute hook this PR fixed.

uzyn · 2026-05-11T10:20:33Z

Cycle 1B: Review feedback addressed

Pushed 24a2291 addressing the non-blocker review items. Verdict was already "Ready to merge with non-blocker follow-ups"; this cycle closes the loop on most items in-place and explicitly defers the one larger follow-up.

Fixed in code

#	Item	What changed
2	`DEDUP_TTL` doc	Expanded doc-comment in `src/ingest.rs` calling out the RFC 5321 §4.5.4.1 retry window vs the chosen 24h value as an explicit tradeoff.
3	Empty `Message-Id` bypass	Inline comment at the dedup gate (`src/ingest.rs`) acknowledging the gap and pointing the next debugger at the source `.eml`. Did not add a content-digest fallback — the review itself notes "real MTAs always set a Message-Id," and synthesizing a digest would mask broken-sender bugs that an operator probably wants to see.
4	4 KiB read cap	Bumped to 16 KiB in `read_frontmatter_id_pair` (`src/ingest.rs`). Doc-comment now explains why (long `References` + many `cc` on mailing lists).
5	`MaxItemsExceeded` silence	Added a clarifying comment at the call site (`src/ingest.rs`) explaining that at >100k entries/24h the failure regresses to the original bug, and pointing at `DEDUP_MAX_ITEMS` as the knob. The existing `tracing::warn!` is the operator signal — no code-path change.
6	Catchall scope doc	Added a "Per-mailbox keying, by design" section to the `MailboxLocks` module doc (`src/mailbox_locks.rs`) covering the alice-vs-catchall scenario.
Quality	Spawn-helper extraction	Pulled the ~40-line async-hook spawn block out of the match arm into `spawn_on_receive_async(config, mailbox, md_path, meta)` in `src/ingest.rs`.
Quality	TOML parser limits	Added a doc-comment on `parse_toml_string_value` (and expanded one on `read_frontmatter_id_pair`) flagging the hand-rolled subset so a future `frontmatter.rs` schema change knows where to look.

Regression test added

src/smtp/tests.rs::test_smtp_data_returns_fast_with_slow_on_receive_hook

Closes the test-coverage gap the review flagged. Registers a 3-second on_receive hook on alice, drives a full SMTP DATA transaction, and asserts:

250 OK returns in under 1.5s (half the hook's sleep). Proves the hook ran detached, not inline.
The hook's marker file eventually appears. Proves the detached thread actually ran the hook (not silently dropped on spawn failure).
The email is on disk.

Sanity-checked: I temporarily reverted HookMode::Async to HookMode::Sync in src/smtp/session.rs and the new test fails with the expected message ("250 OK must return well under the hook's 3s sleep (took 3119ms, limit 1500ms)…"). Restored to Async before commit.

Deferred to a follow-up PR

#1 — verify_auth still blocks 250 OK. This is real (DKIM/SPF/DMARC DNS lookups, including a fresh tokio::runtime::Runtime::new() per ingest, all happen inline before the dedup gate and hook fork point). I'm deferring it because:

It's a different bug surface from this PR's scope. The PR fixed the largest synchronous cost (multi-minute hooks); auth verification has a smaller blast radius (DNS-bounded, not user-script-bounded).
The fix is non-trivial — either move auth verification onto a spawn_blocking worker with a hard timeout, or detach it the same way hooks were. Both deserve their own focused review rather than being tacked onto this PR.

Will file as a follow-up issue/PR.

Verification

cargo test --bin aimx -- ingest:: smtp::tests:: — 117 passed (including the new regression test).
cargo test --bin aimx -- hook:: mailbox_locks:: — 42 passed.
cargo clippy --all-targets -- -D warnings — clean.
cargo fmt -- --check — clean.

(Three pre-existing uds_authz::tests::* failures on this macOS host are unrelated to this PR — those tests assume uid 4_294_967_294 is unmapped, but it resolves to nobody on macOS. The project is Linux-only and CI runs on ubuntu-latest where they pass.)

uzyn

Re-Review (cycle 1B, head `24a2291`)

Diff scope: 155aef4..24a2291 — src/ingest.rs (+98/-37), src/mailbox_locks.rs (+13/-0), src/smtp/tests.rs (+156/-0). Build clean, clippy clean, fmt clean, new test passes locally (3.28s).

Resolved (verified against diff)

#2 DEDUP_TTL doc — src/ingest.rs:46-52. Doc-comment now spells out the RFC 5321 §4.5.4.1 4–5 day retry ceiling vs. the chosen 24h value and explicitly frames it as a tradeoff. Resolved.
#3 Empty Message-Id bypass — src/ingest.rs:357-365. Inline comment at the dedup gate documents the gap, points a future debugger at the .eml, and justifies the deliberate no-content-digest-fallback choice ("would mask broken-sender bugs that an operator probably wants to see"). Reasonable call; matches the original review's "(a) or (b)" wording. Resolved as a doc-only acknowledgement.
#4 4 KiB → 16 KiB — src/ingest.rs:683 confirms let mut buf = [0u8; 16 * 1024];. Doc-comment src/ingest.rs:671-688 explains the rationale (long References chains + mailing-list cc). Resolved.
#5 MaxItemsExceeded silence — src/ingest.rs:515-524. Inline comment now explains the regression-to-original-bug failure shape at >100k entries/24h and names DEDUP_MAX_ITEMS as the lever. tracing::warn! retained as the operator signal. Resolved.
#6 Catchall keying — src/mailbox_locks.rs:49-58. New "Per-mailbox keying, by design" module-doc section covers the alice-vs-catchall scenario and points at the migration path (process-wide cache) if cross-mailbox dedup ever becomes a requirement. Resolved.
Quality: async-spawn helper extraction — src/ingest.rs:556-598. spawn_on_receive_async(config: &Config, mailbox: &str, md_path: &Path, meta: &InboundFrontmatter) replaces the ~40-line inline block. Clones (config.clone(), mailbox.to_string(), md_path.to_path_buf(), meta.clone()) and the mailbox_name_for_log separate clone for the error path are all preserved; the spawn-failure policy (log only, no fallback to sync) is intact. Equivalent behavior to the prior inline code. Resolved.
Quality: TOML parser limits doc-comment — parse_toml_string_value (src/ingest.rs:741-749) and read_frontmatter_id_pair (src/ingest.rs:684-695) both call out the hand-rolled-TOML-subset constraints and tell the next reader to "swap this for a real toml crate call rather than expanding the regex-in-disguise" if frontmatter.rs ever grows multi-line strings. Resolved.
Test-coverage gap: async-hook detach regression — src/smtp/tests.rs:1626-1690 test_smtp_data_returns_fast_with_slow_on_receive_hook. Verified independently:
- Registers a 3s sleep hook on alice (/bin/sh -c "sleep 3 && touch <marker>").
- Drives the full SMTP DATA path: EHLO → MAIL FROM → RCPT TO → DATA → body → . terminator → QUIT. Not a direct ingest_email call.
- Asserts (1) 250 OK after dot-terminator returns in <1500ms (HOOK_SLEEP_SECS * 1000 / 2) and (2) the marker file eventually appears within HOOK_SLEEP_SECS + 10 seconds. (2) is what catches the "spawn silently dropped the closure" regression that (1) alone would miss.
- Sets AIMX_SANDBOX_FORCE_FALLBACK=1 via unsafe { set_var } so the hook runs without root / systemd-run. The var is process-wide but src/hook.rs:665 already sets it the same way, so this doesn't introduce a new pollution pattern.
- Sync would block ≥3000ms (implementer reports 3119ms with HookMode::Sync). The 1500ms vs 3000ms separation is an order-of-magnitude gap, not a tight margin — robust against CI noise.
- Belt-and-braces assert_eq!(md_files.len(), 1) at the end pins that ingestion itself still completed.
- Local run: passes in 3.28s. Resolved.

Deferred (accepted)

#1 verify_auth still blocks 250 OK — kept out of scope per the implementer's top-level comment. I agree with the deferral. The synchronous tokio::runtime::Runtime::new() plus DKIM/SPF/DMARC DNS lookups at src/ingest.rs:420-465 still gate 250 OK for several seconds on a slow recursive resolver, but the failure radius is DNS-bounded, not user-script-bounded — meaningfully different from the multi-minute hook this PR fixed. A proper fix wants its own focused PR (decide between spawn_blocking + hard timeout vs. full async detach, both with retention of trust-evaluation semantics for the frontmatter.trusted field). Track as a follow-up issue.

Still unresolved

None.

New issues introduced by the cycle 1B changes

None.

The spawn_on_receive_async extraction is a faithful refactor — the only structural change is mailbox: &str taking ownership via to_string() instead of mailbox.clone() on a String reference, which is semantically identical. All clones in the closure (config_owned, mailbox_name, mailbox_name_for_log, md_path_owned, meta_owned) are correctly owned and moved.
The unsafe { set_var } block in the new test is unavoidable on Rust 2024 edition; the safety comment is accurate.
The 16 KiB read cap is a strict superset of the prior 4 KiB behavior — no regression risk.

Scope creep check

None. Diff is exactly the 7 prior non-blockers + the requested regression test. No unrelated refactors.

Banned-token check (per `CLAUDE.local.md`)

PR-touched files (src/ingest.rs, src/mailbox_locks.rs, src/smtp/tests.rs, src/smtp/session.rs, src/hook_handler.rs, src/mailbox_handler.rs, src/state_handler.rs) introduce zero hits for \bSprint\b, \bS\d+[-.]\d+\b, \bFR-\d+[a-z]?\b, "User Story", "Acceptance criteria". (Pre-existing tokens in src/mailbox_handler.rs lines 691/1585/1767 are untouched by this PR and out of scope.)

Summary and verdict

Overall verdict: Ready to merge.

All six prior non-blockers are addressed in code or in doc-comments exactly as the review asked. The deferred #1 is justified, scoped, and the implementer has committed to a follow-up. The async-hook regression test is well-designed: it drives the full SMTP DATA path (not just ingest_email), asserts both timing AND eventual hook execution, has a 2× margin so CI noise can't flake it, and the implementer's sync-mode sanity check (3119ms vs 1500ms) confirms it actually fails under regression. No new bugs, no scope creep, no banned tokens, no test-quality concerns.

Blockers: none.
Non-blockers: none.
Nice-to-haves: none (the follow-up for verify_auth is already tracked in the implementer's comment).

Recommended merge commit message

fix: prevent duplicate hook fires on slow inbound deliveries

Two-layer fix for the "single forwarded email triggers multiple hook
fires" bug.

Layer 1 (root cause): new HookMode::{Sync, Async} parameter on
ingest_email. The SMTP DATA handler passes Async, which detaches
on_receive hooks onto std::thread::spawn so 250 OK returns in seconds
rather than after the hook completes. The aimx ingest CLI keeps Sync so
the short-lived process doesn't exit before its hook.

Layer 2 (defense in depth): per-mailbox in-memory Message-Id cache
(sparkv-backed, 24h TTL, auto-eviction). Lazily rebuilt from on-disk
frontmatter on first ingest per mailbox after daemon startup. Filename
date-prefix pre-filter keeps the rebuild cheap. Duplicate Message-Ids
within 24h are silently absorbed: no second .md, no second hook fire,
but ingest still returns Ok so the SMTP path emits 250 OK and the
remote MTA stops retrying.

Tests: 4 new dedup unit tests (cache hit, distinct ids allowed, lazy
disk rebuild, expired entries excluded) plus a new SMTP-path regression
test (test_smtp_data_returns_fast_with_slow_on_receive_hook) that pins
the timing contract by registering a 3s hook and asserting 250 OK
returns in well under that duration. Existing tests that re-ingested
the same fixture twice were updated to emit per-call unique Message-Ids.

uzyn commented May 11, 2026

View reviewed changes

uzyn merged commit a100600 into main May 11, 2026
8 checks passed

uzyn deleted the worktree-multi-bug-fix branch May 11, 2026 12:01

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: prevent duplicate hook fires on slow inbound deliveries#225

fix: prevent duplicate hook fires on slow inbound deliveries#225
uzyn merged 2 commits into
mainfrom
worktree-multi-bug-fix

uzyn commented May 11, 2026 •

edited

Loading

Uh oh!

uzyn left a comment

Uh oh!

uzyn commented May 11, 2026

Uh oh!

uzyn left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

uzyn commented May 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Files changed

Test plan

Out of scope

Uh oh!

uzyn left a comment

Choose a reason for hiding this comment

Changes Overview

Scope Alignment

Potential Bugs

1. Auth verification still blocks 250 OK (Non-blocker)

2. 24h dedup TTL is shorter than the SMTP retry window (Non-blocker)

3. Empty Message-Id bypasses dedup entirely (Non-blocker)

4. read_frontmatter_id_pair only inspects the first 4 KiB (Non-blocker)

5. set_with_ttl failures are silent at the call site, with a real failure mode at scale (Non-blocker)

6. Catchall mailbox cache rebuild scope (Non-blocker)

Test Coverage

Code Quality

Summary and Recommended Actions

Uh oh!

uzyn commented May 11, 2026

Cycle 1B: Review feedback addressed

Fixed in code

Regression test added

Deferred to a follow-up PR

Verification

Uh oh!

uzyn left a comment

Choose a reason for hiding this comment

Re-Review (cycle 1B, head 24a2291)

Resolved (verified against diff)

Deferred (accepted)

Still unresolved

New issues introduced by the cycle 1B changes

Scope creep check

Banned-token check (per CLAUDE.local.md)

Summary and verdict

Recommended merge commit message

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

uzyn commented May 11, 2026 •

edited

Loading

1. Auth verification still blocks `250 OK` (Non-blocker)

3. Empty `Message-Id` bypasses dedup entirely (Non-blocker)

4. `read_frontmatter_id_pair` only inspects the first 4 KiB (Non-blocker)

5. `set_with_ttl` failures are silent at the call site, with a real failure mode at scale (Non-blocker)

Re-Review (cycle 1B, head `24a2291`)

Banned-token check (per `CLAUDE.local.md`)