Slack latency: 3× smaller worker script, append-only webhook ack, eyes at the routing hop, pre-warmed hosts by jonastemplestein · Pull Request #1494 · iterate/iterate

jonastemplestein · 2026-06-11T10:57:12Z

The problem

A Slack message in prd took ~14s to get the 👀 reaction and ~20s to get a reply (example: iterate project, thread ts-1781170058-112929). Hop-by-hop, from the message's Slack ts:

| Δ | what happened |
| --- | --- |
| +0.9s | Slack delivered the webhook; Slack was fast |
| +6.5s | nothing of ours executed anywhere: cold instantiation of SlackIntegrationDO + the integration StreamDO (handler: 8.1s wall, 5ms CPU). Slack's 3s retry queued behind the same gate and doubled the work |
| +2.1s | integration DO init + subscription + append + routing |
| +3.0s | cold instantiation of the new thread StreamDO |
| +1.4s | cold dial of the SLACK_AGENT host DO → input rendered → eyes at ~14s |
| +6s | LLM leg (openai-ws connect 1.1s, gpt-5.5 ~2s, itx exec) → reply at ~20s |
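
As a quick sanity check on the hop table, the deltas can be summed to confirm they account for the ~14s eyes and ~20s reply figures. This is only arithmetic over the numbers quoted above; the labels are shortened for readability.

```typescript
// Hop deltas in seconds, copied from the hop-by-hop table in this description.
const hops: Array<{ label: string; deltaSeconds: number }> = [
  { label: "Slack delivered the webhook", deltaSeconds: 0.9 },
  { label: "cold SlackIntegrationDO + integration StreamDO", deltaSeconds: 6.5 },
  { label: "integration DO init + subscription + append + routing", deltaSeconds: 2.1 },
  { label: "cold thread StreamDO", deltaSeconds: 3.0 },
  { label: "cold dial of SLACK_AGENT host DO (eyes)", deltaSeconds: 1.4 },
  { label: "LLM leg (reply)", deltaSeconds: 6 },
];

// Running total after each hop, rounded to one decimal to hide float noise.
function cumulativeSeconds(deltas: number[]): number[] {
  let total = 0;
  return deltas.map((delta) => {
    total += delta;
    return Math.round(total * 10) / 10;
  });
}

const totals = cumulativeSeconds(hops.map((hop) => hop.deltaSeconds));
hops.forEach((hop, index) => {
  console.log(`${totals[index].toFixed(1)}s  ${hop.label}`);
});
```

The running total reaches 13.9s at the eyes hop and 19.9s at the reply, which matches the rounded ~14s and ~20s in the problem statement.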

Two causes multiply each other. First, the deployed script was 89.1 MB: alchemy's noBundle glob over dist/server uploaded 50 MB of sourcemaps and browser-only modules as worker modules, while the live server graph is ~34 MB and the entrypoint 1.75 MB. Every cold DO isolate loads all of it. Second, the webhook path chains 3–4 distinct DOs serially, so that load is paid once per DO. The warm path was always fine (webhook 1–6ms, appends 20–100ms): this is cold-start tax, not stream-architecture tax.

The fixes (no change to the streams/processors idea)

prune-server-bundle.ts (runs between build and asset preupload): deletes every dist/server module unreachable from the entrypoint via import/new URL literals (browser web workers + their wasm that the SSR build emits), plus all sourcemaps except the entrypoint's own (small; the one Cloudflare can symbolicate worker stack traces with — chunk maps are browser code and pure ballast inside a worker script). Validated against the extracted prd bundle: keeps exactly the 186-module live graph, deletes the 3 browser-only modules + chunk maps.
Append-only webhook ack: the handler no longer awaits SlackIntegrationDO.initialize() before responding — only the durable append gates the 200; initialize + catch-up moved to waitUntil. Order-independent (existing integrations have their subscription on the stream; new ones pick the webhook up via replay). Stops the >3s Slack retry storm.
👀 at the routing hop: the slack router reports routed webhooks to its host (acknowledgeRoutedWebhook) and SlackIntegrationDO adds the reaction immediately — one hop from ingress instead of three cold DO hops downstream — gated by the same payload-only rules the slack-agent applies (no bot messages, no reaction events, no bot-user actions). slack-agent still adds it on catch-up; already_reacted makes the pair idempotent.
Pre-warmed hosts (prewarmRoutedStreamHosts): for a newly routed thread, the SLACK_AGENT and AGENT host DOs initialize() concurrently with the bootstrap append instead of serially after each dial. Everything either side appends is idempotency-keyed and order-independent (the anchor-skip recovery from #1481, "agent: recover triggering inputs skipped by the side-effect anchor", covers trigger ordering).
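
The pre-warm item in the list above comes down to starting the host initialize() calls before awaiting the bootstrap append. A minimal sketch of that ordering, assuming a hypothetical `HostStub` interface and `routeNewThread` helper (neither is the PR's actual code; only `initialize()` and the host names come from the description):

```typescript
// Hypothetical minimal shape of a host DO stub; only initialize() is named in the PR.
interface HostStub {
  initialize(): Promise<void>;
}

// Starts every host's initialize() first, then awaits the bootstrap append,
// so the cold starts overlap instead of running one after another.
async function routeNewThread(
  appendBootstrap: () => Promise<void>,
  hosts: HostStub[],
): Promise<void> {
  // Pre-warm is best-effort here: a failed warm-up must not fail the routing.
  const warmups = Promise.all(
    hosts.map((host) => host.initialize().catch(() => undefined)),
  );
  await appendBootstrap();
  await warmups;
}

// Demo with in-memory fakes that record the order in which work starts.
const log: string[] = [];
const fakeHost = (label: string): HostStub => ({
  async initialize() {
    log.push(`${label}:start`);
    await Promise.resolve();
    log.push(`${label}:ready`);
  },
});

routeNewThread(
  async () => {
    log.push("append:start");
    await Promise.resolve();
    log.push("append:done");
  },
  [fakeHost("SLACK_AGENT"), fakeHost("AGENT")],
).then(() => console.log(log.join(" -> ")));
```

In the demo both hosts have started before the append finishes, which is the property the pre-warm relies on. It is only safe because, as the description says, everything either side appends is idempotency-keyed and order-independent.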

Measured

Dev-stage deploys of this branch (os-dev-jonas):

prd today: 89.1 MB
this branch, baseline before #1486 ([codex] shrink os ssr route bundle): 34.1 MB
this branch on latest main (includes #1486's SSR-graph shrink, 186→178 live modules): 28.3 MB, 3.1× smaller; app smoke-tested (sign-in 200)
prune log on the real prd bundle: kept 186 modules, deleted 3 unreachable modules + 180 sourcemaps (55.0 MB)

Expected effect: each cold DO instantiation drops from multi-second to sub-second, and the eyes ack stops depending on the deepest part of the chain. Worth re-measuring the full message→eyes timing in prd after this deploys.

Trade-offs / notes

Chunk-level deployed stack traces lose symbolication (entrypoint map kept). Symbolicate locally against the build output if needed.
The prune is conservative: anything referenced by a quoted relative specifier (from, import(), export from, new URL) stays. The unreachable set on the real bundle is exactly the browser-only web workers + wasm.
Follow-up idea (not this PR): split app-vs-platform workers so UI deploys stop evicting agent/stream DOs (the 2026-06-10 deploy-race incident), and consider per-DO-class workers for deploy isolation.
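
To make the "quoted relative specifier" rule from the notes above concrete, here is a minimal in-memory sketch of the reachability walk. It is not the code in prune-server-bundle.ts: the function names, the regex, and the file names in the demo are all assumptions for illustration, and it works on a `Map` of path to source text instead of the filesystem.

```typescript
// Collects every module reachable from the entry by following quoted relative
// specifiers ("./x" or "../x"), which covers from, import(), export ... from
// and new URL("./x", import.meta.url) without parsing the code.
function reachableModules(files: Map<string, string>, entry: string): Set<string> {
  const seen = new Set<string>([entry]);
  const queue: string[] = [entry];
  while (queue.length > 0) {
    const current = queue.pop() as string;
    const source = files.get(current) ?? "";
    const specifier = /["'](\.{1,2}\/[^"']+)["']/g;
    let match: RegExpExecArray | null;
    while ((match = specifier.exec(source)) !== null) {
      const target = resolveRelative(current, match[1]);
      if (files.has(target) && !seen.has(target)) {
        seen.add(target);
        queue.push(target);
      }
    }
  }
  return seen;
}

// Resolves a relative specifier against the importer's directory (POSIX-style paths).
function resolveRelative(importer: string, specifier: string): string {
  const parts = importer.split("/").slice(0, -1);
  for (const segment of specifier.split("/")) {
    if (segment === "." || segment === "") continue;
    if (segment === "..") parts.pop();
    else parts.push(segment);
  }
  return parts.join("/");
}

// Deletion plan: every sourcemap except the entrypoint's own, plus unreachable modules.
function planPrune(files: Map<string, string>, entry: string): string[] {
  const keep = reachableModules(files, entry);
  const entryMap = `${entry}.map`;
  return Array.from(files.keys())
    .filter((path) => (path.endsWith(".map") ? path !== entryMap : !keep.has(path)))
    .sort();
}

// Hypothetical bundle layout, for illustration only.
const demoFiles = new Map<string, string>([
  ["index.js", 'import "./chunks/a.js";'],
  ["index.js.map", "{}"],
  ["chunks/a.js", 'export * from "../shared/b.js"; new URL("./data.wasm", import.meta.url);'],
  ["chunks/a.js.map", "{}"],
  ["chunks/data.wasm", ""],
  ["shared/b.js", "export const x = 1;"],
  ["chunks/browser.worker.js", 'postMessage("hi");'],
]);
console.log(planPrune(demoFiles, "index.js"));
```

Matching any quoted relative string over-approximates the import graph, which is what makes the rule conservative: a string that merely looks like a specifier keeps a file alive, but nothing reachable can be dropped by mistake.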

🤖 Generated with Claude Code

Note

Medium Risk
Changes production Slack webhook timing, adds best-effort Slack API calls on the routing path, and alters deploy artifacts via bundle pruning; behavior is designed to be idempotent but affects a critical user-visible path.

Overview
Cuts Slack cold-path latency by shrinking the deployed worker and parallelizing work on the webhook path.

Deploy: Adds prune-server-bundle to the Alchemy build (after Vite, before asset preupload). It strips unreachable dist/server modules and most sourcemaps so each cold Durable Object isolate loads a much smaller script.

Webhook ingress: The Slack webhook handler now returns { ok: true } after the durable stream append only; SlackIntegrationDO.initialize() / ensureReady() run in waitUntil, avoiding >3s acks and Slack retries.
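
A minimal sketch of the append-only ack shape described above. The three interfaces and the `handleSlackWebhook` signature are hypothetical stand-ins; only `waitUntil`, `initialize()`, and the `{ ok: true }` body come from this PR, and the real handler would wrap the result in an HTTP response.

```typescript
// Hypothetical minimal interfaces for the pieces the handler touches.
interface ExecutionContextLike {
  waitUntil(promise: Promise<unknown>): void;
}
interface IntegrationLike {
  initialize(): Promise<void>;
}
interface StreamLike {
  append(event: unknown): Promise<void>;
}

async function handleSlackWebhook(
  payload: unknown,
  stream: StreamLike,
  integration: IntegrationLike,
  ctx: ExecutionContextLike,
): Promise<{ ok: true }> {
  // Only the durable append gates the ack.
  await stream.append(payload);
  // Initialization and catch-up continue in the background after the response.
  // Errors are swallowed in this sketch; a real handler would log them.
  ctx.waitUntil(integration.initialize().catch(() => undefined));
  return { ok: true };
}

// Demo with in-memory fakes: initialize() never settles, yet the ack still resolves.
const appended: unknown[] = [];
const background: Promise<unknown>[] = [];
handleSlackWebhook(
  { type: "event_callback" },
  { append: async (event) => { appended.push(event); } },
  { initialize: () => new Promise<void>(() => {}) },
  { waitUntil: (promise) => { background.push(promise); } },
).then((ack) => console.log(ack, appended.length, background.length));
```

The point of the ordering is that a slow or cold initialize() can no longer push the response past Slack's 3s retry window; correctness then rests on the replay behavior the description calls out for new integrations.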

Routing hop: SlackProcessor gains optional acknowledgeRoutedWebhook and prewarmRoutedStreamHosts. The integration DO adds the 👀 reaction at route time (via eyesReactionTargetFromWebhookPayload + reactions.add) and pre-initializes SLACK_AGENT and AGENT DOs in parallel with new-thread bootstrap. Downstream slack-agent behavior stays idempotent (already_reacted).
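
The payload-only gate for the 👀 reaction can be pictured as a pure function over the webhook body. The sketch below is an assumption about its shape, not the actual eyesReactionTargetFromWebhookPayload: the `SlackEventEnvelope` type is a simplified subset of Slack's Events API fields, and `botUserId` is a hypothetical parameter for the "no bot-user actions" rule.

```typescript
// Simplified, hypothetical subset of a Slack Events API envelope.
interface SlackEventEnvelope {
  event?: {
    type?: string;
    subtype?: string;
    bot_id?: string;
    user?: string;
    channel?: string;
    ts?: string;
  };
}

// Arguments that Slack's reactions.add method takes.
interface ReactionTarget {
  channel: string;
  timestamp: string;
  name: "eyes";
}

// Returns where to add the eyes reaction, or null when the payload should be ignored.
function eyesTargetFromPayload(
  payload: SlackEventEnvelope,
  botUserId: string,
): ReactionTarget | null {
  const event = payload.event;
  if (!event || event.type !== "message") return null; // drops reaction events
  if (event.bot_id !== undefined || event.subtype === "bot_message") return null; // no bot messages
  if (event.user === botUserId) return null; // no actions by our own bot user
  if (!event.channel || !event.ts) return null;
  return { channel: event.channel, timestamp: event.ts, name: "eyes" };
}

// Treats already_reacted as success, so the router and slack-agent can both try.
function reactionSucceeded(response: { ok: boolean; error?: string }): boolean {
  return response.ok || response.error === "already_reacted";
}

const humanMessage: SlackEventEnvelope = {
  event: { type: "message", user: "U_HUMAN", channel: "C123", ts: "1781170058.112929" },
};
console.log(eyesTargetFromPayload(humanMessage, "U_BOT"));
console.log(reactionSucceeded({ ok: false, error: "already_reacted" }));
```

Because the gate only reads the payload, it can run at the routing hop without touching the thread stream, and the `already_reacted` check is what lets the later slack-agent attempt be harmless.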

Reviewed by Cursor Bugbot for commit 8cf05b1. Bugbot is set up for automated code reviews on this repo.

Environment Config Lease

No active environment config lease.

OS

Status: released
Commit: 8cf05b1
Preview: https://os.iterate-preview-3.com
Summary: Preview app released.
Workflow run
Updated: 2026-06-11T12:53:44.296Z

Semaphore

Status: released
Commit: 8cf05b1
Preview: https://semaphore.iterate-preview-3.com
Summary: Preview app released.
Workflow run
Updated: 2026-06-11T12:53:35.576Z

…e worker script

The deployed os-prd script measured 89.1 MB: alchemy's noBundle upload globs everything under dist/server, which includes 50 MB of sourcemaps and browser-only modules (web workers, wasm) the Vite SSR build emits but the server graph never imports. Cold Durable Object isolates pay for total script size, and the Slack webhook path chains 3-4 DOs, measured in prd as ~6.5s + 3.0s + 1.4s of sequential DO cold starts (14s from message to the eyes reaction, 5ms CPU).

prune-server-bundle.ts runs between build and asset preupload: it deletes every module unreachable from the entrypoint via import/new URL literals, plus all sourcemaps except the entrypoint's own (small, and the one Cloudflare can use to symbolicate worker stack traces; chunk maps are mostly browser code and pure ballast inside a worker script). Validated against the extracted prd bundle: keeps exactly the 186 live modules, deletes the 3 browser-only ones and the chunk maps (~52 MB).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

…pre-warmed hosts

Three serial cold-start legs stood between a Slack message and any visible response in prd:

1. The webhook handler awaited SlackIntegrationDO.initialize() before appending, serializing a cold DO ahead of Slack's 200 (observed at 8s, with Slack's 3s retry queueing behind the same gate). Now only the durable append gates the response; initialize + catch-up run in waitUntil. Order-independent: existing integrations already have their subscription on the stream, and a new integration picks the webhook up via replay once the subscription lands.
2. The eyes reaction lived in the slack-agent processor, three DO cold starts downstream. The router now reports routed webhooks to the host (acknowledgeRoutedWebhook) and SlackIntegrationDO adds the reaction immediately, gated by the same payload-only rules the slack-agent applies (no bot messages, no reaction events, no bot-user actions). The slack-agent still adds it on catch-up; already_reacted makes the pair idempotent.
3. A newly routed thread stream cold-started serially: stream DO, then its dial woke the slack-agent host, then the agent host. The router now pre-warms both hosts (prewarmRoutedStreamHosts) concurrently with the bootstrap append. Everything either side appends is idempotency-keyed and order-independent.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

jonastemplestein · 2026-06-11T11:04:35Z

Verification record

Script size (the cold-start tax every DO isolate pays):

prd today: 89.1 MB
this branch deployed to os-dev-jonas and CI's os-preview-3: 28.3 MB (3.1× smaller; both measured via the Workers Scripts API)
prune log on this branch (after the SSR shrink in #1486, [codex] shrink os ssr route bundle): kept 178 modules, deleted 3 unreachable modules + 172 sourcemaps (35.1 MB); the entrypoint's index.js.map is kept for stack-trace symbolication
prune validated against the extracted real prd bundle: keeps exactly the 186-module live graph, deletes only the 3 browser-only modules (itx-repl-typescript.worker, stream-db.worker, wa-sqlite.wasm) + chunk maps

E2E against preview-3 with the real Slack bot token (exercises the new acknowledgeRoutedWebhook / prewarmRoutedStreamHosts / append-only-ack paths):

✅ schedules and completes an LLM request for a plain routed Slack message (15.4s)
✅ routes Slack webhooks into slack-agent streams and executes bang command replies (13.4s)
✅ lets a real agent conversation post to Slack through codemode (14.6s)

Dev smoke: os.iterate-dev-jonas.com/sign-in 200 on the pruned bundle. All CI checks green, no Bugbot findings.

After this deploys to prd, worth re-measuring message→👀 on a cold thread (was 14s; the eyes now depend only on the first hop, and each cold DO start should drop from seconds to sub-second).


jonastemplestein · 2026-06-11T11:41:20Z

Measured latency traces on this PR's preview (os-preview-3, 28.3 MB script)

Method: real root message posted to #slack-agent-e2e-test with the preview bot token, then a human-shaped webhook injected into the test project's /integrations/slack stream (same entry as the e2e suite — skips only the HTTP signature/D1-lookup leg, which is unchanged and was never the problem). 👀 timed by polling Slack reactions.get at 150ms; everything else from server-side stream createdAts. Cold runs after 12–14 min idle; warm runs immediately after, on a new thread (the realistic steady state: hot isolates, fresh per-thread DOs).
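
For anyone repeating the measurement, the 👀 timing amounts to a fixed-interval poll. A minimal sketch, assuming a hypothetical `pollUntil` helper whose `check` callback would wrap the reactions.get call (the Slack request itself is left out so the sketch stays self-contained):

```typescript
// Injectable clock so the loop can be exercised without real timers.
interface Clock {
  now(): number;
  sleep(ms: number): Promise<void>;
}

const realClock: Clock = {
  now: () => Date.now(),
  sleep: (ms) => new Promise<void>((resolve) => { setTimeout(resolve, ms); }),
};

// Polls check() every intervalMs and returns the elapsed milliseconds at the
// first success, or null once timeoutMs has passed without one.
async function pollUntil(
  check: () => Promise<boolean>,
  intervalMs: number,
  timeoutMs: number,
  clock: Clock = realClock,
): Promise<number | null> {
  const start = clock.now();
  while (true) {
    if (await check()) return clock.now() - start;
    if (clock.now() - start >= timeoutMs) return null;
    await clock.sleep(intervalMs);
  }
}

// Demo: a stand-in check that succeeds on its third call, polled at 150ms.
let calls = 0;
pollUntil(async () => ++calls >= 3, 150, 5000).then((elapsedMs) => {
  console.log(`condition met after ~${elapsedMs}ms and ${calls} checks`);
});
```

With a 150ms interval, a reading can overstate the true time by up to one interval plus the latency of the check itself, which is worth keeping in mind when comparing the warm numbers.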

| milestone (from webhook injection) | prd baseline (89 MB, old code) | cold (preview, this PR) | warm new thread (preview, this PR) |
| --- | --- | --- | --- |
| webhook durably appended | ~8,100ms (handler held by cold DO chain; Slack retried) | 2,037ms | 146ms |
| 👀 visible in Slack | ~14,000ms | 2,758ms | 556–564ms |
| thread stream created | +11,600ms | 3,571ms | 1,236ms |
| slack-agent connected → input rendered | +14,000ms | 3,831ms | 1,593ms |
| `llm-request-scheduled` | ~15,800ms (or never, pre-#1481) | 4,935ms | 2,580–2,674ms |
| LLM turn completed | ~19,200ms | 9,706ms | 5,745–6,444ms |

Reproduced twice cold (second run: eyes at ≈+1.5s after webhook commit, route-configured at +1.28s) — consistent. Observations:

The eyes now fire at the routing hop (route-configured at +1.3s after commit), so they no longer wait for the thread-stream + slack-agent chain at all.
The pre-warm works: slack-agent connects 190–230ms after thread-stream creation (was a serial ~1.4s cold dial in prd).
The dominant remaining cold cost is one ~2s cold start of the integration-stream DO inside the append, plus ~1.2s for each subsequent first-touch DO. That is the 28 MB isolate load; the same legs took 6.5s/3.0s/1.4s at 89 MB. Further shrinking the live server graph (more SSR pruning in the style of #1486, [codex] shrink os ssr route bundle) buys this down linearly.
Warm, the platform overhead from webhook to LLM trigger is ~2.6s, of which ~1.1s is the openai-ws connect and ~1s the agent-host bootstrap — the streams architecture itself costs a few hundred ms.
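
The "linearly" observation can be checked against the two integration-stream data points quoted in this thread (6.5s at 89.1 MB, ~2s at 28.3 MB). The sketch below assumes a straight line through the origin, which is a simplification: it ignores any fixed per-isolate cost, and two points cannot confirm a trend.

```typescript
interface ColdStartSample {
  scriptMb: number;
  coldStartSeconds: number;
}

// Figures quoted in this thread for the integration-stream DO cold start.
const prdBaseline: ColdStartSample = { scriptMb: 89.1, coldStartSeconds: 6.5 };
const previewRun: ColdStartSample = { scriptMb: 28.3, coldStartSeconds: 2.0 };

// Cold-start cost per megabyte of deployed script.
function secondsPerMb(sample: ColdStartSample): number {
  return sample.coldStartSeconds / sample.scriptMb;
}

// Linear-through-origin projection from one reference sample (an assumption,
// not a measured model).
function projectColdStart(reference: ColdStartSample, targetMb: number): number {
  return secondsPerMb(reference) * targetMb;
}

console.log(secondsPerMb(prdBaseline).toFixed(3), "s/MB at 89.1 MB");
console.log(secondsPerMb(previewRun).toFixed(3), "s/MB at 28.3 MB");
console.log(projectColdStart(prdBaseline, previewRun.scriptMb).toFixed(2), "s projected at 28.3 MB");
```

Both samples come out at roughly 0.07 s/MB, and projecting the prd rate onto the 28.3 MB script gives about 2.06s against the ~2s measured, so the two points are at least consistent with a linear relationship.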


jonastemplestein and others added 2 commits June 11, 2026 11:53

jonastemplestein merged commit 258a894 into main Jun 11, 2026
8 checks passed

jonastemplestein deleted the ahead-nautilus branch June 11, 2026 12:52
