Slack latency: 3× smaller worker script, append-only webhook ack, eyes at the routing hop, pre-warmed hosts#1494

Merged: jonastemplestein merged 2 commits into main from ahead-nautilus on Jun 11, 2026

@jonastemplestein jonastemplestein commented Jun 11, 2026

Contributor

The problem

A Slack message in prd took ~14s to get the 👀 reaction and ~20s to get a reply (example: iterate project, thread ts-1781170058-112929). Hop-by-hop, from the message's Slack ts:

| Δ | what happened |
| --- | --- |
| +0.9s | Slack delivered the webhook (Slack was fast) |
| +6.5s | nothing of ours executed anywhere: cold instantiation of SlackIntegrationDO + the integration StreamDO (handler: 8.1s wall, 5ms CPU). Slack's 3s retry queued behind the same gate and doubled the work |
| +2.1s | integration DO init + subscription + append + routing |
| +3.0s | cold instantiation of the new thread StreamDO |
| +1.4s | cold dial of the SLACK_AGENT host DO → input rendered → eyes at ~14s |
| +6s | LLM leg (openai-ws connect 1.1s, gpt-5.5 ~2s, itx exec) → reply at ~20s |
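
As a quick check on the hop-by-hop breakdown above, the deltas can be summed into a running total. This is only a worked-arithmetic sketch: the labels are shortened from the breakdown, and the final "+6s" is treated as exactly 6000ms even though the source gives it as approximate.

```typescript
// Hop deltas from the breakdown above, in milliseconds.
const hops: { label: string; deltaMs: number }[] = [
  { label: "Slack delivered the webhook", deltaMs: 900 },
  { label: "cold SlackIntegrationDO + integration StreamDO", deltaMs: 6500 },
  { label: "integration DO init + subscription + append + routing", deltaMs: 2100 },
  { label: "cold thread StreamDO", deltaMs: 3000 },
  { label: "cold dial of SLACK_AGENT host DO (eyes)", deltaMs: 1400 },
  { label: "LLM leg (reply)", deltaMs: 6000 },
];

// Running total: elapsed time since the message's Slack ts after each hop.
function cumulative(deltas: number[]): number[] {
  const totals: number[] = [];
  let elapsed = 0;
  for (const d of deltas) {
    elapsed += d;
    totals.push(elapsed);
  }
  return totals;
}

const totals = cumulative(hops.map((h) => h.deltaMs));
hops.forEach((h, i) => console.log(`${(totals[i] / 1000).toFixed(1)}s  ${h.label}`));
```

The fifth total is 13.9s (the ~14s eyes) and the sixth 19.9s (the ~20s reply); 12.5s of the first 13.9s is the three cold-start legs.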

Two causes multiply. First, the deployed script was 89.1 MB: alchemy's noBundle glob over dist/server uploaded 50 MB of sourcemaps plus browser-only modules as worker modules, while the live server graph is only ~34 MB and the entrypoint 1.75 MB. Every cold DO isolate loads all of it. Second, the webhook path chains 3–4 distinct DOs serially, so that load is paid 3–4 times in a row. The warm path was always fine (webhook 1–6ms, appends 20–100ms): this is cold-start tax, not stream-architecture tax.

The fixes (no change to the streams/processors idea)

  1. prune-server-bundle.ts (runs between build and asset preupload): deletes every dist/server module unreachable from the entrypoint via import/new URL literals (browser web workers + their wasm that the SSR build emits), plus all sourcemaps except the entrypoint's own (small; the one Cloudflare can symbolicate worker stack traces with — chunk maps are browser code and pure ballast inside a worker script). Validated against the extracted prd bundle: keeps exactly the 186-module live graph, deletes the 3 browser-only modules + chunk maps.
  2. Append-only webhook ack: the handler no longer awaits SlackIntegrationDO.initialize() before responding — only the durable append gates the 200; initialize + catch-up moved to waitUntil. Order-independent (existing integrations have their subscription on the stream; new ones pick the webhook up via replay). Stops the >3s Slack retry storm.
  3. 👀 at the routing hop: the slack router reports routed webhooks to its host (acknowledgeRoutedWebhook) and SlackIntegrationDO adds the reaction immediately — one hop from ingress instead of three cold DO hops downstream — gated by the same payload-only rules the slack-agent applies (no bot messages, no reaction events, no bot-user actions). slack-agent still adds it on catch-up; already_reacted makes the pair idempotent.
  4. Pre-warmed hosts (prewarmRoutedStreamHosts): for a newly routed thread, the SLACK_AGENT and AGENT host DOs initialize() concurrently with the bootstrap append instead of serially after each dial. Everything either side appends is idempotency-keyed and order-independent (the anchor-skip recovery from #1481, "agent: recover triggering inputs skipped by the side-effect anchor", covers trigger ordering).
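
The append-only ack in fix 2 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the handler in this PR: `WebhookDeps`, `appendToStream`, and `initializeIntegration` are hypothetical names, and `waitUntil` is modeled as a plain callback with the same shape as `ExecutionContext.waitUntil` in Cloudflare Workers.

```typescript
// Hypothetical dependencies; the names are illustrative, not this repo's API.
interface WebhookDeps {
  // Durable append of the raw webhook payload: the only thing the 200 waits on.
  appendToStream(payload: unknown): Promise<void>;
  // Potentially cold, multi-second work (DO initialize + catch-up).
  initializeIntegration(): Promise<void>;
  // Same shape as ExecutionContext.waitUntil in Cloudflare Workers.
  waitUntil(work: Promise<unknown>): void;
}

async function handleSlackWebhook(
  payload: unknown,
  deps: WebhookDeps,
): Promise<{ ok: true }> {
  // Gate the response on the durable append only.
  await deps.appendToStream(payload);

  // Initialization continues after the response is returned. Its errors are
  // logged instead of thrown, since the webhook is already on the stream.
  deps.waitUntil(
    deps.initializeIntegration().catch((err) => {
      console.error("integration init failed after ack", err);
    }),
  );

  return { ok: true };
}
```

With this shape, a slow or cold `initializeIntegration` can no longer push the response past Slack's 3s retry threshold; only the append can.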

Measured

Dev-stage deploys of this branch (os-dev-jonas):

Expected effect: each cold DO instantiation drops from multi-second to sub-second, and the eyes ack stops depending on the deepest part of the chain. Worth re-measuring the full message→eyes timing in prd after this deploys.

Trade-offs / notes

  • Chunk-level deployed stack traces lose symbolication (entrypoint map kept). Symbolicate locally against the build output if needed.
  • The prune is conservative: anything referenced by a quoted relative specifier (from, import(), export from, new URL) stays. The unreachable set on the real bundle is exactly the browser-only web workers + wasm.
  • Follow-up idea (not this PR): split app-vs-platform workers so UI deploys stop evicting agent/stream DOs (the 2026-06-10 deploy-race incident), and consider per-DO-class workers for deploy isolation.
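
The conservative prune described in the notes above amounts to a reachability walk from the entrypoint. The sketch below is an assumption-laden illustration, not prune-server-bundle.ts itself: it works on an in-memory map of path to source text instead of the filesystem, and the `SPECIFIER` pattern, `reachableModules`, and `prunable` are hypothetical names chosen for the example.

```typescript
// Matches a quoted relative specifier after `from`, `import(`, bare `import`,
// or `new URL(`. The exact pattern is an assumption for illustration.
const SPECIFIER =
  /(?:from\s*|import\s*\(\s*|import\s*|new\s+URL\(\s*)(["'])(\.{1,2}\/[^"']+)\1/g;

// Resolves "./x" or "../x" against the importing module's directory.
function resolveRelative(importer: string, specifier: string): string {
  const parts = importer.split("/").slice(0, -1);
  for (const seg of specifier.split("/")) {
    if (seg === "." || seg === "") continue;
    if (seg === "..") parts.pop();
    else parts.push(seg);
  }
  return parts.join("/");
}

// Walks the graph from the entrypoint and returns every reachable path.
function reachableModules(modules: Record<string, string>, entrypoint: string): string[] {
  const seen: Record<string, true> = {};
  const queue = [entrypoint];
  while (queue.length > 0) {
    const current = queue.pop() as string;
    if (seen[current] || !(current in modules)) continue;
    seen[current] = true;
    SPECIFIER.lastIndex = 0;
    let m: RegExpExecArray | null;
    while ((m = SPECIFIER.exec(modules[current])) !== null) {
      queue.push(resolveRelative(current, m[2]));
    }
  }
  return Object.keys(seen).sort();
}

// Everything not reachable is prunable, except the entrypoint's own sourcemap.
function prunable(modules: Record<string, string>, entrypoint: string): string[] {
  const keep = reachableModules(modules, entrypoint);
  keep.push(entrypoint + ".map");
  return Object.keys(modules)
    .filter((path) => keep.indexOf(path) === -1)
    .sort();
}
```

Because the match is purely textual, a specifier that appears only inside a string or comment still keeps its target. That errs toward a larger bundle, never a missing module.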

🤖 Generated with Claude Code


Note

Medium Risk
Changes production Slack webhook timing, adds best-effort Slack API calls on the routing path, and alters deploy artifacts via bundle pruning; behavior is designed to be idempotent but affects a critical user-visible path.

Overview
Cuts Slack cold-path latency by shrinking the deployed worker and parallelizing work on the webhook path.

Deploy: Adds prune-server-bundle to the Alchemy build (after Vite, before asset preupload). It strips unreachable dist/server modules and most sourcemaps so each cold Durable Object isolate loads a much smaller script.

Webhook ingress: The Slack webhook handler now returns { ok: true } after the durable stream append only; SlackIntegrationDO.initialize() / ensureReady() run in waitUntil, avoiding >3s acks and Slack retries.

Routing hop: SlackProcessor gains optional acknowledgeRoutedWebhook and prewarmRoutedStreamHosts. The integration DO adds the 👀 reaction at route time (via eyesReactionTargetFromWebhookPayload + reactions.add) and pre-initializes SLACK_AGENT and AGENT DOs in parallel with new-thread bootstrap. Downstream slack-agent behavior stays idempotent (already_reacted).
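
The payload-only gate and the `already_reacted` idempotency described for the routing hop might look roughly like this. It is a sketch under assumptions: `eyesTargetFromPayload` is a hypothetical stand-in for `eyesReactionTargetFromWebhookPayload`, the real rules may check more than the three exclusions named in this PR, and `SlackCall` is an injected caller invented for the example so it needs no network.

```typescript
// Minimal slice of a Slack Events API payload; only fields the gate reads.
interface SlackEvent {
  type: string;
  channel?: string;
  ts?: string;
  user?: string;
  bot_id?: string;
  subtype?: string;
}
interface SlackWebhookPayload {
  type: string;
  event?: SlackEvent;
}
interface ReactionTarget {
  channel: string;
  timestamp: string;
}

// Decides from the payload alone whether the message should get the eyes.
function eyesTargetFromPayload(
  payload: SlackWebhookPayload,
  botUserId: string,
): ReactionTarget | null {
  const event = payload.event;
  if (payload.type !== "event_callback" || !event) return null;
  // No reaction events.
  if (event.type === "reaction_added" || event.type === "reaction_removed") return null;
  // No bot messages.
  if (event.bot_id || event.subtype === "bot_message") return null;
  // No actions by our own bot user.
  if (event.user === botUserId) return null;
  if (!event.channel || !event.ts) return null;
  return { channel: event.channel, timestamp: event.ts };
}

// Injected Slack Web API caller (hypothetical shape).
type SlackCall = (
  method: string,
  args: Record<string, string>,
) => Promise<{ ok: boolean; error?: string }>;

// Returns true when the reaction is present afterwards, whoever added it.
async function addEyes(call: SlackCall, target: ReactionTarget): Promise<boolean> {
  const res = await call("reactions.add", {
    channel: target.channel,
    timestamp: target.timestamp,
    name: "eyes",
  });
  return res.ok || res.error === "already_reacted";
}
```

Treating `already_reacted` as success is what lets the integration DO and the downstream slack-agent both attempt the reaction without coordinating.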

Reviewed by Cursor Bugbot for commit 8cf05b1.

Environment Config Lease

No active environment config lease.

OS

Status: released
Commit: 8cf05b1
Preview: https://os.iterate-preview-3.com
Summary: Preview app released.
Updated: 2026-06-11T12:53:44.296Z

Semaphore

Status: released
Commit: 8cf05b1
Preview: https://semaphore.iterate-preview-3.com
Summary: Preview app released.
Updated: 2026-06-11T12:53:35.576Z

jonastemplestein and others added 2 commits June 11, 2026 11:53
…e worker script

The deployed os-prd script measured 89.1 MB: alchemy's noBundle upload
globs everything under dist/server, which includes 50 MB of sourcemaps
and browser-only modules (web workers, wasm) the Vite SSR build emits
but the server graph never imports. Cold Durable Object isolates pay
for total script size, and the Slack webhook path chains 3-4 DOs —
measured in prd as ~6.5s + 3.0s + 1.4s of sequential DO cold starts
(14s from message to the eyes reaction, 5ms CPU).

prune-server-bundle.ts runs between build and asset preupload: deletes
every module unreachable from the entrypoint via import/new URL
literals, plus all sourcemaps except the entrypoint's own (small, and
the one Cloudflare can use to symbolicate worker stack traces; chunk
maps are mostly browser code and pure ballast inside a worker script).
Validated against the extracted prd bundle: keeps exactly the 186 live
modules, deletes the 3 browser-only ones and the chunk maps (~52 MB).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

…pre-warmed hosts

Three serial cold-start legs stood between a Slack message and any
visible response in prd:

1. The webhook handler awaited SlackIntegrationDO.initialize() before
   appending, serializing a cold DO ahead of Slack's 200 — observed at
   8s, with Slack's 3s retry queueing behind the same gate. Now only
   the durable append gates the response; initialize + catch-up run in
   waitUntil. Order-independent: existing integrations already have
   their subscription on the stream, and a new integration picks the
   webhook up via replay once the subscription lands.
2. The eyes reaction lived in the slack-agent processor, three DO cold
   starts downstream. The router now reports routed webhooks to the
   host (acknowledgeRoutedWebhook) and SlackIntegrationDO adds the
   reaction immediately, gated by the same payload-only rules the
   slack-agent applies (no bot messages, no reaction events, no
   bot-user actions). The slack-agent still adds it on catch-up;
   already_reacted makes the pair idempotent.
3. A newly routed thread stream cold-started serially: stream DO, then
   its dial woke the slack-agent host, then the agent host. The router
   now pre-warms both hosts (prewarmRoutedStreamHosts) concurrently
   with the bootstrap append. Everything either side appends is
   idempotency-keyed and order-independent.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@jonastemplestein

Contributor Author

Verification record

Script size (the cold-start tax every DO isolate pays):

  • prd today: 89.1 MB
  • this branch deployed to os-dev-jonas and CI's os-preview-3: 28.3 MB (3.1× smaller; both measured via the Workers Scripts API)
  • prune log on this branch (after the SSR shrink in #1486, "[codex] shrink os ssr route bundle"): kept 178 modules, deleted 3 unreachable modules + 172 sourcemaps (35.1 MB); the entrypoint's index.js.map is kept for stack-trace symbolication
  • prune validated against the extracted real prd bundle: keeps exactly the 186-module live graph, deletes only the 3 browser-only modules (itx-repl-typescript.worker, stream-db.worker, wa-sqlite.wasm) + chunk maps

E2E against preview-3 with the real Slack bot token (exercises the new acknowledgeRoutedWebhook / prewarmRoutedStreamHosts / append-only-ack paths):

  • schedules and completes an LLM request for a plain routed Slack message (15.4s)
  • routes Slack webhooks into slack-agent streams and executes bang command replies (13.4s)
  • lets a real agent conversation post to Slack through codemode (14.6s)

Dev smoke: os.iterate-dev-jonas.com/sign-in 200 on the pruned bundle. All CI checks green, no Bugbot findings.

After this deploys to prd, worth re-measuring message→👀 on a cold thread (was 14s; the eyes now depend only on the first hop, and each cold DO start should drop from seconds to sub-second).

🤖 Generated with Claude Code

@jonastemplestein

Contributor Author

Measured latency traces on this PR's preview (os-preview-3, 28.3 MB script)

Method: real root message posted to #slack-agent-e2e-test with the preview bot token, then a human-shaped webhook injected into the test project's /integrations/slack stream (same entry as the e2e suite — skips only the HTTP signature/D1-lookup leg, which is unchanged and was never the problem). 👀 timed by polling Slack reactions.get at 150ms; everything else from server-side stream createdAts. Cold runs after 12–14 min idle; warm runs immediately after, on a new thread (the realistic steady state: hot isolates, fresh per-thread DOs).
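
The 👀 timing in the method above can be sketched as a fixed-interval poll. This is an illustration, not the script used for these traces: `timeUntil` and `hasReaction` are hypothetical helpers, the clock and sleep are injectable so the loop can be exercised without real time, and `ReactionsGetResponse` models only the part of a Slack `reactions.get` response for a message that the check reads.

```typescript
// Slice of a Slack reactions.get response for a message item.
interface ReactionsGetResponse {
  ok: boolean;
  message?: { reactions?: { name: string; count: number }[] };
}

// True when the message carries at least one reaction with this name.
function hasReaction(res: ReactionsGetResponse, name: string): boolean {
  const reactions = (res.message && res.message.reactions) || [];
  return reactions.some((r) => r.name === name);
}

// Polls `check` every `intervalMs` until it returns true or `timeoutMs`
// elapses. Resolves with elapsed ms at the first successful check, else null.
async function timeUntil(
  check: () => Promise<boolean>,
  intervalMs: number,
  timeoutMs: number,
  now: () => number = Date.now,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<number | null> {
  const start = now();
  while (now() - start <= timeoutMs) {
    if (await check()) return now() - start;
    await sleep(intervalMs);
  }
  return null;
}
```

A reading taken this way can overshoot the true time by up to one polling interval plus a request round-trip, so the 150ms interval bounds the resolution of the 👀 figures.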

| milestone (from webhook injection) | prd baseline (89 MB, old code) | cold (preview, this PR) | warm new thread (preview, this PR) |
| --- | --- | --- | --- |
| webhook durably appended | ~8,100ms (handler held by cold DO chain; Slack retried) | 2,037ms | 146ms |
| 👀 visible in Slack | ~14,000ms | 2,758ms | 556–564ms |
| thread stream created | +11,600ms | 3,571ms | 1,236ms |
| slack-agent connected → input rendered | +14,000ms | 3,831ms | 1,593ms |
| llm-request-scheduled | ~15,800ms (or never, pre-#1481) | 4,935ms | 2,580–2,674ms |
| LLM turn completed | ~19,200ms | 9,706ms | 5,745–6,444ms |

Reproduced twice cold (second run: eyes at ≈+1.5s after webhook commit, route-configured at +1.28s) — consistent. Observations:

  • The eyes now fire at the routing hop (route-configured at +1.3s after commit), so they no longer wait for the thread-stream + slack-agent chain at all.
  • The pre-warm works: slack-agent connects 190–230ms after thread-stream creation (was a serial ~1.4s cold dial in prd).
  • The dominant remaining cold cost is one ~2s cold start of the integration-stream DO inside the append, plus ~1.2s for each subsequent first-touch DO. That is the 28 MB isolate load; it was 6.5s/3.0s/1.4s at 89 MB. Further shrinking the live server graph (more SSR pruning in the style of #1486, "[codex] shrink os ssr route bundle") should buy this down roughly linearly.
  • Warm, the platform overhead from webhook to LLM trigger is ~2.6s, of which ~1.1s is the openai-ws connect and ~1s the agent-host bootstrap — the streams architecture itself costs a few hundred ms.
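
The pre-warm behind the second observation above boils down to starting the host initializations alongside the bootstrap append instead of after it. A minimal sketch, assuming hypothetical names (`Host`, `bootstrapWithPrewarm`) and assuming that a failed warm-up should not fail the append, which this PR does not state:

```typescript
// Hypothetical host handle; in the PR these are the SLACK_AGENT and AGENT DOs.
interface Host {
  name: string;
  initialize(): Promise<void>;
}

// Starts every host initialize() and the bootstrap append at the same time,
// then waits for all of them. Returns the names of hosts whose warm-up failed.
async function bootstrapWithPrewarm(
  appendBootstrap: () => Promise<void>,
  hosts: Host[],
): Promise<string[]> {
  const failed: string[] = [];
  // Warm-up errors are recorded, not thrown (assumption: pre-warm is best-effort).
  const warmups = hosts.map((host) =>
    host.initialize().catch(() => {
      failed.push(host.name);
    }),
  );
  // An append failure still rejects, because the append is the real work.
  await Promise.all([appendBootstrap(), ...warmups]);
  return failed;
}
```

Run serially, the cold starts add up; run this way, the wall time is roughly that of the slowest leg. This is only safe because, as the description says, everything either side appends is idempotency-keyed and order-independent.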

🤖 Generated with Claude Code

@jonastemplestein jonastemplestein merged commit 258a894 into main Jun 11, 2026
8 checks passed
@jonastemplestein jonastemplestein deleted the ahead-nautilus branch June 11, 2026 12:52