Class-model stream processors across apps/os: DO-hosted, callable subscriptions, legacy model deleted #1402
Merged
b8cfa09 to
29527a0
Final verification (head
|
…n event filtering

- subscription-configured subscribers are now Callable descriptors; the Stream DO dials them via dispatchCallable (packages/streams now depends on packages/shared). Old built-in/workers-rpc subscriber shapes are tolerated historically but no longer dialed.
- createStreamProcessorHost: hosts named class-based processors on any DO as plain fields, with checkpoint storage in DO KV, late-bound stream context, per-subscription side-effect anchor, and host-level processor-registered announcement (replacing standardProcessorBehavior).
- StreamProcessor gains sideEffectsAfterOffset: events at or below the anchor reduce into state but skip processEvent, so attaching to an existing stream rebuilds state without re-firing historical side effects.
- subscribe/subscribeOutbound accept eventTypes; the pump filters post-read while the cursor advances past non-matching events. Hosts always pass contract.consumes: the contract is the filter.
- echo + circuit-breaker migrated to StreamProcessor classes; staging runner DO rewritten on the host helper; example-app call sites emit callable subscribers; node e2e rewritten on the class model with consumes filtering.

Design log: apps/os/tasks/stream-processor-class-migration-log.md

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
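To make the anchor and filter semantics in this commit concrete, here is a minimal sketch. Every name in it (`pump`, `CountingProcessor`, the `StreamEvent` shape) is a hypothetical stand-in, not the packages/streams API, and it assumes events arrive sorted by offset. It only illustrates two points from the message: the pump's cursor advances past filtered-out events, and events at or below `sideEffectsAfterOffset` reduce into state without firing side effects.

```typescript
// Hypothetical shapes for illustration; these are not the packages/streams types.
type StreamEvent = { offset: number; type: string };

// Pump side: filter after reading, but move the cursor past every event read.
function pump(
  events: StreamEvent[],
  cursor: number,
  eventTypes: ReadonlySet<string>,
): { delivered: StreamEvent[]; cursor: number } {
  const delivered: StreamEvent[] = [];
  let next = cursor;
  for (const event of events) {
    if (event.offset <= cursor) continue;
    if (eventTypes.has(event.type)) delivered.push(event);
    next = event.offset; // advances even when the event is filtered out
  }
  return { delivered, cursor: next };
}

// Processor side: every delivered event reduces into state, but only events
// past the anchor trigger side effects.
class CountingProcessor {
  seen = 0;
  checkpoint = 0;
  sideEffectOffsets: number[] = [];
  readonly sideEffectsAfterOffset: number;

  constructor(sideEffectsAfterOffset: number) {
    this.sideEffectsAfterOffset = sideEffectsAfterOffset;
  }

  ingest(events: StreamEvent[]): void {
    for (const event of events) {
      if (event.offset <= this.checkpoint) continue; // replay dedupe
      this.seen += 1; // reduce always runs
      if (event.offset > this.sideEffectsAfterOffset) {
        this.sideEffectOffsets.push(event.offset); // stand-in for processEvent
      }
      this.checkpoint = event.offset;
    }
  }
}
```

With an anchor of 3, a processor attached to an existing stream counts every consumed event but records side effects only for offsets above 3. Re-ingesting the same batch is a no-op because of the checkpoint check.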
stream-processor.test.ts now drives EchoExampleProcessor/CircuitBreakerProcessor through ingest (replay dedupe, resume-from-snapshot, side-effect anchor, the CoreStreamSim pause flow); circuit-breaker and core contract tests updated for the callable subscriber shape; stream-processor-failures.test.ts (legacy runner semantics) deleted — class failure paths are covered in stream-processor-class.test.ts. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
ProjectLifecycleProcessor and RepoStreamProcessor become StreamProcessor subclasses hosted on their domain DOs via createStreamProcessorHost; the subscription-configured events switch to durableObjectProcessorSubscriber callables (PROJECT/REPO bindings) with :callable idempotency-key suffixes so they land on existing streams. Repo catch-up waits now target the latest consumed event instead of the stream head, since delivery is filtered by contract.consumes. One-line lockfile fix pins packages/streams zod to 4.3.6 to match the rest of the workspace (zod brands types per minor version, so the 4.4.3/4.3.6 split broke every OS-defined contract against the streams package's generics). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
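The change to the catch-up target can be sketched as follows. This is an illustration under assumptions, not the repo's implementation: `catchUpTargetOffset`, `isCaughtUp`, and the event type labels are hypothetical. The point is that a consumer whose delivery is filtered by `contract.consumes` can only ever checkpoint at consumed events, so waiting for the stream head may never finish.

```typescript
// Hypothetical event shape for illustration.
type StreamEvent = { offset: number; type: string };

// Returns the offset a contract-filtered consumer can actually reach:
// the latest event whose type the contract consumes, or undefined if none.
function catchUpTargetOffset(
  history: StreamEvent[],
  consumes: ReadonlySet<string>,
): number | undefined {
  for (let i = history.length - 1; i >= 0; i -= 1) {
    const event = history[i];
    if (event !== undefined && consumes.has(event.type)) return event.offset;
  }
  return undefined;
}

// A consumer is caught up once its checkpoint reaches the target;
// with nothing to consume it is trivially caught up.
function isCaughtUp(checkpoint: number, target: number | undefined): boolean {
  return target === undefined || checkpoint >= target;
}
```

In a history where the last event is of a non-consumed type, the target is the earlier consumed event, and a checkpoint at that offset counts as caught up even though it is below the stream head.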
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
- CodemodeProcessor (StreamProcessor subclass) + wire-identical contract in apps/os/src/domains/codemode/stream-processors/codemode; reducer moved into the class, standardProcessorBehavior dropped per D11
- session-started blocks the checkpoint (blockProcessorWhile); script execution stays detached (runInBackground), matching the legacy OS runner's detachedSideEffects wiring
- CodemodeSession hosts the processor via createStreamProcessorHost, exposes requestStreamSubscription, and appends a durableObjectProcessorSubscriber callable (idempotency key gains :callable so it lands on old streams)
- getRunnerState/waitForProcessorCatchUp/resolveRegisteredProvider rewired onto host.runtimeState + the live processor; catch-up targets the last consumed-type event because delivery is contract-filtered (D9)
- pnpm-lock: pin packages/streams to zod@4.3.6 so contracts authored in apps/os type-flow through the streams machinery (cross-zod-instance schemas otherwise degrade to unknown)
- codemode-session suite: 16/17 (the loopback test waits on the unmigrated agents domain; see apps/os/tasks/migration-notes/codemode.md)

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
routedStreamBootstrapEvents still appended the legacy built-in codemode subscriber (flagged in the codemode migration notes); a late legacy append under the same subscription key would never be dialed. Now uses the CODEMODE_SESSION callable with a :callable idempotency key. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
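The `:callable` idempotency-key suffix that these commits keep adding can be illustrated with a small sketch. `IdempotentLog` and the key strings are hypothetical; the sketch only assumes that an append whose key was already used on a stream is dropped, which is why reusing the legacy key would prevent the callable subscriber from landing on an existing stream.

```typescript
// Hypothetical append log that deduplicates by idempotency key.
class IdempotentLog {
  private readonly seenKeys = new Set<string>();
  readonly appended: string[] = [];

  // Returns true if the event was appended, false if the key was already used.
  append(description: string, idempotencyKey: string): boolean {
    if (this.seenKeys.has(idempotencyKey)) return false;
    this.seenKeys.add(idempotencyKey);
    this.appended.push(description);
    return true;
  }
}

// An existing stream already carries the legacy subscriber under this key.
const existingStream = new IdempotentLog();
existingStream.append("legacy built-in subscriber", "subscription:codemode");

// Same key: deduplicated, so the callable subscriber never lands.
const reusedKey = existingStream.append("callable subscriber", "subscription:codemode");

// Suffixed key: treated as a new append, so it lands on the old stream.
const suffixedKey = existingStream.append("callable subscriber", "subscription:codemode:callable");
```

Here `reusedKey` is `false` and `suffixedKey` is `true`, leaving two entries on the stream.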
…dule The class model and browser store imported it from processor-runner.ts, which is scheduled for deletion with the legacy runner. It now lives in types.ts; processor-runner re-exports it until it goes. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…n class-model migration AgentDurableObject's instance-wake hook awaited waitForAgentProcessorsCatchUp inside blockConcurrencyWhile. With processors co-hosted on the same DO, the Stream DO's subscription handshake and event delivery are inbound calls that the closed input gate queues, so the polled local checkpoints could never advance and the wait burned its full 5s on every instance wake (legacy was safe only because the runner was a separate DO polled via outbound RPC). The wake hook no longer waits; public methods await a memoized once-per-wake catch-up after ensureStarted(), outside the gate. Fixes the codemode-session workerd test 'runs loopback RPC capability examples with live handles and callbacks' (ctx.agents.create().sendMessage() hit the stall and pushed completion past the test's 5s deadline). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
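The "memoized once-per-wake catch-up" described above could look roughly like this. It is a sketch under assumptions: `OncePerWake` and its method names are hypothetical, and the real wait (`waitForAgentProcessorsCatchUp`) is replaced by an injected function. The key property is that the wait is started lazily by public methods, outside the input gate, and shared by all callers within one wake.

```typescript
// Hypothetical helper: runs an async wait at most once per wake and shares
// the resulting promise with every caller.
class OncePerWake {
  private pending: Promise<void> | undefined;
  private readonly waitForCatchUp: () => Promise<void>;

  constructor(waitForCatchUp: () => Promise<void>) {
    this.waitForCatchUp = waitForCatchUp;
  }

  // Called from public methods after startup, never inside the gated wake
  // hook, so inbound deliveries can still advance the checkpoints.
  ensureCaughtUp(): Promise<void> {
    this.pending ??= this.waitForCatchUp();
    return this.pending;
  }

  // Called when the instance wakes again, so the next call re-checks.
  reset(): void {
    this.pending = undefined;
  }
}
```

A public method would `await` `ensureCaughtUp()` before doing its work. Note that this sketch also memoizes a rejected promise; a real helper might clear `pending` on failure.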
…lare:workers agents.e2e.test.ts imported getSlackIntegrationDurableObjectName from the DO module, dragging cloudflare:workers into the Node vitest graph and failing the whole preview e2e suite at collection. The helper now lives in slack-naming.ts (pure), re-exported from the DO for worker-side callers. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…l pin

The two stream-browser specs that failed in CI (large-streams tail row never visible; heavy-append snap-back to distance 0) reproduce identically on main under 6x CPU throttling: they are pre-existing tail-anchoring races, not a regression from the processor class migration.

Root cause, both in use-initial-tail-scroll.ts:

1. The pin "settled" on a 250ms count-quiescence timer and programmatic scrollTop writes never set userLeftTail, so a late rAF-batched SQLite invalidation could snap a scrolled-away viewport back to the end.
2. A >250ms stall mid-replay released the pin while rows were still streaming; afterwards TanStack followOnAppend alone held the tail, but it only re-engages within scrollEndThreshold (80px) and can resolve its reconcile target against a pre-commit scrollHeight, undershooting by ~2px per newly windowed row (38px estimate vs 40px measured ≈ 88px per window), enough to silently break the follow chain and strand the viewport mid-stream.

The pin now holds until the user actually leaves the tail (additionally detected from any scroll with a scrollTop decrease AND a distance-from-end increase, which catches programmatic scrolls and scrollbar drags while staying immune to appends and TanStack's above-viewport resize adjustments), and the settle timer re-arms and re-scrollToEnd()s until the real DOM distance from the end is within 2px. settledInitialEndScroll only gates unread-badge suppression. No wire-format or subscription changes.

Verified: both specs pass under 6x/8x/10x CPU throttle (previously failed at 6x on this branch and on main) and 5x consecutively unthrottled; the full 26-spec stream-browser suite is green.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…n dials

Two preview-breaking bugs found via the failing e2e suite:

1. The core processor's stream/created side effect (announce this stream to every ancestor so childPaths/listing work) ran synchronously inside #appendBatchHere before the new core state was committed, so #resolveStream dialed "uninitialized:/..." durable objects instead of the real ancestors. Core side effects now run post-commit with the rest of the fan-out.
2. routedStreamBootstrapEvents makes the routed stream dial a CodemodeSession nothing has initialized; the codemode processor's session-started gate read this.name, threw NotInitializedError mid-ingest, and the host-swallowed failure made the live connection skip those events permanently (lost tool providers, dead bang-command scripts). requestStreamSubscription now initializes from the runtime name first, like slack-integration/slack-agent.

Regression tests in project-ingress (ancestor childPaths) and codemode-session (bootstrap-dialed session executes scripts); migration log I7/I8 document the diagnosis.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
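The first bug is an ordering problem, which a small sketch can show. `CommitThenEffects` and the `"<uninitialized>"` placeholder are hypothetical; the sketch only assumes that a side effect resolves its dial target from committed state, so running it before the commit sees stale state.

```typescript
type Effect = () => void;

// Hypothetical model of an append that has one side effect: dialing a target
// resolved from whatever state is committed at the time the effect runs.
class CommitThenEffects {
  committedPath: string | null = null;
  dialed: string[] = [];

  private dial(): void {
    // Resolution reads committed state, not the state being built.
    this.dialed.push(this.committedPath ?? "<uninitialized>");
  }

  appendCreated(path: string, order: "pre-commit" | "post-commit"): void {
    const effects: Effect[] = [() => this.dial()];
    if (order === "pre-commit") effects.forEach((run) => run()); // the bug
    this.committedPath = path; // commit the new core state
    if (order === "post-commit") effects.forEach((run) => run()); // the fix
  }
}
```

With `"pre-commit"` the effect dials the placeholder; with `"post-commit"` it dials the real path, which matches the fix of running core side effects with the rest of the post-commit fan-out.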
…in CI The large-streams spec fails only in CI (viewport stranded mid-list after replay; [data-index='1501'] never rendered) and cannot be reproduced locally even at 10x CPU throttle + 150ms CDP network latency against the deployed staging worker. The current hook design means a mid-list strand implies the tail pin released, so log the release transition (with scroll metrics) and the settle transition; Playwright traces capture console output. Also upload test-results as a workflow artifact on failure, and add env-gated CDP throttling (E2E_CPU_THROTTLE / E2E_NET_LATENCY_MS) to the spec for local CI-condition reproduction. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The large-streams streams-e2e spec failed on every CI run while passing locally under any throttling. The CI trace (captured via the new artifact upload + release breadcrumb) showed the pin's scroll-away heuristic firing ~1.2s into replay with no input event: TanStack's scroll-reconcile snapped scrollTop from the real DOM bottom (28723) back to its own end target (28711) — the virtualizer's end is short of the DOM bottom by the height of non-virtualized chrome inside the scroller, ~12px on CI Linux font metrics vs <=2px on macOS — and one coalesced scroll event combined that -12px write with a +3876px append burst. 'scrollTop down AND distance-from-end up' is exactly the heuristic's user-left-tail signature, so it released the pin and stranded the viewport at rows ~808-878 of 1502. Under scroll-event coalescing the virtualizer's own convergence writes are indistinguishable from a user scrolling away, so the delta heuristic is removed rather than tuned: the pin now releases only on real input signals (wheel/pointerdown/touchmove/keydown) or an explicit markUserLeftTail(). The e2e scroll helpers dispatch a synthetic wheel event before programmatic scrollTop writes — the same signal a real user produces — preserving the heavy-append spec's no-snap-back guarantee through the input path. Verified: 26-spec suite green locally; both tail-pin specs 5x green under 6x CPU throttle + 100ms CDP latency; full suite 3x green from the Linux Playwright container against a deployed scratch worker (modulo the two sqlite3-CLI specs, which the image cannot run). Extends I6 in the migration log. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
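The false positive can be reproduced with the numbers from the CI trace. The sketch below is a reconstruction under assumptions: the function names are hypothetical, the viewport height of 800px is invented for the arithmetic, and `looksLikeUserLeftTail` is only my reading of the removed "scrollTop down AND distance-from-end up" heuristic. `shouldReleasePin` mirrors the replacement rule of releasing only on real input events.

```typescript
type ScrollSample = { scrollTop: number; scrollHeight: number; clientHeight: number };

function distanceFromEnd(sample: ScrollSample): number {
  return sample.scrollHeight - sample.clientHeight - sample.scrollTop;
}

// The removed delta heuristic: scrollTop decreased AND distance from end grew.
function looksLikeUserLeftTail(prev: ScrollSample, next: ScrollSample): boolean {
  return next.scrollTop < prev.scrollTop && distanceFromEnd(next) > distanceFromEnd(prev);
}

// The replacement: only real input signals release the pin.
const RELEASE_INPUTS: ReadonlySet<string> = new Set([
  "wheel",
  "pointerdown",
  "touchmove",
  "keydown",
]);

function shouldReleasePin(eventType: string): boolean {
  return RELEASE_INPUTS.has(eventType);
}

// Before: pinned at the real DOM bottom (clientHeight of 800 is an assumption).
const before: ScrollSample = { scrollTop: 28723, scrollHeight: 29523, clientHeight: 800 };
// After one coalesced scroll event: a -12px reconcile write plus a +3876px append.
const after: ScrollSample = { scrollTop: 28711, scrollHeight: 29523 + 3876, clientHeight: 800 };
```

`looksLikeUserLeftTail(before, after)` is true even though no user input occurred, because the distance from the end jumps from 0 to 3888px (12 + 3876). Under the input-based rule a plain `scroll` event never releases the pin.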
The migrated processor preserved a legacy weakness Bugbot flagged: any tracked request status — including "started" — short-circuited redelivered agent/llm-request-requested events, so a crash between llm-request-started and completion left the agent stuck in a requested phase forever. Now only "completed" skips, matching the OpenAI WebSocket processor; the started append is idempotency-keyed so retries cannot duplicate it. Regression tests cover fresh execution, started-retry, and completed-skip. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
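The skip rule and the idempotency-keyed started append can be sketched together. `RequestTracker`, the key format, and the `llm-request-completed` event name are hypothetical; only `llm-request-started` and the "skip only when completed" rule come from the commit message.

```typescript
type RequestStatus = "started" | "completed";

// Hypothetical tracker for redelivered request events.
class RequestTracker {
  readonly statuses = new Map<string, RequestStatus>();
  readonly appended: string[] = [];
  private readonly usedKeys = new Set<string>();

  // Appends at most once per idempotency key.
  private appendOnce(eventType: string, idempotencyKey: string): void {
    if (this.usedKeys.has(idempotencyKey)) return;
    this.usedKeys.add(idempotencyKey);
    this.appended.push(eventType);
  }

  // Handles one (re)delivery of a request event.
  handleRequested(requestId: string, execute: () => void): "skipped" | "executed" {
    // Only a completed request short-circuits; "started" must be retried.
    if (this.statuses.get(requestId) === "completed") return "skipped";
    this.appendOnce("llm-request-started", `${requestId}:started`);
    this.statuses.set(requestId, "started");
    execute(); // may throw, simulating a crash before completion
    this.appendOnce("llm-request-completed", `${requestId}:completed`);
    this.statuses.set(requestId, "completed");
    return "executed";
  }
}
```

A crash inside `execute` leaves the status at `"started"`. The next delivery re-executes without duplicating the started event, and any delivery after completion is skipped.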
main's itx work (#1407) landed importing the StreamProcessorRunner type this branch deletes; the import was unused. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
7c60e9d to
4997b54
… event Two Bugbot findings on the rebased head: requestStreamSubscription on AgentDurableObject delegated straight to the processor host, so a cold instance could accept a subscription handshake before its wake hook seeded the agent's own subscriptions and setup events. It now ensures initialization from the runtime name first, matching the other host DOs. ensureAgentRunnerForOwnStream triggered only on stream/created, but on routed streams that event predates the side-effect anchor (the subscription's own offset), so the anchor-gated trigger never fired and the agent wake silently depended on other paths. agent-host now ensures its agent once per DO incarnation on the first delivered event of any type — initialize() is idempotent, and any live event implies activity worth waking the agent for. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…reconciliation Blocking the batch queue under blockProcessorWhile for the whole LLM round-trip meant a cancellation, superseding input, or config change sitting in the next batch could not even be reduced until the request it should affect finished. Both providers already guard agent-visible appends with a still-current check against re-read stream history, so execution now runs via runInBackground (keep-alive-backed through the host's ctx.waitUntil). Crash recovery moves from checkpoint-held redelivery to reconciliation: each provider tracks executed llmRequestIds in an instance-level Set (the DO instance is the execution scope), and a processEventBatch override re-executes any post-batch 'started' entry whose id the instance never owned, recovering the original agent/llm-request-requested event from stream history (offset === llmRequestId). Stale recoveries finish as no-ops via the existing staleness semantics; the skip-only-completed guard from c00cf72 still covers checkpoint-not-advanced replays. openai-ws serializes executions on a promise chain so concurrent requests cannot interleave reads on the shared Responses WebSocket iterator — only executions queue, never the batch queue. New tests pin the unblocked queue (a second batch reduces + checkpoints while a request is in flight) and dangling-started re-execution on both providers; existing tests gain waitFor for the new background timing. D12 in the migration log documents the decision. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
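The reconciliation step can be sketched as two pure functions. The names and shapes here are hypothetical; the sketch assumes only what the message states: an instance-level Set of executed llmRequestIds, re-execution of any `started` entry the instance never owned, and recovery of the original `agent/llm-request-requested` event via `offset === llmRequestId`.

```typescript
type TrackedRequest = { llmRequestId: number; status: "started" | "completed" };
type StreamEvent = { offset: number; type: string };

// Requests that are "started" in state but were never executed by this
// instance are treated as dangling and need re-execution.
function findDanglingStarted(
  tracked: TrackedRequest[],
  executedByThisInstance: ReadonlySet<number>,
): number[] {
  return tracked
    .filter((r) => r.status === "started" && !executedByThisInstance.has(r.llmRequestId))
    .map((r) => r.llmRequestId);
}

// The request id doubles as the offset of the original requested event.
function recoverRequestedEvent(
  history: StreamEvent[],
  llmRequestId: number,
): StreamEvent | undefined {
  return history.find(
    (event) => event.offset === llmRequestId && event.type === "agent/llm-request-requested",
  );
}
```

After a batch, an override along the lines of `processEventBatch` could call `findDanglingStarted`, recover each original event, and hand it to the background execution path. A fresh instance after a crash has an empty Set, so every in-flight `started` entry is picked up.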
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit 4d30dbd.
mmkal added a commit that referenced this pull request on Jun 10, 2026
The merge followed the stream-tui/router file moves into packages/iterate and applied main's content changes, which broke module resolution there (not caught because the root typecheck script excludes the iterate package):

- event-stream-terminal.tsx now imports @iterate-com/streams (moved out of @iterate-com/shared by #1402): declare the workspace dependency.
- router.ts gained main's artifacts.seed-config-base procedure; the seed script stayed in apps/os/scripts, so import it from there. This router is only loaded from a repo checkout, so the cross-package import is fine.
- artifacts.ts re-exported a sibling file through the ~/ alias, which broke when pulled into the iterate package's type program: use a relative path.
- Annotate TuiEventsStreamView's return type so declaration emit doesn't trip over the non-portable .pnpm @types/react reference (TS2883).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
jonastemplestein added a commit that referenced this pull request on Jun 10, 2026
…capnweb pointers, fix task states (#1432) Documentation sweep over `apps/os`. Every statement written into a doc was verified against the code on this branch. ## Changes **`apps/os/README.md` (= `AGENTS.md`)** - Important Files: `src/app.ts` / `src/entry.workerd.ts` do not exist — replaced with `src/worker.ts` (Worker entrypoint) and `src/config.ts` (`AppConfig` schema). All other listed files verified to exist. - Real-worker tests: the documented vitest configs (`src/capnweb/e2e/vitest.config.ts`, `src/domains/capability-prototype/e2e.vitest.config.ts`) are gone — replaced with the real lanes `pnpm e2e` (`e2e/vitest.config.ts`) and `pnpm e2e:itx` (`src/itx/e2e/vitest.config.ts`), verified against `apps/os/package.json`. - `pnpm cf:deploy # production deploy` was wrong and dangerous: `cf:deploy` deploys to whatever Doppler/Alchemy stage is ambient. Now documents both `cf:deploy` (ambient stage) and `pnpm deploy` (the `doppler --config prd` wrapper). - Removed the nonexistent `/org/:organizationSlug` route; remaining routes verified against `src/routes/`; added `/new-project`. **`apps/os/CONTEXT.md`** — fixed the example-dialogue claim that organization UI lives under `/org/:organizationSlug` (no such route; orgs live in the auth worker). **`apps/os/docs/architecture-and-operations.md`** — rewritten. The old doc described the pre-migration world: Clerk auth (whole `## Clerk` section, `sync-clerk-apps.ts`, `APP_CONFIG_CLERK__*`), `/orgs/:organizationSlug` route maps, inbound MCP via `ProjectMcpServerEntrypoint` (now a hardcoded 410 tombstone), wrong redirect claims, and an unprefixed `/durable-objects/stream` debug route. 
The new doc describes current reality: `src/worker.ts` dispatch pipeline, Iterate Auth middleware, real route map and root-redirect behavior (`/` → `/projects/$projectSlug` or `/projects`; project root renders `ProjectHomePage`), canonical MCP endpoint from `APP_CONFIG_MCP__BASE_URL` with Iterate Auth protected-resource metadata, `/__durable-objects/<kind>/<name>/<path>` debug proxy (kinds verified), itx endpoints, `scripts/sync-auth-clients.ts`, current codemode default/example providers, and current smoke-test env vars (verified in the e2e test files). **`apps/os/docs/headless-local-debugging.md`** — `/projects/new` → the real route `/new-project`. **`apps/os/docs/iterate-context.md`, `iterate-context-learnings.md`** — both pointed at the deleted `src/capnweb/` tree as "the current design"; now short tombstones pointing at the successor (`src/itx/` README + DECISIONS, `docs/itx-spec.md`). **`apps/os/docs/capability-system-research-and-design-notes.md`, `rpc-target-constructor-shape-research.md`** — added status headers marking them historical research notes superseded by itx; bodies untouched. **`apps/os/src/itx/README.md` + `src/itx/handle.ts`** — the "Typed caps" `ProjectCaps` declaration-merging pattern does not exist in code (no `ProjectCaps` interface anywhere). Rewrote the README section to the thing that actually works: casting `itx.cap("name")` through the exported `Stubify<T>` type. Also fixed the same false claim in the `Stubify` doc comment in `handle.ts` (comment-only change). **`apps/os/docs/itx-spec.md`** — status header said "IMPLEMENTED on the `itx-implementation` branch"; PR #1407 is merged to main (verified in git history). Marked the one known divergence honestly: the §6.3 client reconnect loop was never built — `connectItx` (`src/itx/client.ts`) is one-shot, and there is no `itx.cap.disconnected` event. Corrected §6.3 and the related §4 caveat. 
**`apps/os/tasks/`** - Deleted `simplify-context-cloudflare-native.md` (state: todo, but shipped — `src/worker.ts` imports `env` from `cloudflare:workers` directly, `RequestContext` is the narrow request-scoped shape the task specified, auth lives in Start request middleware, the manifest/`src/app.ts` is gone). - Deleted `project-egress-secrets-mvp.md` (state: todo, but shipped — `ProjectEgress` entrypoint, `ProjectDurableObject.egressFetch` with `substituteProjectEgressSecretHeaders`, D1-backed `SecretsCapability.getSecret`, and the `/api/itx/egress-echo` echo proof covered by `src/itx/e2e/itx-egress.e2e.test.ts`). - Grooming rules (`docs/tasks-grooming.md`) say "Delete when done", so deletion rather than state edits. - Added brief status notes (no rewrite) to `codemode-session-vertical-slice.md` (checked-off "tiny worker" box diverged: `CodemodeSession` lives in the main OS worker) and `codemode-session-night-plan.md` (plan superseded by itx). ## Skipped - Nothing skipped; all nine items verified and addressed. ## Flags for reviewers - `src/itx/handle.ts` got a comment-only edit (the `Stubify` doc comment made the same false declaration-merging claim as the README). No runtime change; typecheck/lint/tests pass. - The two deleted task files: please sanity-check the "shipped" verdicts above if you have more context on intended remaining scope. - Carve-outs respected: no changes to the streams type systems or to how the os-streams worker is deployed. ## Checks - `pnpm install`, `pnpm format` (oxfmt), `pnpm typecheck`, `pnpm lint`, `pnpm test` — all pass. ## Task-file audit A follow-up commit deletes 22 task files whose work was verified as shipped, obsolete, or purely historical. (Two more from the audit — `apps/os/tasks/project-egress-secrets-mvp.md` and `apps/os/tasks/simplify-context-cloudflare-native.md` — were already deleted by earlier commits on this branch, see above.) 
### Deleted: completed - `tasks/cf-prd-orphaned-resources-cleanup.md` — live Cloudflare API check of the prd account (2026-06-10) shows 14 worker scripts (was 1026 at the task's 2026-05-18 sweep) and 6 D1 databases; cleanup is done. - `tasks/complete/2026-05-22-os-captun-worker-test-tunnel.md` — shipped via merged PR #1361 ("codemode++ e2e++"); all described artifacts exist on main and survived the golden-path rebuild (#1411). - `tasks/dead-code-and-docs-cleanup-audit.md` — high-confidence items all shipped; `pnpm-workspace.yaml` no longer lists the dead packages and now uses `apps/*`/`packages/*` globs. - `tasks/os-auth-spurious-logout-refresh.md` — commit ad6da76 "Fix 5-min logout, deploy-time JWKS, and stream append skeleton flash (#1410)" (merged 2026-06-10) shipped exactly this work. - `tasks/os-codemode-router.md` — task file was added in the very PR that implemented it (commit 98ee148, #1294). - `tasks/os-domain-capability-orpc-refactor-design.md` — every major pillar of the design (domains layout, capabilities, oRPC structure) exists on main. - `tasks/os-domain-capability-orpc-refactor-prd.md` — shipped in PR #1305 "Make codemode function calls event-driven" (squash commit 284193e, merged 2026-05-08). - `tasks/semaphore-lease-renewal.md` — the described lease-renewal feature exists on main as `resources.renew` (named "renew" rather than the proposed "extend") in `apps/semaphore`. - `tasks/signup-slug-uniqueness.md` — shipped with the auth worker (PR #1273); `packages/shared/src/slug.ts` implements `resolveUniqueSlug`/`slugifyWithSuffix`. - `apps/os/tasks/codemode-session-night-plan.md` — planned outcomes verifiably shipped on main, in evolved form (codemode session browser UI and follow-ons). - `apps/os/tasks/codemode-session-vertical-slice.md` — all 11 ticked checklist items shipped via PRs #1294/#1305 and follow-ups. 
- `apps/os/tasks/refactor-lifecycle-init-params-as-structured-name.md` — every acceptance criterion implemented in the `with-lifecycle-hooks.ts` mixin on main. - `apps/os/tasks/repos-vertical-slice.md` — frontmatter already says `state: done` and the described slice verifiably exists on main. - `apps/os/tasks/slack-processor-unwind.md` — all target-shape items exist on main (`/integrations/slack` stream path; no `/integrations/slack/webhooks` references). ### Deleted: obsolete / nonsense - `tasks/github-oauth-use-repo-id.md` — all referenced code is gone: `linkExternalIdToGroups` / `repoId` / `repository.id` return zero hits repo-wide. - `tasks/ignoreme-email-security.md` — every code path the task targets was deleted with the legacy OS1 stack (commit 545854d, #1341). - `tasks/os-stream-runtime-big-refactors.md` — os2-era brainstorm list largely superseded or done differently; item 2 shipped via PR #1394. - `tasks/realtime-pusher-efficiency.md` — targets the legacy OS1 realtime pusher, which no longer exists. - `tasks/stream-processor-ergonomics.md` — targets the legacy hook-style processor API, replaced by the class-based StreamProcessor model. ### Deleted: historical logs - `apps/os/tasks/slack-google-auth-poc-implementation.md` — explicitly an "Implementation Log" (`state: done`), not actionable work; shipped in merged PR #1317. - `apps/os/tasks/stream-processor-class-design-notes.md` — design notes written alongside the class-based StreamProcessor migration, not a task. - `apps/os/tasks/workspace-codemode-implementation-log.md` — `state: done`, all 9 checkpoints ticked; the described work verifiably shipped on main. ### Kept but flagged for maintainer judgment - `tasks/cf-prd-orphaned-resources-cleanup.md`: Explicit not-in-scope follow-ups (preview account 376ef7ed cleanup, Doppler os-legacy-backup pruning) were never broken out into their own tasks; spin them out only if still wanted. 
- `tasks/codemode-capability-policy.md`: Still-unshipped, still-wanted design work, but duplicates `apps/os/tasks/codemode-capability-access-policy.md` and overlaps the active itx capability-system design notes — maintainer should consolidate into a single task. - `tasks/complete/2026-05-22-os-captun-worker-test-tunnel.md`: apps/os still depends on the unpublished pkg.pr.new/captun@14 build (the task's stated stopgap); a published captun/worker release would be a separate follow-up, not a reason to keep this file. - `tasks/dead-code-and-docs-cleanup-audit.md`: Residual from this audit: packages/iterate is still excluded from root build/typecheck/test (`--filter '!iterate'`); if that CI gap matters, open a fresh small task rather than keeping this stale inventory. - `tasks/doppler-shared-and-os-secrets-audit.md`: Audit still unrun and wanted, but needs a rewrite first: replace Clerk-key expectations with iterateAuth, point AppConfig refs at `apps/os/src/config.ts` (`app.ts` and `packages/shared/src/apps/config.ts` were deleted in PR #1411), and refresh the 2026-05-18 baseline. - `tasks/ignoreme-email-security.md`: If outbound email via Resend is ever reintroduced in the rebuilt apps/os, recipient allowlisting should be designed fresh against the itx/egress-secret-substitution layer, not this OS1-era plan. - `tasks/iterate-cli-distribution.md`: Live but ~90% of the file is OpenCode architecture research notes, not actionable steps; npm distribution already exists, so the remaining work (bun binary, brew, install script) should be restated as concrete tasks or the research trimmed. - `tasks/os-auth-spurious-logout-refresh.md`: PR #1410 left one open thread: a manual end-to-end "wait 5 minutes in prod" verification was never done, and the claims-staleness force-refresh was consciously skipped (≤30m propagation accepted) — file a new narrow task only if either still matters. 
- `tasks/os-deploy-time-jwks-fetch.md`: Code shipped in PR #1410; only remaining action is deleting `ITERATE_AUTH_JWKS` from Doppler os prd/preview (still present and shadowing the deploy-time fetch) — after that, delete this task. - `tasks/os-domain-capability-orpc-refactor-prd.md`: Sibling task `os-domain-capability-orpc-refactor-design.md` (its dependsOn target) is likely also completed and should be audited/deleted together. - `tasks/os-project-do-projection-reconciliation.md`: Scope item "rename IterateMcpServer to ProjectMcpServerConnection" is already done and could be ticked off; the rest is unshipped and still relevant. - `tasks/os-project-hostname-base-singular.md`: Scope file paths are stale post-PR #1411 (`app.ts`→`src/config.ts`, `sync-clerk-apps.ts`→`sync-auth-clients.ts`, `entry.workerd.ts` deleted, routing files moved to `src/ingress/`); task itself is still valid. - `tasks/os-project-route-authorization.md`: Still-wanted design work (referenced by live project-ingress-architecture task), but needs rewrite: Clerk OAuth and `ProjectMcpServerEntrypoint` references are dead — MCP moved off project ingress (410 stub) and auth is now apps/auth Principal-based. - `tasks/os-stream-runtime-big-refactors.md`: Only surviving idea: cosmetic no-compat rename of `events.iterate.com/...` event-type names (events app is deleted); re-file as a small standalone task if still wanted. - `apps/os/tasks/codemode-capability-access-policy.md`: Live work, but near-duplicates root-level `tasks/codemode-capability-policy.md` (same PR #1294); keep this copy and consolidate/delete the root one. - `apps/os/tasks/codemode-session-night-plan.md`: Open capability-scope questions from this plan live on in `codemode-capability-access-policy.md`; checkboxes are unticked but the work shipped via PRs #1294/#1305/#1402. 
- `apps/os/tasks/codemode-session-vertical-slice.md`: Last unchecked box (generalize self-callable bindings) shipped as the loopback-binding pattern used repo-wide; follow-on work lives in `codemode-session-night-plan.md`.
- `apps/os/tasks/project-egress-and-secrets-architecture.md`: Design doc whose first vertical slice shipped (egress + secret substitution MVP); remaining secret-DO/policy/approval/OAuth design is still live but needs grooming: drop completed PoC sections, update Clerk-scope terminology, and reconcile with itx DECISIONS.md as the newer design-of-record for egress wiring.
- `apps/os/tasks/project-egress-intercept-tunnel-latency.md`: Still-relevant latency work, but file refs are stale (`entry.workerd.ts` → `src/worker.ts`; vendored `apps/os/src/lib/captun` removed for the published captun package in #1361) and the benchmark numbers predate the #1411 worker rebuild; re-benchmark before picking an option.
- `apps/os/tasks/project-ingress-architecture.md`: Live, actively-maintained ingress reference (edited today in #1416), but needs a refresh: Clerk auth sections, `Project.checkAccess`, and the streams-upstream proxy model are superseded (auth worker, principal claims, bundled project worker), and the 2026-05-05 status checklist is partly outdated.
- `apps/os/tasks/stream-processor-class-migration-log.md`: Migration log (merged today via #1402, which links to it as the canonical rationale), not an actionable task; contains unique I6-I8 forensics not in the PR body, consider moving to docs/ alongside `tasks/migration-notes/` rather than deleting.
- `apps/os/tasks/stream-subscriber-delivery-refactor.md`: Core design shipped differently via the class-model cutover (#1401/#1402/#1394); only live remainder is migrating `codemode.streamEvents`, `StreamsCapability.stream()`, and project-mcp-server-connection off the OS-internal NDJSON shim in `new-stream-runtime.ts`; consider replacing this large draft with a small task for that.
- `apps/os/tasks/workspace-codemode-implementation-log.md`: Done implementation log; only marginally unique note is the rationale that plain method objects (not class instances) cross DO RPC, which is now embodied in the shipped workspace DO code.
- `apps/os/tasks/migration-notes/`: Historical migration logs (not tasks) committed with and cited by merged PR #1402 one day ago; contain unique per-domain decisions plus the legacy-subscriber gap behind the 2026-06-10 prd Slack outage; maintainer should relocate to docs/ or delete deliberately.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> **Low Risk**
> Documentation and task-file deletions only; no application runtime or API behavior changes in the diff.
>
> **Overview**
> **Aligns OS documentation with the current worker, auth, routing, and itx reality**, and **removes a large set of completed or obsolete task files** from `apps/os/tasks/` and `tasks/`.
>
> The **README / AGENTS** and **`architecture-and-operations.md`** rewrites drop Clerk-era and deleted-entrypoint references (`src/app.ts`, `src/entry.workerd.ts`, `/org/:organizationSlug`) in favor of **`src/worker.ts`**, **Iterate Auth**, **project-scoped routes** (`/projects/...`, `/new-project`), **canonical MCP** (`APP_CONFIG_MCP__BASE_URL`, auth-worker OAuth), **itx** endpoints, and **`sync-auth-clients.ts`**. Deploy docs now distinguish ambient **`pnpm cf:deploy`** from production **`pnpm deploy`**. E2E docs point at **`pnpm e2e`** and **`pnpm e2e:itx`** instead of removed capnweb vitest configs.
>
> **Cap'n Web tombstones** in `iterate-context*.md` redirect readers to **itx** (`src/itx/`, `itx-spec.md`). Research notes get **historical** headers; **itx-spec** notes merged status on main and documents that **`connectItx` is one-shot** (no §6.3 reconnect loop). **itx README / `Stubify`** docs are corrected: typed caps use **`itx.cap("name") as Stubify<...>`**, not declaration merging.
>
> **CONTEXT.md** fixes the example that claimed org UI lived under `/org/...`. **headless-local-debugging** uses **`/new-project`**.
>
> **Task grooming** deletes many markdown tasks whose work is done, superseded (itx, auth worker), or OS1-dead, including codemode vertical-slice plans, domain oRPC refactor design, egress MVP, Slack processor unwind, and similar inventory items.
>
> <sup>Reviewed by [Cursor Bugbot](https://cursor.com/bugbot) for commit a4f093f. Bugbot is set up for automated code reviews on this repo. Configure [here](https://www.cursor.com/dashboard/bugbot).</sup>
<!-- /CURSOR_SUMMARY -->
<!-- CLOUDFLARE_PREVIEW -->
## Environment Config Lease
<!-- CLOUDFLARE_PREVIEW_STATE -->
<!-- { "apps": { "os": { "appDisplayName": "OS", "appSlug": "os", "status": "deployed", "updatedAt": "2026-06-10T12:23:34.040Z", "headSha": "a4f093f29684fc65b851dbf53847ccd85ddf8ffc", "message": null, "publicUrl": "https://os.iterate-preview-5.com", "runUrl": "https://github.com/iterate/iterate/actions/runs/27275677688", "shortSha": "a4f093f" } }, "environmentConfigLease": { "dopplerConfig": "preview_5", "leasedUntil": 1781097591555, "leaseId": "36e57584-6cc7-4024-a027-103a3cb0b29b", "slug": "preview-5", "type": "environment-config-lease" } } -->
<!-- /CLOUDFLARE_PREVIEW_STATE -->

Lease: `preview-5`
Doppler config: `preview_5`
Type: `environment-config-lease`
Leased until: 2026-06-10T13:19:51.555Z

### OS

Status: deployed
Commit: `a4f093f`
Preview: https://os.iterate-preview-5.com
[Workflow run](https://github.com/iterate/iterate/actions/runs/27275677688)
Updated: 2026-06-10T12:23:34.040Z
<!-- /CLOUDFLARE_PREVIEW -->

---------

Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
jonastemplestein added a commit that referenced this pull request Jun 10, 2026
new-stream-runtime.ts was "new" relative to a runtime deleted in #1402, and toLegacyEvent produced the Event type that is simply THE OS event shape (the wire event plus streamPath). Names now say what things do:

- new-stream-runtime.ts -> stream-runtime.ts
- toLegacyEvent -> withStreamPath
- toNewEventInput -> toStreamEventInput
- toNewAfterOffset/toNewBeforeOffset -> toAfterOffset/toBeforeOffset
- `Event as StreamLegacyEvent` aliases -> `Event as StreamEvent` (the alias itself stays: `Event` collides with the workers/DOM global)
- NewStreamEvent/NewStreamEventInput aliases dropped inside the runtime

No behavior change; pure rename, all suites green.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
jonastemplestein added a commit that referenced this pull request Jun 10, 2026
…1481)

## What was broken

Slack agents in prd receive messages but never reply: no LLM request is ever made. Observed live on 2026-06-10 in `iterate` project stream `/agents/slack/c08r1smtzgd/ts-1781124999-011519`:

- the slack-agent processor rendered the webhook into a triggering `agent/input-added` at **offset 9** (20:56:46.2)
- the agent processor's `subscription-configured` event landed at **offset 15** (20:56:46.7); the AGENT DO wake hook appends it after D1 reads and workspace setup, so slack-agent reliably wins this race on a cold thread
- the host anchors side effects at the subscription-configured offset (`stream-processor-host.ts`), so the input at offset 9 was reduced as historical replay and its scheduling side effect was skipped: no `llm-request-scheduled`, no `llm-request-requested`, no `openai-ws` activity, no reply, and nothing ever retriggers it
- visible fingerprint in the stream: capability-noted renders exist only for offsets above the anchor (18–23), none for 8–9

The anchor mechanism is correct for re-attach (don't re-fire historical LLM requests), but it shipped in #1402 without anything making the *first* message of a new thread durable. Regression from #1402, same symptom as #1372 but a different mechanism. **Every first message of every new prod Slack thread is dropped.**

## The fix

Make the trigger a durable obligation in reduced state instead of a fire-and-forget side effect:

- **`AgentState.pendingTriggerOffset`**: set by a triggering `input-added`, cleared by `llm-request-scheduled` / `llm-request-requested` / `llm-request-queued`. If it survives in reduced state, the scheduling side effect never ran.
- **`subscriber-connected` reconciliation recovers it** (the presence fact always lands above the anchor, so this handler always runs live): schedule a request when idle, append the queued fact when a request is in flight (never interrupts in-flight work). Appends are keyed off the trigger event exactly like the live path (`agent/llm-request-scheduled@<offset>`), so raced duplicates dedup in the stream.
- **Gated on `pendingTriggerOffset <= sideEffectsAfterOffset`** so recovery fires only for anchor-skipped triggers and never races the live `input-added` handler. Crash/restart cases above the anchor remain owned by the existing scheduled-phase reconciliation.
- `StreamProcessor.processEvent` args now expose `sideEffectsAfterOffset` (the batch-level hook already had it); the core processor's inline path passes 0 (inline appends are always live).

The scheduled phase needs no queued fact on recovery: its handoff rebuilds the request body from full committed history, which already includes the skipped trigger.

## Verification

- Unit tests replay the prod stream shape: trigger below anchor + subscriber-connected above → exactly one `llm-request-scheduled@9`; non-triggering inputs don't recover; in-flight requests get a queued fact; live triggers aren't double-scheduled.
- New token-gated e2e (`schedules and completes an LLM request for a plain routed Slack message`) drives a real Slack root message + routed webhook through webhook → input → scheduled → requested → completed(success) against a live deployment.
- `pnpm typecheck && pnpm lint && pnpm format && pnpm test` all green.
- E2E run against the preview deployment with the real Slack bot token: results to follow in a comment.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> **High Risk**
> Changes core agent LLM scheduling and subscriber-connected reconciliation on a production outage path; incorrect gating could double-schedule or miss triggers on every new Slack thread.
>
> **Overview**
> Fixes **first-message silence** on new Slack thread streams when a triggering `input-added` lands **before** the agent subscription is configured: the host’s side-effect anchor replays that input into state but skips scheduling, so no LLM turn ever starts.
>
> **Agent processor** now records **`pendingTriggerOffset`** in reduced state for triggering inputs and clears it when a durable schedule/request/queue fact exists. On **`subscriber-connected`**, when that offset is at or below the anchor, it **recovers** the missed obligation: `llm-request-scheduled` when idle (same idempotency key as the live path) or **`llm-request-queued`** when a request is already in flight, without double-scheduling live triggers above the anchor. **`#appendLlmRequestScheduled`** arms the debounce timer with the **committed** `requestId` after idempotent dedup so raced recovery paths don’t wedge the handoff.
>
> **Streams**: `processEvent` receives **`sideEffectsAfterOffset`** so reconcilers can detect anchor-skipped side effects; the core inline path passes **`0`** (always live).
>
> **Verification**: new unit coverage for anchor-skip recovery, deduped schedule, queue-when-busy, and no recovery for non-triggering inputs; token-gated e2e asserts routed Slack webhook → scheduled → requested → completed(success).
>
> <sup>Reviewed by [Cursor Bugbot](https://cursor.com/bugbot) for commit eb3a7ac. Bugbot is set up for automated code reviews on this repo. Configure [here](https://www.cursor.com/dashboard/bugbot).</sup>
<!-- /CURSOR_SUMMARY -->
<!-- CLOUDFLARE_PREVIEW -->
## Environment Config Lease
<!-- CLOUDFLARE_PREVIEW_STATE -->
<!-- { "apps": { "os": { "appDisplayName": "OS", "appSlug": "os", "status": "deployed", "updatedAt": "2026-06-10T21:58:59.268Z", "headSha": "eb3a7ac0bb17f468c1d5490f0b6951bfe612374e", "message": null, "publicUrl": "https://os.iterate-preview-4.com", "runUrl": "https://github.com/iterate/iterate/actions/runs/27308869802", "shortSha": "eb3a7ac" } }, "environmentConfigLease": { "dopplerConfig": "preview_4", "leasedUntil": 1781132168766, "leaseId": "29fbdda0-4a62-44f1-8b9f-ebe4adac552c", "slug": "preview-4", "type": "environment-config-lease" } } -->
<!-- /CLOUDFLARE_PREVIEW_STATE -->

Lease: `preview-4`
Doppler config: `preview_4`
Type: `environment-config-lease`
Leased until: 2026-06-10T22:56:08.766Z

### OS

Status: deployed
Commit: `eb3a7ac`
Preview: https://os.iterate-preview-4.com
[Workflow run](https://github.com/iterate/iterate/actions/runs/27308869802)
Updated: 2026-06-10T21:58:59.268Z
<!-- /CLOUDFLARE_PREVIEW -->

---------

Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
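
The recovery rule described in this commit (a trigger recorded in reduced state, recovered on `subscriber-connected` only when it sits at or below the anchor) can be sketched roughly as follows. This is a minimal illustration, not the code in the repository: the event union, `AgentStateSketch`, `reduceAgentState`, `recoverSkippedTrigger`, the `requestInFlight` flag, and the key format used for the queued fact are all assumptions made for the example. Only the `pendingTriggerOffset <= sideEffectsAfterOffset` gate and the `agent/llm-request-scheduled@<offset>` key come from the commit message.

```typescript
// Hypothetical, simplified shapes; the real apps/os types are richer.
type AgentEvent =
  | { offset: number; type: "agent/input-added"; triggers: boolean }
  | { offset: number; type: "agent/llm-request-scheduled" }
  | { offset: number; type: "agent/llm-request-requested" }
  | { offset: number; type: "agent/llm-request-queued" }
  | { offset: number; type: "stream/subscriber-connected" };

interface AgentStateSketch {
  pendingTriggerOffset: number | null;
  requestInFlight: boolean; // assumed flag; never reset in this sketch
}

// The reducer runs for every event, including events at or below the anchor.
function reduceAgentState(state: AgentStateSketch, event: AgentEvent): AgentStateSketch {
  switch (event.type) {
    case "agent/input-added":
      return event.triggers ? { ...state, pendingTriggerOffset: event.offset } : state;
    case "agent/llm-request-scheduled":
    case "agent/llm-request-queued":
      return { ...state, pendingTriggerOffset: null };
    case "agent/llm-request-requested":
      return { pendingTriggerOffset: null, requestInFlight: true };
    default:
      return state;
  }
}

interface AppendSketch {
  type: string;
  idempotencyKey: string;
}

// Intended to be called from the subscriber-connected handler, which always runs live.
function recoverSkippedTrigger(
  state: AgentStateSketch,
  sideEffectsAfterOffset: number,
): AppendSketch | null {
  const pending = state.pendingTriggerOffset;
  // Triggers above the anchor belong to the live input-added handler.
  if (pending === null || pending > sideEffectsAfterOffset) return null;
  const type = state.requestInFlight
    ? "agent/llm-request-queued"
    : "agent/llm-request-scheduled";
  // Keyed off the trigger offset so raced duplicates dedup in the stream.
  return { type, idempotencyKey: `${type}@${pending}` };
}

// Replay of the reported shape: trigger at offset 9, anchor at offset 15.
const initial: AgentStateSketch = { pendingTriggerOffset: null, requestInFlight: false };
const history: AgentEvent[] = [{ offset: 9, type: "agent/input-added", triggers: true }];
const replayed = history.reduce(reduceAgentState, initial);
const recovered = recoverSkippedTrigger(replayed, 15);
```

Because the obligation lives in reduced state rather than in a side effect, it survives the anchor skipping `processEvent` for offset 9, and the gate keeps the recovery path from competing with triggers that arrive live.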

## Summary

Complete rewrite of the original spike: every stream processor apps/os hosts now runs on the class-based `StreamProcessor` model (#1401), owned directly by its domain Durable Object, and the Stream DO reaches subscribers through the `packages/shared` Callable abstraction. All legacy processor-model code is deleted (−17k lines net). Rebased onto main including the itx capabilities work (#1407).

Full design rationale and issue log: `apps/os/tasks/stream-processor-class-migration-log.md` (D1–D12, I1–I6 + deletion inventory), plus per-domain notes under `apps/os/tasks/migration-notes/`.

## Hosting model

A Durable Object hosts processors as plain class fields. `createStreamProcessorHost` provides checkpoint storage in DO KV (keyed by processor name), a late-bound stream context, per-subscription side-effect anchoring, and `processor-registered` announcements. Any number of named processors per DO.

## Subscriber delivery via Callable

`stream/subscription-configured` payloads carry `{ type: "callable", callable: Callable }`; the Stream DO dispatches the callable with the subscription handshake (`dispatchCallable`: same Workers RPC transport, so the live stream stub passes through). The hardcoded `STREAM_PROCESSOR_RUNNER` dialing is gone; `packages/streams` now depends on `packages/shared`. Legacy `built-in` subscriber shapes reduce harmlessly but are no longer dialed; OS re-appends callable subscriptions through the existing ensure-on-access paths with new idempotency keys. No subscriber authorization yet (explicitly out of scope).

## Contract-driven event filtering

`subscribe` / `subscribeOutbound` accept `eventTypes`; the pump filters post-read while its cursor advances past non-matching events. Hosts always pass `contract.consumes`, so the contract is the filter (`"*"` = unfiltered). Catch-up helpers wait on consumed-event targets instead of the raw stream head.

## Side-effect anchor

`StreamProcessor` gains `sideEffectsAfterOffset`: events at or below the anchor (persisted at first subscription handshake) reduce into state but skip side effects. Attaching to an existing stream rebuilds state without re-firing historical effects (e.g. old LLM requests). Replaces the legacy dual-cursor + first-attach lookback machinery.

## Migrated processors (all wire-format-identical)

- `AgentDurableObject`
- `ProjectDurableObject`
- `RepoDurableObject`
- `SlackIntegrationDurableObject` / `SlackAgentDurableObject`
- `CodemodeSession`

openai-ws keeps its socket as processor instance state (the DO is the connection scope). `scheduling`, `jsonata-transformer`, `dynamic-worker` had no live subscription path and were deleted, not migrated.

## LLM requests are background work (D12)

The LLM providers do NOT hold the processor's batch queue while a request is in flight. Executing under `blockProcessorWhile` would mean a cancellation or superseding event physically cannot be reduced until the request it should affect has finished, defeating the staleness check (`isAgentLlmRequestStillCurrent`) both providers run before appending agent-visible output. Instead:

- `agent/llm-request-requested` executes via `runInBackground` (keep-alive-backed through the host's `ctx.waitUntil`), so subsequent events keep flowing while requests run; stale requests complete silently.
- … `started` entry in reduced state with no in-flight execution marks a request a previous incarnation abandoned, and the next delivered batch re-executes it from stream history, still guarded by the staleness check.

## Deleted

The legacy OS `StreamProcessorRunner` DO (+ binding; alchemy computes `deleted_classes` automatically), `packages/shared/src/stream-processors/**` (~47 files), the shared DO mixins, and the legacy runner model in `packages/streams` (`processor.ts`, `processor-runner.ts`, `standard-processor-behavior.ts`, ~14 legacy-only helpers). ~35 importers repointed.

## Issues found and fixed during validation & review

- … `blockConcurrencyWhile`; with processors co-hosted on the same DO this deadlocks against the input gate (the handshake/delivery it waits for queues behind it). Rule recorded in the log: never await processor catch-up inside a lifecycle gate on a co-hosting DO.
- … `started` (skip only `completed`), with regression tests; cold `AgentDurableObject` instances initialize their lifecycle before accepting subscription handshakes; `agent-host` wakes its agent once per incarnation on any delivered event instead of an anchor-skipped historical `stream/created`.

## Validation

- `apps/os`: typecheck 0 errors, 167 unit tests; workerd suites `test:project-ingress` 6/6 and `test:codemode-session` 18/18 (cover the callable handshake, routed streams, and itx capabilities end-to-end)
- `packages/streams` (58), `packages/shared` (64), `packages/ui`, example-app: typecheck + tests green
- `pnpm lint` / `pnpm format` clean
- … (`os.iterate-preview-4.com`): full OS e2e suite against the deployment: 27 passed / 1 todo / 0 failures, including real OpenAI conversations on freshly created projects, Cloudflare AI Gateway, codemode script execution, and routed Slack webhook → bang-command replies against the real Slack API

🤖 Generated with Claude Code
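
To make the interaction between contract-driven filtering and the side-effect anchor concrete, here is a rough sketch under stated assumptions. `SketchEvent`, `SketchProcessor`, and `pump` are hypothetical names invented for this example and do not mirror the `packages/streams` API; the only behaviors taken from the summary are that the cursor advances past non-matching events, that `"*"` means unfiltered, and that events at or below `sideEffectsAfterOffset` reduce into state without running `processEvent`.

```typescript
// Hypothetical shapes for illustration only.
interface SketchEvent {
  offset: number;
  type: string;
}

type Consumes = "*" | readonly string[];

interface SketchProcessor<S> {
  consumes: Consumes; // stands in for contract.consumes
  sideEffectsAfterOffset: number; // anchor persisted at first handshake
  state: S;
  reduce(state: S, event: SketchEvent): S;
  processEvent(event: SketchEvent, state: S): void;
}

interface PumpResult {
  cursor: number;
  delivered: number[];
}

function pump<S>(
  events: readonly SketchEvent[],
  startCursor: number,
  processor: SketchProcessor<S>,
): PumpResult {
  let cursor = startCursor;
  const delivered: number[] = [];
  for (const event of events) {
    if (event.offset <= cursor) continue; // already read
    cursor = event.offset; // the cursor advances even when the event is filtered out
    const matches =
      processor.consumes === "*" || processor.consumes.includes(event.type);
    if (!matches) continue;
    // Every delivered event reduces into state...
    processor.state = processor.reduce(processor.state, event);
    // ...but side effects only run strictly above the anchor.
    if (event.offset > processor.sideEffectsAfterOffset) {
      processor.processEvent(event, processor.state);
    }
    delivered.push(event.offset);
  }
  return { cursor, delivered };
}

// Usage: a counting processor attached to an existing stream with its anchor at offset 2.
const sideEffects: number[] = [];
const counter: SketchProcessor<number> = {
  consumes: ["agent/input-added"],
  sideEffectsAfterOffset: 2,
  state: 0,
  reduce: (count) => count + 1,
  processEvent: (event) => {
    sideEffects.push(event.offset);
  },
};
const result = pump(
  [
    { offset: 1, type: "agent/input-added" },
    { offset: 2, type: "other/noise" },
    { offset: 3, type: "agent/input-added" },
    { offset: 4, type: "other/noise" },
  ],
  0,
  counter,
);
```

In this run both matching events are reduced, only the one above the anchor triggers a side effect, and the cursor ends at the last offset read rather than the last offset delivered. That last point is presumably why catch-up helpers wait on consumed-event targets instead of the raw stream head.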
## Environment Config Lease

No active environment config lease.

### OS

Status: released
Commit: `4d30dbd`
Preview: https://os.iterate-preview-2.com
Summary: Preview app released.
Workflow run
Updated: 2026-06-10T06:00:35.288Z

### Semaphore

Status: released
Commit: `4d30dbd`
Preview: https://semaphore.iterate-preview-2.com
Summary: Preview app released.
Workflow run
Updated: 2026-06-10T06:00:31.085Z