Fix Slack agents never responding (regression from #1370) #1372
Merged
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit 24d0074. Configure here.
24d0074 to 31f5867
Slack-routed agent streams never registered the LLM processors (agent-chat/agent/provider), so a user message landed as an agent/input-added event that nothing consumed, and the agent never replied.

Regression from #1370 (streams runtime cutover): the routed bootstrap used to subscribe a callable to AgentDurableObject.afterAppend (which woke the agent and registered its processors via onInstanceWake); the cutover replaced it with a built-in agent-host processor that never wakes the agent for its own stream. What actually starts a processor is the subscription-configured event, which was never appended for the LLM processors on routed streams.

Fix:
- agent-host now wakes the AgentDurableObject for its own stream on stream/created (ensureAgentRunnerForOwnStream); onInstanceWake registers agent-chat/agent/LLM + setup events, whose subscription-configured events Stream#reconcileOutboundConnections then dials into runners. Verified the runner replays from offset 0, so agent-host reliably sees stream/created (always offset 1).
- Align the bootstrap agent-host subscription key with the canonical AgentDurableObject key so the two declarations dedupe to a single runner.
- Use the new-runtime event prefix (events.iterate.com/stream/) in the OS call-sites that compare against new-runtime core events: the child-stream-created check (was the legacy /core/ prefix, which the new runtime never emits), the project agents-root jsonata matcher, and the stream-composer UI examples.

Scope note: a broader /core/ -> /stream/ rename across the shared package and the separate events.iterate.com app was reverted. It broke the events runtime e2e (append 500s) and is unrelated to this bug. Tracked as a follow-up.

Verified: full-repo typecheck, oxlint, oxfmt, affected OS unit tests pass. Not yet deployed/round-tripped on a preview.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
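As a rough illustration of the event-prefix part of this fix, the TypeScript sketch below shows why a check against the legacy `/core/` type can never match an event emitted with the `/stream/` prefix. Only the two prefixes and the `child-stream-created` suffix come from the commit message; the `StreamEvent` shape, the constant names, and both helper functions are hypothetical.

```typescript
// Hypothetical event shape; the real runtime type is not shown in this PR.
type StreamEvent = { type: string; offset: number };

// The two prefixes named in the commit message.
const LEGACY_CORE_PREFIX = "events.iterate.com/core/";
const NEW_STREAM_PREFIX = "events.iterate.com/stream/";

// Pre-fix shape of the check (assumed): compares against the legacy type,
// which the new runtime never emits.
function isChildStreamCreatedLegacy(event: StreamEvent): boolean {
  return event.type === `${LEGACY_CORE_PREFIX}child-stream-created`;
}

// Post-fix shape of the check (assumed): compares against the new-runtime type.
function isChildStreamCreated(event: StreamEvent): boolean {
  return event.type === `${NEW_STREAM_PREFIX}child-stream-created`;
}

// An event as the new runtime would emit it (the offset is arbitrary).
const emitted: StreamEvent = {
  type: "events.iterate.com/stream/child-stream-created",
  offset: 4,
};

const matchedByLegacyCheck = isChildStreamCreatedLegacy(emitted);
const matchedByNewCheck = isChildStreamCreated(emitted);
```

Under these assumptions the legacy comparison silently returns `false` for every new-runtime event, which is consistent with the symptom described: no error, just a handler that never runs.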
31f5867 to 2ba51d8
jonastemplestein added a commit that referenced this pull request Jun 10, 2026
…1481)

## What was broken

Slack agents in prd receive messages but never reply: no LLM request is ever made. Observed live on 2026-06-10 in `iterate` project stream `/agents/slack/c08r1smtzgd/ts-1781124999-011519`:

- the slack-agent processor rendered the webhook into a triggering `agent/input-added` at **offset 9** (20:56:46.2)
- the agent processor's `subscription-configured` event landed at **offset 15** (20:56:46.7); the AGENT DO wake hook appends it after D1 reads and workspace setup, so slack-agent reliably wins this race on a cold thread
- the host anchors side effects at the subscription-configured offset (`stream-processor-host.ts`), so the input at offset 9 was reduced as historical replay and its scheduling side effect was skipped: no `llm-request-scheduled`, no `llm-request-requested`, no `openai-ws` activity, no reply, and nothing ever retriggers it
- visible fingerprint in the stream: capability-noted renders exist only for offsets above the anchor (18–23), none for 8–9

The anchor mechanism is correct for re-attach (don't re-fire historical LLM requests), but it shipped in #1402 without anything making the *first* message of a new thread durable. Regression from #1402, same symptom as #1372 but a different mechanism. **Every first message of every new prod Slack thread is dropped.**

## The fix

Make the trigger a durable obligation in reduced state instead of a fire-and-forget side effect:

- **`AgentState.pendingTriggerOffset`**: set by a triggering `input-added`, cleared by `llm-request-scheduled` / `llm-request-requested` / `llm-request-queued`. If it survives in reduced state, the scheduling side effect never ran.
- **`subscriber-connected` reconciliation recovers it** (the presence fact always lands above the anchor, so this handler always runs live): schedule a request when idle, append the queued fact when a request is in flight (never interrupts in-flight work). Appends are keyed off the trigger event exactly like the live path (`agent/llm-request-scheduled@<offset>`), so raced duplicates dedup in the stream.
- **Gated on `pendingTriggerOffset <= sideEffectsAfterOffset`** so recovery fires only for anchor-skipped triggers and never races the live `input-added` handler. Crash/restart cases above the anchor remain owned by the existing scheduled-phase reconciliation.
- `StreamProcessor.processEvent` args now expose `sideEffectsAfterOffset` (the batch-level hook already had it); the core processor's inline path passes 0 (inline appends are always live).

The scheduled phase needs no queued fact on recovery: its handoff rebuilds the request body from full committed history, which already includes the skipped trigger.

## Verification

- Unit tests replay the prod stream shape: trigger below anchor + subscriber-connected above → exactly one `llm-request-scheduled@9`; non-triggering inputs don't recover; in-flight requests get a queued fact; live triggers aren't double-scheduled.
- New token-gated e2e (`schedules and completes an LLM request for a plain routed Slack message`) drives a real Slack root message + routed webhook through webhook → input → scheduled → requested → completed(success) against a live deployment.
- `pnpm typecheck && pnpm lint && pnpm format && pnpm test` all green.
- E2E run against the preview deployment with the real Slack bot token: results to follow in a comment.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---

> [!NOTE]
> **High Risk**
> Changes core agent LLM scheduling and subscriber-connected reconciliation on a production outage path; incorrect gating could double-schedule or miss triggers on every new Slack thread.
>
> **Overview**
> Fixes **first-message silence** on new Slack thread streams when a triggering `input-added` lands **before** the agent subscription is configured: the host’s side-effect anchor replays that input into state but skips scheduling, so no LLM turn ever starts.
>
> **Agent processor** now records **`pendingTriggerOffset`** in reduced state for triggering inputs and clears it when a durable schedule/request/queue fact exists. On **`subscriber-connected`**, when that offset is at or below the anchor, it **recovers** the missed obligation (`llm-request-scheduled` when idle, with the same idempotency key as the live path, or **`llm-request-queued`** when a request is already in flight) without double-scheduling live triggers above the anchor. **`#appendLlmRequestScheduled`** arms the debounce timer with the **committed** `requestId` after idempotent dedup so raced recovery paths don’t wedge the handoff.
>
> **Streams**: `processEvent` receives **`sideEffectsAfterOffset`** so reconcilers can detect anchor-skipped side effects; the core inline path passes **`0`** (always live).
>
> **Verification**: new unit coverage for anchor-skip recovery, deduped schedule, queue-when-busy, and no recovery for non-triggering inputs; token-gated e2e asserts routed Slack webhook → scheduled → requested → completed(success).
>
> <sup>Reviewed by [Cursor Bugbot](https://cursor.com/bugbot) for commit eb3a7ac. Bugbot is set up for automated code reviews on this repo. Configure [here](https://www.cursor.com/dashboard/bugbot).</sup>

## Environment Config Lease

Lease: `preview-4`
Doppler config: `preview_4`
Type: `environment-config-lease`
Leased until: 2026-06-10T22:56:08.766Z

### OS

Status: deployed
Commit: `eb3a7ac`
Preview: https://os.iterate-preview-4.com
[Workflow run](https://github.com/iterate/iterate/actions/runs/27308869802)
Updated: 2026-06-10T21:58:59.268Z

---------

Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
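The anchor skip and its recovery can be sketched in a few lines of TypeScript. This is a simplified model under stated assumptions, not the code from the commit: the event shapes, `reduceAgentState`, `recoverOnSubscriberConnected`, the `requestInFlight` parameter, and the idempotency key used for the queued fact are hypothetical. Only `pendingTriggerOffset`, `sideEffectsAfterOffset`, the event names, and the `agent/llm-request-scheduled@<offset>` key format come from the commit message above.

```typescript
// Hypothetical, simplified event shapes for the agent processor.
type AgentEvent =
  | { type: "input-added"; offset: number; triggers: boolean }
  | { type: "llm-request-scheduled" | "llm-request-requested" | "llm-request-queued"; offset: number }
  | { type: "subscriber-connected"; offset: number };

type AgentState = { pendingTriggerOffset: number | null };

type RecoveryAppend = {
  type: "llm-request-scheduled" | "llm-request-queued";
  idempotencyKey: string;
};

// Pure reduction: runs for every event, including historical replay below the anchor.
function reduceAgentState(state: AgentState, event: AgentEvent): AgentState {
  switch (event.type) {
    case "input-added":
      // A triggering input records an obligation; other inputs leave state alone.
      return event.triggers ? { pendingTriggerOffset: event.offset } : state;
    case "llm-request-scheduled":
    case "llm-request-requested":
    case "llm-request-queued":
      // A durable schedule/request/queue fact discharges the obligation.
      return { pendingTriggerOffset: null };
    default:
      return state;
  }
}

// Recovery on subscriber-connected: returns the append to make, or null.
function recoverOnSubscriberConnected(
  state: AgentState,
  sideEffectsAfterOffset: number,
  requestInFlight: boolean,
): RecoveryAppend | null {
  const pending = state.pendingTriggerOffset;
  if (pending === null) return null;
  // Triggers above the anchor belong to the live input-added handler.
  if (pending > sideEffectsAfterOffset) return null;
  return requestInFlight
    ? // Assumed key format for the queued fact; the source only shows the scheduled key.
      { type: "llm-request-queued", idempotencyKey: `agent/llm-request-queued@${pending}` }
    : { type: "llm-request-scheduled", idempotencyKey: `agent/llm-request-scheduled@${pending}` };
}

// Shape of the prod incident: trigger at offset 9, anchor at offset 15.
const initial: AgentState = { pendingTriggerOffset: null };
const coldThread: AgentEvent[] = [{ type: "input-added", offset: 9, triggers: true }];
const skipped = coldThread.reduce(reduceAgentState, initial);
const recovered = recoverOnSubscriberConnected(skipped, 15, false);
```

Keeping the obligation in reduced state rather than in the side effect is what makes it survive replay: the reducer always runs, even when the host suppresses side effects for events at or below the anchor.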

## What was broken

Slack agents stopped responding. A user `@mention` becomes an `agent/input-added` event on the routed agent stream, but nothing consumes it: the LLM processors (`agent-chat` / `agent` / the provider processor) were never registered on Slack-routed streams. Observed live on `templestein2` stream `/agents/slack/c09trdv61v4/ts-1780670924-517029`: `slack-agent` ran (produced the input), but no `agent`/LLM processor existed, so no reply.

## Root cause: regression from #1370 (streams runtime cutover)

- Before the cutover, `routedStreamBootstrapEvents` subscribed a callable to `AGENT` DO `afterAppend`. Invoking it woke `AgentDurableObject` → `onInstanceWake` registered `agent-chat` / `agent` / LLM / `agent-host` + seeded setup events.
- The cutover replaced it with a built-in `agent-host` processor whose `afterAppend` only runs `ensureChildAgentRunner` (+ codemode handlers). It never wakes the agent for its own stream, so the LLM processors are never registered.
- What actually starts a processor is the `subscription-configured` event (`Stream#reconcileOutboundConnections` dials a runner per subscription key). Those were never appended for the LLM processors on routed streams.
- `ensureChildAgentRunner` compared against the legacy `events.iterate.com/core/child-stream-created`, but the new runtime emits `events.iterate.com/stream/child-stream-created`.

Dashboard-created agents were unaffected because `new.tsx` explicitly subscribes the full processor set. PR #1371 would not have fixed this: it only adds `stream-processor-registered` marker events (which don't start processors), and only to the UI flow.

## The fix

- Wake the agent for its own stream (`ensureAgentRunnerForOwnStream`): when `agent-host` runs on a routed agent stream, on the `stream/created` event it initializes that stream's `AgentDurableObject` → `onInstanceWake` registers the LLM processors and setup events; the resulting `subscription-configured` events are what `reconcileOutboundConnections` dials. Verified the runner replays from offset 0 (`replayAfterOffset: snapshot?.offset ?? 0`), so `agent-host` reliably sees `stream/created` (always offset 1).
- Align the bootstrap `agent-host` subscription key with the canonical `AgentDurableObject` key (runner DOs are keyed by `${namespace}:${path}:${subscriptionKey}`), so the two declarations resolve to one runner.
- Use the new-runtime `events.iterate.com/stream/` prefix at the OS call-sites that compare against new-runtime core events: the broken `child-stream-created` check (local constants), the project agents-root jsonata matcher, and the stream-composer UI examples.

## Verification

- `pnpm typecheck` (18 projects), `oxlint`, `oxfmt`, and the affected OS unit tests pass.
- Not yet deployed or round-tripped on a preview; the fix should be exercised with a real Slack `@mention` before prod (`Preview / e2e` is otherwise skipped on PRs).

## Scope note

An earlier revision also did a broad `events.iterate.com/core/` → `/stream/` rename across the shared package and the separate events.iterate.com app. That broke the events runtime e2e (append → 500) and is unrelated to this bug, so it was reverted out of this PR. If we still want the events platform on the `/stream/` prefix it needs its own investigation, tracked as a follow-up.

## Follow-ups (found while auditing #1370)

- `ProjectDurableObject.afterAppend` is orphaned: the old runtime forwarded lifecycle events to the project config-worker `afterAppend` hook; the new built-in `project-lifecycle` processor has no `afterAppend` and nothing calls it.
- `CodemodeSession.afterAppend` is orphaned (likely benign: resolves locally via `appendAndConsume`).
- `agents.e2e.test.ts` has vacuous `/core/error-occurred` assertions; worth real Slack-path e2e coverage so this can't regress silently.

🤖 Generated with Claude Code
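Two mechanics in the fix lend themselves to a small model: runner deduplication by key, and why replaying from offset 0 guarantees the host sees `stream/created`. The TypeScript below is a sketch under assumptions, not the PR's code. The key format `${namespace}:${path}:${subscriptionKey}`, the `events.iterate.com/stream/created` type, and the "always offset 1" claim come from the description; `runnerKey`, `countRunners`, `countWakes`, the namespace, the path, and the subscription key strings are hypothetical.

```typescript
// Hypothetical event shape.
type StreamEvent = { type: string; offset: number };

const STREAM_CREATED = "events.iterate.com/stream/created";

// Key format quoted in the PR description for runner DOs.
function runnerKey(namespace: string, path: string, subscriptionKey: string): string {
  return `${namespace}:${path}:${subscriptionKey}`;
}

// Counts the distinct runners that a list of subscription declarations resolves to.
function countRunners(namespace: string, path: string, subscriptionKeys: string[]): number {
  return new Set(subscriptionKeys.map((key) => runnerKey(namespace, path, key))).size;
}

// Counts how often an agent-host style hook would wake the agent when it
// replays only the events strictly after `replayAfterOffset`.
function countWakes(events: StreamEvent[], replayAfterOffset: number): number {
  let wakes = 0;
  for (const event of events) {
    if (event.offset <= replayAfterOffset) continue; // covered by a snapshot, not replayed
    if (event.type === STREAM_CREATED) wakes += 1; // wake the agent for its own stream
  }
  return wakes;
}

const streamPath = "/agents/slack/example-thread"; // hypothetical path
const streamHistory: StreamEvent[] = [
  { type: STREAM_CREATED, offset: 1 },
  { type: "agent/input-added", offset: 2 },
];

// Mismatched subscription keys yield two runners; aligned keys collapse to one.
const runnersWithMismatchedKeys = countRunners("proj", streamPath, ["bootstrap-agent-host", "agent-host"]);
const runnersWithAlignedKeys = countRunners("proj", streamPath, ["agent-host", "agent-host"]);

// Replaying from offset 0 includes stream/created at offset 1; replaying after 1 would miss it.
const wakesFromZero = countWakes(streamHistory, 0);
const wakesAfterCreated = countWakes(streamHistory, 1);
```

In this model, a runner that started replay after offset 1 would never observe `stream/created` and so would never wake the agent, which is why the description calls out the `snapshot?.offset ?? 0` default.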
## Environment Config Lease

No active environment config lease.

### OS

Status: released
Commit: `2ba51d8`
Preview: https://os.iterate-preview-2.com
Summary: Preview app released.
Workflow run
Updated: 2026-06-05T19:28:38.611Z
Note
Medium Risk
Touches agent stream bootstrap and durable-object initialization on every routed agent stream; wrong event matching or wake ordering could affect non-Slack agents, but changes are narrowly scoped to host processor and Slack routing.
Overview
Restores Slack-routed agent streams after the streams runtime cutover (#1370): routed streams only got `slack-agent` + `agent-host`, so `agent/input-added` events were never consumed because LLM processors were never registered.

`ensureAgentRunnerForOwnStream` initializes the stream’s `AgentDurableObject` on `events.iterate.com/stream/created` (via `agent-host` `afterAppend`, using `keepAlive` to avoid deadlocking catch-up). That runs `onInstanceWake`, which appends the LLM processor subscriptions and setup events.

Slack bootstrap now uses `agentProcessorSubscriptionConfiguredEvent` so the routed `agent-host` subscription key matches `AgentDurableObject`, deduping to a single runner.

Event type alignment for the new runtime: `child-stream-created` and related core lifecycle types use `events.iterate.com/stream/…` instead of legacy `…/core/…` in agent host logic, the agents-root jsonata matcher, and stream composer presets.

Reviewed by Cursor Bugbot for commit 2ba51d8. Bugbot is set up for automated code reviews on this repo. Configure here.