🐛 fix(agent-runtime): preserve streamed content across mid-stream cancel #15173
LOBE-9523

Mid-stream STOP currently collapses the in-memory streamed assistant content back to the LOADING_FLAT placeholder (cLen 5182 → 3 observed in the agent-gateway probe dump at `.agent-gateway/caseD-prerefresh-…json`), and a subsequent reload returns the same placeholder from DB so the content is **permanently lost**.

Root cause (matrix-tested via Electron + probe, see updated LOBE-9523 description): when the user clicks STOP, `interruptOperation` flips state.status to 'interrupted' and `coordinator.saveAgentState` publishes `agent_runtime_end` carrying the `uiMessages` snapshot. The executor's post-stream finalize at `RuntimeExecutors.call_llm:1078` hasn't run yet, so the assistant row is still the empty placeholder. That placeholder gets pushed to the client as SoT and clobbers the streamed content.

Three coordinated fixes:

1. **Executor partial-finalize on interrupt** (`RuntimeExecutors.ts` inner catch). When `isOperationInterrupted` is true AND the `onText`/`onThinking`/`onToolsCalling` callbacks accumulated partial content, do an extra `messageModel.update` before rethrowing. This makes the DB row carry the real partial content, so a later reload shows the streamed answer instead of an empty placeholder.
2. **Coordinator skips uiMessages on interrupted** (`AgentRuntimeCoordinator.ts` `resolveUiMessages`). Short-circuit when `state.status === 'interrupted'` so the agent_runtime_end payload omits `uiMessages` entirely. The executor's partial-finalize update from (1) is racy with this publish path, so leaving the field undefined lets the client preserve its in-memory state instead of pulling whatever's in DB at publish time.
3. **Client skips DB refetch on `reason='interrupted'`** (`gatewayEventHandler.ts` agent_runtime_end case). The existing fallback at L540 does a `fetchAndReplaceMessages` whenever uiMessages is absent, which would defeat fix (2) by reading the still-pre-finalize DB row. Add a third branch: when reason='interrupted' AND no uiMessages, keep the in-memory state. The next explicit refresh (route change, user-driven mutate, page reload) will pick up the finalized partial content from (1).

Test matrix (5 new tests):

- `RuntimeExecutors`: persists on interrupt-with-content / skips on empty-interrupt / skips on non-interrupt error
- `AgentRuntimeCoordinator`: resolver not called on saveAgentState / saveStepResult when status='interrupted'
- `gatewayEventHandler`: no refetch + no replaceMessages when reason='interrupted' and uiMessages absent / SoT still consumed when server did include uiMessages on an interrupted run (forward-compat)

Manual verification (probe dumps in `.agent-gateway/`):

- Case A/B/C/E (clean stream, mid-stream tab-switch, post-stream tab-switch, post-stream reload) all remain ✅, no regression
- Case D (long stream → STOP) currently shows `cLen[gRojDUMG] 5182→3 near-event:[agent_runtime_end]` rollback; with this patch the client retains 5182 chars and the DB carries the same partial content for reload

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
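The two server-side decisions in fixes (1) and (2) can be sketched as pure functions. This is a minimal illustration, not the code in `RuntimeExecutors.ts` or `AgentRuntimeCoordinator.ts`: the `PartialStream` and `MessageUpdate` shapes, the `Status` union, and both function names are hypothetical, and the real resolver is presumably asynchronous.

```typescript
// Hypothetical shapes: the real executor/coordinator types are not shown in this PR text.
interface PartialStream {
  content: string; // text accumulated by the onText callback
  reasoning: string; // text accumulated by the onThinking callback
  toolCalls: unknown[]; // payloads accumulated by the onToolsCalling callback
}

interface MessageUpdate {
  content: string;
  reasoning: string | null;
  tools: unknown[] | null;
}

// Fix (1): decide what, if anything, the catch block should persist before rethrowing.
// Returns null when no extra update should be issued.
function partialFinalizePayload(
  interrupted: boolean,
  partial: PartialStream,
): MessageUpdate | null {
  // Non-interrupt errors keep the existing error path untouched.
  if (!interrupted) return null;

  const hasContent =
    partial.content.length > 0 ||
    partial.reasoning.length > 0 ||
    partial.toolCalls.length > 0;
  // An interrupt that arrived before any chunk has nothing worth persisting.
  if (!hasContent) return null;

  return {
    content: partial.content,
    reasoning: partial.reasoning.length > 0 ? partial.reasoning : null,
    tools: partial.toolCalls.length > 0 ? partial.toolCalls : null,
  };
}

// Fix (2): omit the snapshot for interrupted runs without even calling the resolver.
type Status = 'running' | 'done' | 'interrupted' | 'error';

function resolveUiMessagesFor<T>(status: Status, resolver: () => T[]): T[] | undefined {
  if (status === 'interrupted') return undefined;
  return resolver();
}
```

In this sketch the caller would pass a non-null payload to `messageModel.update` and then rethrow. Returning `undefined` from the resolver wrapper models the `agent_runtime_end` payload carrying no `uiMessages` field.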
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c4cc3b4d30
        action: 'gateway/agent_runtime_end',
        context,
      });
    } else if (data?.reason === 'interrupted') {
Refetch when interrupted before any stream snapshot exists
Skipping the DB fallback for every reason === 'interrupted' leaves optimistic local messages unreconciled when the operation is cancelled before step_start/stream_start delivered usable state. In that path there is no streamed content to preserve, so the store can retain tmp LOADING_FLAT messages (or stale message IDs) indefinitely until a manual refresh. Previously the fallback refetch removed those optimistic placeholders from store state.
Codecov Report

❌ Patch coverage is […]

Additional details and impacted files

Coverage Diff:

| | canary | #15173 | +/- |
| --- | --- | --- | --- |
| Coverage | 89.70% | 70.91% | -18.80% |
| Files | 854 | 3148 | +2294 |
| Lines | 102573 | 313487 | +210914 |
| Branches | 9083 | 34153 | +25070 |
| Hits | 92017 | 222308 | +130291 |
| Misses | 10390 | 91013 | +80623 |
| Partials | 166 | 166 | |
Reviewer caught a regression in PR #15173's agent_runtime_end change: unconditionally skipping the DB fallback when `reason === 'interrupted'` leaves the optimistic `tmp_*` placeholder messages stuck in the store when cancel arrives BEFORE any server state landed (no step_start, no stream_start with server id, no chunks). Previously the fallback `fetchAndReplaceMessages` cleaned those up by replacing them with the server-side rows.

Track `hasStreamedContent` in the handler closure and flip it to true on:

- `stream_start` switching to a server-assigned assistant id
- `stream_chunk` dispatching text / reasoning / tools_calling

Gate the interrupted-skip on this flag:

- `hasStreamedContent === true` → keep in-memory state (mid-stream cancel)
- `hasStreamedContent === false` → fall back to refetch (cancel-before-stream)

New test for the cancel-before-stream path; existing "NOT refetch when reason=interrupted" test renamed and updated to set up prior stream activity before sending the cancel.
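The gating described above can be sketched as a small closure that tracks stream progress and picks one of three outcomes on `agent_runtime_end`. This is an illustrative sketch only: `createEndDecider`, the `EndAction` labels, and the use of a `tmp_` prefix to tell optimistic ids from server-assigned ones are assumptions, not the actual `gatewayEventHandler.ts` code.

```typescript
// Hypothetical outcome labels for the agent_runtime_end branch.
type EndAction = 'replace-with-snapshot' | 'keep-in-memory' | 'refetch-from-db';

interface RuntimeEndData {
  reason: string | null;
  uiMessages: unknown[] | null;
}

// Chunk kinds that count as streamed content, per the commit message.
const CONTENT_CHUNK_KINDS = ['text', 'reasoning', 'tools_calling'];

function createEndDecider() {
  // Lives in the handler closure, so it is scoped to one operation.
  let hasStreamedContent = false;

  return {
    onStreamStart(assistantId: string): void {
      // Assumption: optimistic ids carry a `tmp_` prefix, server ids do not.
      if (!assistantId.startsWith('tmp_')) hasStreamedContent = true;
    },
    onStreamChunk(kind: string): void {
      if (CONTENT_CHUNK_KINDS.includes(kind)) hasStreamedContent = true;
    },
    onRuntimeEnd(data: RuntimeEndData): EndAction {
      // A server snapshot, when present, is still consumed as source of truth.
      if (data.uiMessages !== null) return 'replace-with-snapshot';
      // Mid-stream cancel: the in-memory content is newer than the DB row.
      if (data.reason === 'interrupted' && hasStreamedContent) return 'keep-in-memory';
      // Cancel-before-stream (or any other end without a snapshot): reconcile from DB.
      return 'refetch-from-db';
    },
  };
}
```

Keeping the flag in the closure rather than in the store means a later operation on the same conversation starts from `false` again, which matches the cancel-before-stream case the reviewer raised.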
…cel (lobehub#15173)

* 🐛 fix(agent-runtime): preserve streamed content across mid-stream cancel
* 🐛 fix(chat-store): only skip interrupt refetch after stream progressed

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
# 🚀 LobeHub Release (20260528)

**Release Date:** May 28, 2026
**Since v2.2.0:** 220 merged PRs · 15 contributors

> This cycle brings heterogeneous "platform agents" you can dispatch to local or remote devices, a rebuilt onboarding flow, document-centric chat, and a unified model-runtime error model, with new DeepSeek V4 and Gemini 3.5 Flash support along the way.

---

## ✨ Highlights

- **More Hetero Agents (OpenClaw / Hermes)**: Create heterogeneous agents and dispatch them to local or remote devices through the device gateway, with an execution-target switcher in the composer and persistent CLI sessions. (#15065, #15179, #15022)
- **iMessage on Desktop**: New iMessage setup and bridge on desktop, plus bot attachments across every platform. (#15228, #15227, #15029)
- **Skills in the Composer**: Drag skill chips into chat, trigger installed skills from the slash menu mid-line, and surface project-level skills in the homogeneous agent runtime. (#15095, #15061, #15110)
- **New Models**: DeepSeek V4 Flash/Pro and Gemini 3.5 Flash across providers, with thinking params for structured output and chat cost estimates. (#15031, #15001, #15051, #14876)
- **Agent Runtime Observability**: OpenTelemetry GenAI semantic conventions plus per-call generation tracing. (#15123, #15124)

---

## 🤖 Agents & Heterogeneous Runtime

- **Platform agent creation**: OpenClaw/Hermes creation UI, device guard, and remote dispatch backend. (#15065)
- **Execution-target switcher**: Pick local vs remote execution directly in the composer; device-selection UX with actionable guidance. (#15179, #15111)
- **CLI hetero dispatch**: OpenClaw/Hermes dispatch with persistent sessions and a notify protocol. (#15022)
- **Gateway snapshot as source of truth**: Consume the gateway `uiMessages` snapshot at step boundaries to keep chat state consistent. (#15153, #15152)
- **Client sub-agent as a normal tool call**: Simplifies the sub-agent execution path. (#15281)
- **Hermes agent chain**: Implements the Hermes agent chain logic. (#15189)
- **Device registry**: TRPC endpoints to register, list, update, and remove devices. (#15299)
- **Desktop device routing**: Route gateway agent runs through `lh hetero exec`; restore `userId` in gateway dispatch and gate local-system by execution target. (#15132, #15232)
- **Agent signals**: Anchor agent-signal receipts to messages and isolate memory-agent messages into a child thread. (#14969, #14921)

---

## 🚀 Onboarding

- **Simplified first screen**: Defer topic creation to first send. (#15090)
- **Market Agent Picker**: Added as a classic onboarding step, with template prefetch. (#14980, #15041)
- **Welcome guidance**: Show agent welcome guidance on first run. (#15098)
- **Mobile**: Adapt agent onboarding UI and restore Classic-step padding on mobile. (#15019, #15032)
- **Discovery**: Streamline discovery to a single profession question. (#14987)
- **Analytics**: Track onboarding step events and create-agent modal source. (#15133, #15028)

---

## 📄 Documents, Pages & Knowledge

- **Thread chat in preview**: Embed thread chat in the document preview portal. (#15216)
- **Non-markdown rendering**: Render non-markdown docs as a read-only highlight. (#15272)
- **Multi-select**: Multi-select delete in the document tree. (#15125)
- **Page-agent streaming**: Preview `initPage` streaming arguments. (#15039)
- **Per-agent topics**: Per-agent topic management page. (#15207)
- **Server-side category**: Derive document category server-side and drop frontend predicates. (#15076)

---

## 🧩 Skills & Tools

- **Drag skill chips**: Drag skills into chat input and register agent-document skills. (#15095)
- **Slash menu**: Installed skills appear in the slash menu with a mid-line trigger. (#15061)
- **Project skills**: Recognize project-level skills in the homogeneous agent runtime and surface them regardless of active device. (#15110, #15177)
- **VFS archiving**: Archive oversized tool results to VFS instead of truncating. (#15074)
- **@localfile mentions**: Drag folders into chat input as `@localFile` mentions on desktop. (#15071)

---

## 🧠 Model Runtime & Providers

- **Error spec registry**: Unify error codes into a spec + pattern registry, split `ProviderBizError` into finer codes, classify Cloud-only codes via a tier digit, and add `DatabasePersistError`. (#15262, #15286, #15278, #15279)
- **New models**: DeepSeek V4 Flash/Pro (opencode-go) and Gemini 3.5 Flash; DeepSeek V4 Pro on SiliconCloud. (#15031, #15001, #15017, #15267)
- **Structured output**: Thinking params for structured output, Bedrock structured generation, and DeepSeek `generateObject` tool choice. (#15051, #15174, #15054)
- **Cost**: Chat cost estimate support; preserve usage cost in custom streams. (#14876, #15218)

---

## 💬 Chat & User Experience

- **Follow-up chips**: Extend follow-up chip suggestions to general chat with scene-specific model config. (#15101, #14797)
- **Input drafts**: Persist unsent input drafts across tab switches and prevent repeated draft restore. (#14992, #15024)
- **Command menu**: Order topic/message search by recency and promote inline type filters. (#15094, #14986)
- **Zoom HUD**: Show a zoom-level HUD on Cmd +/− and Cmd 0. (#15294)
- **Copy**: Unescape markdown escapes when copying user messages. (#15253)

---

## 🖥️ Desktop

- **App Nap fix**: Prevent App Nap from dropping the gateway WebSocket during display sleep. (#14994)
- **File preview**: Preview `.cjs`/`.mjs`/no-extension files instead of binary fallback and expand `~` when opening local files. (#15168, #15284)
- **Cross-platform settings**: Open settings via main-window navigation on Windows/Linux and restore the route after an update restart. (#15036, #14922)
- **Token refresh**: Prevent frequent logout from token-refresh retries. (#14928)

---

## 📊 Observability

- **OTel GenAI**: Instrument Agent Runtime with OpenTelemetry GenAI semantic conventions. (#15123)
- **Generation tracing**: Per-call `llm_generation_tracing` with a pre-allocated tracingId and recordFeedback router. (#15124, #15146)
- **Error classification**: Persist `ERROR_CODE_SPECS` classification on operation errors. (#15273)

---

## 🗃️ Database Migrations

- **Batch migrations**: Topic usage stats, push tokens, `tasks.editor_data`, and document shares. (#15280)
- **Tracing & eval tables**: Add `llm_generation_tracing` and agent eval experiment tables. (#15126)

> Self-hosted operators should run the database migration (`pnpm db:migrate`, or restart with auto-migrate enabled) after upgrading. The changes are additive and backwards-compatible.

---

## 🔒 Security & Reliability

- **Security:** Remove the `getPlaintextCred` tool to prevent plaintext credential exposure. (#14998)
- **Security:** Prompt account selection for Google OAuth and add `prompt=consent` to the OIDC authorization URL to fix missing refresh tokens. (#15234, #15010)
- **Reliability:** Preserve streamed content across a mid-stream cancel. (#15173)
- **Reliability:** Bound the Redis command timeout and configure the Anthropic client timeout. (#15091, #15042)
- **Reliability:** Prevent infinite recursion in the assistant chain. (#15288)

---

## 👥 Contributors

Huge thanks to **15 contributors** who shipped **220 merged PRs** this cycle.

@AnotiaWang · @sxjeru · @algojogacor · @hardy-one · @arvinxx · @Innei · @tjx666 · @lijian · @AmAzing129 · @rdmclin2 · @neko · @cy948 · @CanisMinor · @sudongyuer · @rivertwilight

Plus @lobehubbot and renovate[bot] for maintenance.

---

**Full Changelog**: v2.2.0...release/weekly-20260528
Summary

Mid-stream STOP currently collapses streamed assistant content back to the LOADING_FLAT placeholder (cLen 5182 → 3 observed in `.agent-gateway/caseD-prerefresh-…json`), and a subsequent reload returns the same placeholder so the content is permanently lost. Linear: LOBE-9523.

Three coordinated fixes:

1. **Executor partial-finalize on interrupt** (`RuntimeExecutors.ts` inner catch): when `isOperationInterrupted` is true AND streaming callbacks accumulated content, do an extra `messageModel.update` before rethrowing so the DB carries the real partial content.
2. **Coordinator skips `uiMessages` on interrupted** (`AgentRuntimeCoordinator.resolveUiMessages`): short-circuit when `state.status === 'interrupted'` so `agent_runtime_end` doesn't push a pre-finalize snapshot that would clobber the client's in-memory streamed content.
3. **Client skips DB refetch on `reason='interrupted'`** (`gatewayEventHandler.ts` `agent_runtime_end` case): instead of falling back to `fetchAndReplaceMessages` when uiMessages is absent and reason is interrupted, keep in-memory state. The next explicit refresh picks up the finalized partial content from (1).

Test plan

5 new automated tests across 3 files (all passing locally; 173 / 768 broader sweep green):

- `RuntimeExecutors`: persists on interrupt with content / skips on empty-interrupt / skips on non-interrupt error
- `AgentRuntimeCoordinator`: resolver not called on `saveAgentState` / `saveStepResult` when `status='interrupted'`
- `gatewayEventHandler`: no refetch + no replaceMessages on `reason='interrupted'` without uiMessages / SoT still consumed when server did include uiMessages (forward-compat)

Manual verification matrix (agent-gateway probe in `.agent-gateway/`):

- `cLen↓N→3 (same id) near-event:[agent_runtime_end]` no longer appears in the dump's `ROLLBACKS` section

🤖 Generated with Claude Code
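The rollback check used in the manual verification can be pictured as a scan for content-length drops on the same message id. The sketch below is a guess at that idea only: the probe's real dump format and `ROLLBACKS` logic are not shown in this PR, so `ProbeSample`, `Rollback`, and `findRollbacks` are hypothetical names.

```typescript
// Hypothetical sample shape: one observation of a message's content length.
interface ProbeSample {
  id: string; // message id
  cLen: number; // content length at this observation
  event: string | null; // gateway event seen closest to this observation, if any
}

interface Rollback {
  id: string;
  from: number;
  to: number;
  nearEvent: string | null;
}

// Report every observation where the same id shrank compared with its previous sample.
function findRollbacks(samples: ProbeSample[]): Rollback[] {
  const lastLength = new Map<string, number>();
  const rollbacks: Rollback[] = [];

  for (const sample of samples) {
    const previous = lastLength.get(sample.id);
    if (previous !== undefined && sample.cLen < previous) {
      rollbacks.push({ id: sample.id, from: previous, to: sample.cLen, nearEvent: sample.event });
    }
    lastLength.set(sample.id, sample.cLen);
  }
  return rollbacks;
}
```

Under these assumptions, the Case D symptom corresponds to one reported entry (5182 to 3 near `agent_runtime_end`), and the patched behaviour corresponds to an empty result.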