Skip to content

🐛 fix: first inject the cloudecc runtime session should use the existingStatus#14592

Merged
ONLY-yours merged 4 commits into
canaryfrom
fix/cloudeccRuntime
May 9, 2026
Merged

🐛 fix: first inject the cloudecc runtime session should use the existingStatus#14592
ONLY-yours merged 4 commits into
canaryfrom
fix/cloudeccRuntime

Conversation

@ONLY-yours

Copy link
Copy Markdown
Member

💻 Change Type

  • ✨ feat
  • 🐛 fix
  • ♻️ refactor
  • 💄 style
  • 👷 build
  • ⚡️ perf
  • ✅ test
  • 📝 docs
  • 🔨 chore

🔗 Related Issue

🔀 Description of Change

🧪 How to Test

  • Tested locally
  • Added/updated tests
  • No tests needed

📸 Screenshots / Videos

Before After
... ...

📝 Additional Information

ONLY-yours and others added 2 commits May 9, 2026 22:29
…ction

Race condition on new-topic first message:
1. switchTopic loads runningOperation → useGatewayReconnect fires
2. executeGatewayAgent calls connectToGateway (status: connecting)
3. reconnectToGatewayOperation overwrites with resumeOnConnect:true
4. Gateway sees resume on a brand-new session → no events → stuck

Second message works because the client store's runningOperation is
stale (from the first op), so SWR deduplications and no reconnect fires.

Fix: bail out of reconnectToGatewayOperation if gatewayConnections
already shows connecting/connected for that operationId.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…sume

CC stores session files at ~/.claude/projects/<encoded-cwd>/.
Without an explicit --cwd the actual working directory can differ
between sandbox invocations, so --resume <heteroSessionId> fails
to locate the previous session files even though the container is
persistent and the ID is correctly stored in topic.metadata.

Default cwd to /workspace for cloud runs (desktop keeps its own
explicit path), guaranteeing a stable session-file location across
page reloads within the same sandbox lifecycle.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

@sourcery-ai sourcery-ai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry @ONLY-yours, you have reached your weekly rate limit of 500000 diff characters.

Please try again later or upgrade to continue using Sourcery

@vercel

vercel Bot commented May 9, 2026

Copy link
Copy Markdown

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
lobehub Ready Ready Preview, Comment May 9, 2026 3:30pm

Request Review

@dosubot dosubot Bot added the size:S This PR changes 10-29 lines, ignoring generated files. label May 9, 2026

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 644d1b6788

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/store/chat/slices/aiChat/actions/gateway.ts Outdated
@codecov

codecov Bot commented May 9, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 84.61538% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 68.91%. Comparing base (746bf4f) to head (8ef7e7e).
⚠️ Report is 3 commits behind head on canary.

Additional details and impacted files
@@             Coverage Diff             @@
##           canary   #14592       +/-   ##
===========================================
- Coverage   81.78%   68.91%   -12.87%     
===========================================
  Files         671     2637     +1966     
  Lines       44839   232546   +187707     
  Branches     6632    29627    +22995     
===========================================
+ Hits        36670   160257   +123587     
- Misses       8019    72139    +64120     
  Partials      150      150               
Flag Coverage Δ
app 63.60% <84.61%> (?)
database 91.98% <ø> (?)
packages/agent-runtime 80.48% <ø> (ø)
packages/builtin-tool-lobe-agent 83.41% <ø> (ø)
packages/context-engine 84.00% <ø> (ø)
packages/conversation-flow 92.43% <ø> (ø)
packages/file-loaders 87.60% <ø> (ø)
packages/memory-user-memory 74.74% <ø> (ø)
packages/model-bank 99.94% <ø> (ø)
packages/model-runtime 83.72% <ø> (ø)
packages/prompts 70.16% <ø> (ø)
packages/python-interpreter 92.90% <ø> (ø)
packages/ssrf-safe-fetch 0.00% <ø> (ø)
packages/types 4.86% <ø> (ø)
packages/utils 88.02% <ø> (ø)
packages/web-crawler 88.29% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

Components Coverage Δ
Store 66.98% <0.00%> (∅)
Services 54.16% <ø> (∅)
Server 70.62% <100.00%> (∅)
Libs 55.04% <ø> (∅)
Utils 82.51% <ø> (-10.97%) ⬇️
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

The previous guard only skipped reconnect for 'connecting'/'connected'
but the connection can already be in 'authenticating' or 'reconnecting'
by the time useGatewayReconnect fires, leaving the race window open.

Flip the condition: skip for any status that is not 'disconnected'.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Vercel serverless functions are stateless per-request, so `operationStates`
is empty on every `heteroIngest` call. loadOrCreateState always cold-creates.

#14539 fixed `toolMsgIdByCallId` restoration but left `accumulatedContent`,
`toolState.payloads`, and `toolState.persistedIds` empty on cold load,
causing two bugs:

- Content truncation: cold instance starts with `accumulatedContent=''`,
  accumulates only the current batch's text, then writes that shorter string
  on the next step boundary or terminal — overwriting the longer content the
  previous write had already stored in DB.

- Tool duplication / tools[] overwrite: `persistedIds={}` on cold load
  means every `tools_calling` event re-creates already-persisted tool
  messages, and `payloads=[]` means phase 1/3 writes only the current
  batch's tools, wiping previous tools from `assistant.tools[]`.

Fix: in `loadOrCreateState`, fetch the current assistant message and restore
`accumulatedContent`, `accumulatedReasoning`, `toolState.payloads`, and
`toolState.persistedIds` from it. Cold load is now equivalent to warm load.

Also adds two regression tests covering the cold-replica scenarios.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@dosubot dosubot Bot added size:M This PR changes 30-99 lines, ignoring generated files. and removed size:S This PR changes 10-29 lines, ignoring generated files. labels May 9, 2026
@ONLY-yours ONLY-yours merged commit f35e2d8 into canary May 9, 2026
50 checks passed
@ONLY-yours ONLY-yours deleted the fix/cloudeccRuntime branch May 9, 2026 15:44
@ONLY-yours ONLY-yours restored the fix/cloudeccRuntime branch May 9, 2026 16:38
@arvinxx arvinxx mentioned this pull request May 12, 2026
Innei pushed a commit to Innei/lobehub that referenced this pull request May 12, 2026
…ingStatus (lobehub#14592)

* 🐛 fix: skip reconnect when gateway action already established a connection

Race condition on new-topic first message:
1. switchTopic loads runningOperation → useGatewayReconnect fires
2. executeGatewayAgent calls connectToGateway (status: connecting)
3. reconnectToGatewayOperation overwrites with resumeOnConnect:true
4. Gateway sees resume on a brand-new session → no events → stuck

Second message works because the client store's runningOperation is
stale (from the first op), so SWR deduplications and no reconnect fires.

Fix: bail out of reconnectToGatewayOperation if gatewayConnections
already shows connecting/connected for that operationId.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* 🐛 fix: always pass --cwd /workspace for cloud CC to ensure session resume

CC stores session files at ~/.claude/projects/<encoded-cwd>/.
Without an explicit --cwd the actual working directory can differ
between sandbox invocations, so --resume <heteroSessionId> fails
to locate the previous session files even though the container is
persistent and the ID is correctly stored in topic.metadata.

Default cwd to /workspace for cloud runs (desktop keeps its own
explicit path), guaranteeing a stable session-file location across
page reloads within the same sandbox lifecycle.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* 🐛 fix: extend reconnect guard to cover all in-flight connection statuses

The previous guard only skipped reconnect for 'connecting'/'connected'
but the connection can already be in 'authenticating' or 'reconnecting'
by the time useGatewayReconnect fires, leaving the race window open.

Flip the condition: skip for any status that is not 'disconnected'.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* 🐛 fix: restore cold replica state in HeterogeneousPersistenceHandler

Vercel serverless functions are stateless per-request, so `operationStates`
is empty on every `heteroIngest` call. loadOrCreateState always cold-creates.

lobehub#14539 fixed `toolMsgIdByCallId` restoration but left `accumulatedContent`,
`toolState.payloads`, and `toolState.persistedIds` empty on cold load,
causing two bugs:

- Content truncation: cold instance starts with `accumulatedContent=''`,
  accumulates only the current batch's text, then writes that shorter string
  on the next step boundary or terminal — overwriting the longer content the
  previous write had already stored in DB.

- Tool duplication / tools[] overwrite: `persistedIds={}` on cold load
  means every `tools_calling` event re-creates already-persisted tool
  messages, and `payloads=[]` means phase 1/3 writes only the current
  batch's tools, wiping previous tools from `assistant.tools[]`.

Fix: in `loadOrCreateState`, fetch the current assistant message and restore
`accumulatedContent`, `accumulatedReasoning`, `toolState.payloads`, and
`toolState.persistedIds` from it. Cold load is now equivalent to warm load.

Also adds two regression tests covering the cold-replica scenarios.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
arvinxx pushed a commit that referenced this pull request May 12, 2026
…ingStatus (#14592)

* 🐛 fix: skip reconnect when gateway action already established a connection

Race condition on new-topic first message:
1. switchTopic loads runningOperation → useGatewayReconnect fires
2. executeGatewayAgent calls connectToGateway (status: connecting)
3. reconnectToGatewayOperation overwrites with resumeOnConnect:true
4. Gateway sees resume on a brand-new session → no events → stuck

Second message works because the client store's runningOperation is
stale (from the first op), so SWR deduplications and no reconnect fires.

Fix: bail out of reconnectToGatewayOperation if gatewayConnections
already shows connecting/connected for that operationId.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* 🐛 fix: always pass --cwd /workspace for cloud CC to ensure session resume

CC stores session files at ~/.claude/projects/<encoded-cwd>/.
Without an explicit --cwd the actual working directory can differ
between sandbox invocations, so --resume <heteroSessionId> fails
to locate the previous session files even though the container is
persistent and the ID is correctly stored in topic.metadata.

Default cwd to /workspace for cloud runs (desktop keeps its own
explicit path), guaranteeing a stable session-file location across
page reloads within the same sandbox lifecycle.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* 🐛 fix: extend reconnect guard to cover all in-flight connection statuses

The previous guard only skipped reconnect for 'connecting'/'connected'
but the connection can already be in 'authenticating' or 'reconnecting'
by the time useGatewayReconnect fires, leaving the race window open.

Flip the condition: skip for any status that is not 'disconnected'.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* 🐛 fix: restore cold replica state in HeterogeneousPersistenceHandler

Vercel serverless functions are stateless per-request, so `operationStates`
is empty on every `heteroIngest` call. loadOrCreateState always cold-creates.

#14539 fixed `toolMsgIdByCallId` restoration but left `accumulatedContent`,
`toolState.payloads`, and `toolState.persistedIds` empty on cold load,
causing two bugs:

- Content truncation: cold instance starts with `accumulatedContent=''`,
  accumulates only the current batch's text, then writes that shorter string
  on the next step boundary or terminal — overwriting the longer content the
  previous write had already stored in DB.

- Tool duplication / tools[] overwrite: `persistedIds={}` on cold load
  means every `tools_calling` event re-creates already-persisted tool
  messages, and `payloads=[]` means phase 1/3 writes only the current
  batch's tools, wiping previous tools from `assistant.tools[]`.

Fix: in `loadOrCreateState`, fetch the current assistant message and restore
`accumulatedContent`, `accumulatedReasoning`, `toolState.payloads`, and
`toolState.persistedIds` from it. Cold load is now equivalent to warm load.

Also adds two regression tests covering the cold-replica scenarios.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
lezi-fun pushed a commit to lezi-fun/lobehub that referenced this pull request May 13, 2026
…ingStatus (lobehub#14592)

* 🐛 fix: skip reconnect when gateway action already established a connection

Race condition on new-topic first message:
1. switchTopic loads runningOperation → useGatewayReconnect fires
2. executeGatewayAgent calls connectToGateway (status: connecting)
3. reconnectToGatewayOperation overwrites with resumeOnConnect:true
4. Gateway sees resume on a brand-new session → no events → stuck

Second message works because the client store's runningOperation is
stale (from the first op), so SWR deduplications and no reconnect fires.

Fix: bail out of reconnectToGatewayOperation if gatewayConnections
already shows connecting/connected for that operationId.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* 🐛 fix: always pass --cwd /workspace for cloud CC to ensure session resume

CC stores session files at ~/.claude/projects/<encoded-cwd>/.
Without an explicit --cwd the actual working directory can differ
between sandbox invocations, so --resume <heteroSessionId> fails
to locate the previous session files even though the container is
persistent and the ID is correctly stored in topic.metadata.

Default cwd to /workspace for cloud runs (desktop keeps its own
explicit path), guaranteeing a stable session-file location across
page reloads within the same sandbox lifecycle.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* 🐛 fix: extend reconnect guard to cover all in-flight connection statuses

The previous guard only skipped reconnect for 'connecting'/'connected'
but the connection can already be in 'authenticating' or 'reconnecting'
by the time useGatewayReconnect fires, leaving the race window open.

Flip the condition: skip for any status that is not 'disconnected'.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* 🐛 fix: restore cold replica state in HeterogeneousPersistenceHandler

Vercel serverless functions are stateless per-request, so `operationStates`
is empty on every `heteroIngest` call. loadOrCreateState always cold-creates.

lobehub#14539 fixed `toolMsgIdByCallId` restoration but left `accumulatedContent`,
`toolState.payloads`, and `toolState.persistedIds` empty on cold load,
causing two bugs:

- Content truncation: cold instance starts with `accumulatedContent=''`,
  accumulates only the current batch's text, then writes that shorter string
  on the next step boundary or terminal — overwriting the longer content the
  previous write had already stored in DB.

- Tool duplication / tools[] overwrite: `persistedIds={}` on cold load
  means every `tools_calling` event re-creates already-persisted tool
  messages, and `payloads=[]` means phase 1/3 writes only the current
  batch's tools, wiping previous tools from `assistant.tools[]`.

Fix: in `loadOrCreateState`, fetch the current assistant message and restore
`accumulatedContent`, `accumulatedReasoning`, `toolState.payloads`, and
`toolState.persistedIds` from it. Cold load is now equivalent to warm load.

Also adds two regression tests covering the cold-replica scenarios.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size:M This PR changes 30-99 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant