client: Reconnect cloud websocket after it disconnects#57078

Merged
tomhoule merged 1 commit into main from tomhoule-koluynypqqqp
May 21, 2026

@tomhoule

Contributor

The cloud websocket was established once during sign-in and never re-established. On any server restart or transient network drop, the connection task exited, and yawc itself does not reconnect.

This wraps `connect_to_cloud` in a long-lived task that re-establishes the websocket with exponential backoff and jitter, reusing `INITIAL_RECONNECTION_DELAY` and `MAX_RECONNECTION_DELAY` so the behavior matches the Collab reconnect loop in the same module.
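A minimal sketch of a reconnect loop with exponential backoff and jitter, for readers unfamiliar with the pattern. It is not the code from this PR: `Backoff` and `connect_with_retry` are hypothetical names, the 500 ms / 30 s values for `INITIAL_RECONNECTION_DELAY` and `MAX_RECONNECTION_DELAY` are illustrative assumptions (the real constants live in the client module), the 1.5 to 2.5 growth factor is assumed, and a tiny xorshift generator stands in for an RNG crate so the example has no dependencies.

```rust
use std::time::Duration;

// Illustrative values only (assumption): the PR reuses the real
// INITIAL_RECONNECTION_DELAY and MAX_RECONNECTION_DELAY constants.
const INITIAL_RECONNECTION_DELAY: Duration = Duration::from_millis(500);
const MAX_RECONNECTION_DELAY: Duration = Duration::from_secs(30);

/// Exponential backoff with multiplicative jitter, clamped to
/// [INITIAL_RECONNECTION_DELAY, MAX_RECONNECTION_DELAY].
struct Backoff {
    delay: Duration,
    // xorshift64 state: a dependency-free stand-in for a real RNG crate.
    rng_state: u64,
}

impl Backoff {
    fn new(seed: u64) -> Self {
        Self {
            delay: INITIAL_RECONNECTION_DELAY,
            rng_state: seed.max(1), // xorshift must not start at 0
        }
    }

    /// Pseudo-random f64 in [0, 1) from xorshift64.
    fn next_unit(&mut self) -> f64 {
        let mut x = self.rng_state;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.rng_state = x;
        (x >> 11) as f64 / (1u64 << 53) as f64
    }

    /// Returns the delay to wait before the next attempt, then grows the
    /// stored delay by a random factor between 1.5 and 2.5 (clamped).
    fn next_delay(&mut self) -> Duration {
        let current = self.delay;
        let factor = 1.5 + self.next_unit();
        self.delay = current
            .mul_f64(factor)
            .clamp(INITIAL_RECONNECTION_DELAY, MAX_RECONNECTION_DELAY);
        current
    }

    /// Called after a successful connection so the next outage starts small.
    fn reset(&mut self) {
        self.delay = INITIAL_RECONNECTION_DELAY;
    }
}

/// Hypothetical driver: calls `connect` until it succeeds, sleeping with
/// backoff between failures. A long-lived task would call this again each
/// time the established websocket drops.
fn connect_with_retry<E>(
    backoff: &mut Backoff,
    mut connect: impl FnMut() -> Result<(), E>,
    mut sleep: impl FnMut(Duration),
) {
    loop {
        match connect() {
            Ok(()) => {
                backoff.reset();
                return;
            }
            Err(_) => sleep(backoff.next_delay()),
        }
    }
}
```

Resetting the delay after a successful connection means a later, unrelated drop starts again from the short initial delay instead of inheriting the capped value from the previous outage.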

Part of CLO-713.

Release Notes:

- N/A

cla-bot added the cla-signed label (The user has signed the Contributor License Agreement) May 18, 2026
zed-community-bot added the staff label (Pull requests authored by a current member of Zed staff) May 18, 2026
tomhoule force-pushed the tomhoule-koluynypqqqp branch from 94bfae6 to bf3cc7b May 18, 2026 17:07
tomhoule changed the title "client: Reconnect to the cloud websocket after it drops" to "client: Reconnect cloud websocket after it disconnects" May 18, 2026
tomhoule marked this pull request as ready for review May 18, 2026 17:32
tomhoule requested a review from maxdeviant May 18, 2026 17:32
tomhoule force-pushed the tomhoule-koluynypqqqp branch from bf3cc7b to fb10ee9 May 20, 2026 10:21
@tomhoule

Contributor Author

I aligned the retry implementation with the Collab client retry implementation, but I'm wondering whether we would benefit from something different. We don't care as much about reconnecting ASAP, but we do care about reconnecting after the user has been offline for a few hours, and I don't think this implementation works well in that case.

tomhoule force-pushed the tomhoule-koluynypqqqp branch from 6ee7111 to 7606704 May 21, 2026 13:38
@tomhoule

Contributor Author

I looked into this again and my concern was misplaced. We'll try reconnecting indefinitely.
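To see why a capped backoff still copes with a multi-hour outage, here is a deterministic back-of-the-envelope sketch. It assumes the same illustrative 500 ms / 30 s values (not necessarily Zed's), drops jitter in favor of plain doubling, and uses a hypothetical helper, `attempts_while_offline`. Because the delay saturates at the cap and nothing limits the number of attempts, the client keeps retrying roughly every 30 s for as long as the network is down. Under these assumptions, a three-hour outage yields 365 attempts with no gap longer than 30 s.

```rust
use std::time::Duration;

// Illustrative values (assumption); the real constants live in the client module.
const INITIAL_RECONNECTION_DELAY: Duration = Duration::from_millis(500);
const MAX_RECONNECTION_DELAY: Duration = Duration::from_secs(30);

/// Counts connection attempts made while the network stays down for
/// `offline`, using plain doubling (jitter omitted) capped at the max delay.
/// Returns (attempts, longest gap between consecutive attempts).
fn attempts_while_offline(offline: Duration) -> (u32, Duration) {
    let mut elapsed = Duration::ZERO;
    let mut delay = INITIAL_RECONNECTION_DELAY;
    let mut attempts = 0;
    let mut longest_gap = Duration::ZERO;
    // No attempt limit: the loop only ends because the simulated outage does.
    while elapsed <= offline {
        attempts += 1; // an attempt at time `elapsed`, which fails
        elapsed += delay;
        longest_gap = longest_gap.max(delay);
        delay = (delay * 2).min(MAX_RECONNECTION_DELAY);
    }
    (attempts, longest_gap)
}
```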

tomhoule added this pull request to the merge queue May 21, 2026
Merged via the queue into main with commit 8d28ca5 May 21, 2026
32 checks passed
tomhoule deleted the tomhoule-koluynypqqqp branch May 21, 2026 15:24
tomhoule added a commit that referenced this pull request May 22, 2026
Until now, the cloud-hosted model list was only refreshed in response to events that exercise the LLM token (a `UserUpdated` push, an organization change, or `PrivateUserInfoUpdated`). If a user wasn't actively using AI features around the time we shipped new models, the list could stay stale until they restarted Zed.

This is the second step toward fixing that, after #57078 made the cloud websocket reconnect on its own. We now treat each successful (re)connect as a hint that the server state may have changed (for example, new model definitions may be available) and trigger a model list refresh.

The trigger is a new `Client::cloud_connection_id()` watch that bumps a counter each time the websocket handshake completes. `CloudLanguageModelProvider::State` subscribes to it and, on every tick after the initial `0`, schedules a debounced refresh. The debounce is trailing-edge with a 5-minute window plus up to 5 minutes of uniform jitter, so a burst of reconnects (rolling deploy, flaky network) coalesces into a single refresh once things have been quiet, and many clients reconnecting at the same time don't all refresh at once (avoiding a thundering herd).
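The trailing-edge debounce can be sketched as a small state machine over a virtual clock. This is a minimal illustration rather than Zed's implementation: `RefreshDebouncer`, `on_reconnect`, and `poll` are hypothetical names, and drawing fresh jitter on every reconnect (rather than once per burst) is an assumption. Only the 5-minute window and the up-to-5-minute jitter come from the description above.

```rust
use std::time::Duration;

// From the description: a 5-minute window plus up to 5 minutes of jitter.
const DEBOUNCE_WINDOW: Duration = Duration::from_secs(5 * 60);
const MAX_JITTER: Duration = Duration::from_secs(5 * 60);

/// Trailing-edge debouncer over a virtual clock (`now` = time since start).
/// Every reconnect pushes the deadline out to `now + window + jitter`, so a
/// burst of reconnects produces a single refresh once things go quiet.
struct RefreshDebouncer {
    deadline: Option<Duration>,
}

impl RefreshDebouncer {
    fn new() -> Self {
        Self { deadline: None }
    }

    /// Called for each reconnect. `jitter_unit` is a uniform sample in
    /// [0, 1]; it is passed in (assumption) to keep the sketch deterministic.
    fn on_reconnect(&mut self, now: Duration, jitter_unit: f64) {
        let jitter = MAX_JITTER.mul_f64(jitter_unit.clamp(0.0, 1.0));
        // Replacing any earlier deadline is what makes this trailing-edge.
        self.deadline = Some(now + DEBOUNCE_WINDOW + jitter);
    }

    /// Returns true exactly once, when the pending deadline has passed.
    fn poll(&mut self, now: Duration) -> bool {
        match self.deadline {
            Some(deadline) if now >= deadline => {
                self.deadline = None;
                true
            }
            _ => false,
        }
    }
}
```

In async code the same effect typically comes from dropping the previously scheduled task and spawning a new one that sleeps for the window plus jitter before refreshing.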

Closes CLO-713.

Release Notes:

- The list of Zed-managed models is now refreshed automatically, without requiring a restart
tomhoule added a commit that referenced this pull request May 26, 2026
tomhoule added a commit that referenced this pull request May 26, 2026
github-merge-queue Bot pushed a commit that referenced this pull request May 27, 2026
Until now, the cloud-hosted model list was only refreshed in response to
events that exercise the LLM token (a `UserUpdated` push, an
organization change, or `PrivateUserInfoUpdated`). If a user wasn't
actively using AI features around the time we shipped new models, the
list could stay stale until they restarted Zed.

This is the second step toward fixing that, after #57078 made the cloud
websocket reconnect on its own. We now treat each successful (re)connect
as a hint that the server state may have changed (for example, new model
definitions may be available) and trigger a model list refresh.

The trigger is a new `Client::cloud_connection_id()` watch that bumps a
counter each time the websocket handshake completes.
`CloudLanguageModelProvider::State` subscribes to it and, on every tick
after the initial `0`, schedules a debounced refresh (with jitter, so that
active clients reconnecting together after a cloud deploy don't all
refresh at the same moment).
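A dependency-free sketch of the counter-plus-subscriber pattern described above. Zed's real `Client::cloud_connection_id()` returns a watch; here an `Arc<AtomicU64>` polled by the subscriber stands in for it (assumption), and `CloudConnectionId` / `RefreshTrigger` are hypothetical names. Like a watch, polling observes only the latest value, so several reconnects between polls count as a single change.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;

/// Hypothetical stand-in for the watch behind `Client::cloud_connection_id()`:
/// starts at 0 and is bumped once per completed websocket handshake.
#[derive(Default, Clone)]
struct CloudConnectionId(Arc<AtomicU64>);

impl CloudConnectionId {
    /// Called by the connection task after each successful handshake.
    fn handshake_completed(&self) -> u64 {
        self.0.fetch_add(1, Ordering::SeqCst) + 1
    }

    fn get(&self) -> u64 {
        self.0.load(Ordering::SeqCst)
    }
}

/// Subscriber side: remembers the last value it saw and reports whether a
/// (debounced) refresh should be scheduled.
struct RefreshTrigger {
    last_seen: u64,
}

impl RefreshTrigger {
    fn new() -> Self {
        // Starting at 0 means the initial value never triggers a refresh.
        Self { last_seen: 0 }
    }

    fn poll(&mut self, id: &CloudConnectionId) -> bool {
        let current = id.get();
        let changed = current != self.last_seen;
        self.last_seen = current;
        changed
    }
}
```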

Closes CLO-713.

Release Notes:

- The list of Zed-hosted models is now refreshed automatically, without
requiring a restart
TomPlanche pushed a commit to TomPlanche/zed that referenced this pull request Jun 2, 2026
…s#57078)

TomPlanche pushed a commit to TomPlanche/zed that referenced this pull request Jun 2, 2026
…7528)
