client: Reconnect cloud websocket after it disconnects #57078
I aligned the retry implementation with the collab client retry implementation, but I'm wondering if we wouldn't benefit from something different. We don't care as much about reconnecting ASAP, but we do care about reconnecting after the user has been offline for a few hours, and I don't think this implementation works well in that case.
The cloud websocket was established once during sign-in and never re-established. On any server restart or transient network drop the connection task exited. yawc itself does not reconnect.

This wraps `connect_to_cloud` in a long-lived task that re-establishes the websocket with exponential backoff and jitter, reusing `INITIAL_RECONNECTION_DELAY` and `MAX_RECONNECTION_DELAY` so the behavior matches the Collab reconnect loop in the same module.

Part of CLO-713.

Release Notes:

- N/A
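For readers skimming the thread, here is a minimal sketch of what exponential backoff with jitter can look like. It is not the code in this PR. The constant names come from the description above, but their values here are placeholders, and `backoff_delay`, `connect_with_retry`, and the injected `connect`, `jitter`, and `sleep` closures are hypothetical names chosen so the example runs with the standard library alone.

```rust
use std::time::Duration;

// Placeholder values: the PR reuses constants with these names,
// but their actual values are not stated in this thread.
const INITIAL_RECONNECTION_DELAY: Duration = Duration::from_millis(500);
const MAX_RECONNECTION_DELAY: Duration = Duration::from_secs(30);

/// Hypothetical helper: delay before retry number `attempt` (0-based).
/// `jitter` is expected in [0.0, 1.0] and would normally come from an RNG.
fn backoff_delay(attempt: u32, jitter: f64) -> Duration {
    // Exponential growth: initial * 2^attempt, saturating, capped at the maximum.
    let factor = 2u32.saturating_pow(attempt);
    let base = INITIAL_RECONNECTION_DELAY
        .saturating_mul(factor)
        .min(MAX_RECONNECTION_DELAY);
    // Jitter: pick a delay between 50% and 100% of the capped base,
    // so clients that dropped together do not all retry at the same instant.
    base.mul_f64(0.5 + 0.5 * jitter.clamp(0.0, 1.0))
}

/// Hypothetical retry loop: keeps calling `connect` until it succeeds and
/// returns how many attempts failed. It never gives up.
fn connect_with_retry(
    mut connect: impl FnMut() -> Result<(), String>,
    mut jitter: impl FnMut() -> f64,
    mut sleep: impl FnMut(Duration),
) -> u32 {
    let mut attempt: u32 = 0;
    loop {
        match connect() {
            Ok(()) => return attempt,
            Err(_) => {
                sleep(backoff_delay(attempt, jitter()));
                attempt = attempt.saturating_add(1);
            }
        }
    }
}
```

Because the delay is capped rather than the number of attempts, a loop of this shape keeps retrying at roughly the maximum interval for as long as the client stays offline. A long-lived task would presumably run such a loop again, starting from the initial delay, each time an established connection drops.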
I looked into this again and my concern was misplaced. We'll try reconnecting indefinitely.
Until now, the cloud-hosted model list was only refreshed in response to events that exercise the LLM token (a `UserUpdated` push, an organization change, or `PrivateUserInfoUpdated`). If a user wasn't actively using AI features around the time we shipped new models, the list could stay stale until they restarted Zed.

This is the second step toward fixing that, after #57078 made the cloud websocket reconnect on its own. We now treat each successful (re)connect as a hint that the server state may have changed, so new model definitions may be available, and trigger a model list refresh.

The trigger is a new `Client::cloud_connection_id()` watch that bumps a counter each time the websocket handshake completes. `CloudLanguageModelProvider::State` subscribes to it and, on every tick after the initial `0`, schedules a debounced refresh. The debounce is trailing-edge with a 5-minute window plus up to 5 minutes of uniform jitter, so a burst of reconnects (rolling deploy, flaky network) coalesces into a single refresh once things have been quiet, and we avoid thundering herd issues from many clients reconnecting at the same time.

Closes CLO-713.

Release Notes:

- The list of Zed managed models is now refreshed automatically, without requiring a restart
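The trailing-edge debounce described above can be sketched roughly as follows. This is an illustration under assumptions, not the code from that commit: `RefreshDebouncer`, `observe`, and `should_refresh` are hypothetical names, time and jitter are passed in explicitly so the logic can be checked without a runtime or RNG, and re-rolling the jitter on every tick is a guess about the behavior.

```rust
use std::time::{Duration, Instant};

// Window and jitter bound as stated in the commit message.
const DEBOUNCE_WINDOW: Duration = Duration::from_secs(5 * 60);
const MAX_JITTER: Duration = Duration::from_secs(5 * 60);

/// Hypothetical trailing-edge debouncer driven by a connection counter.
struct RefreshDebouncer {
    last_seen_id: u64,
    deadline: Option<Instant>,
}

impl RefreshDebouncer {
    fn new() -> Self {
        // The counter starts at 0, which must not trigger a refresh.
        Self { last_seen_id: 0, deadline: None }
    }

    /// Record an observed counter value. Every change pushes the deadline
    /// out again, so a burst of reconnects collapses into one refresh.
    /// `jitter` is expected in [0.0, 1.0] and would normally come from an RNG.
    fn observe(&mut self, connection_id: u64, now: Instant, jitter: f64) {
        if connection_id == self.last_seen_id {
            return;
        }
        self.last_seen_id = connection_id;
        let jitter = MAX_JITTER.mul_f64(jitter.clamp(0.0, 1.0));
        self.deadline = Some(now + DEBOUNCE_WINDOW + jitter);
    }

    /// Returns true exactly once after the quiet period has elapsed.
    fn should_refresh(&mut self, now: Instant) -> bool {
        match self.deadline {
            Some(deadline) if now >= deadline => {
                self.deadline = None;
                true
            }
            _ => false,
        }
    }
}
```

With this shape, two reconnects one minute apart yield a single refresh between 5 and 10 minutes after the second one. The jitter spreads clients that all reconnected after the same deploy across that interval.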
Until now, the cloud-hosted model list was only refreshed in response to events that exercise the LLM token (a `UserUpdated` push, an organization change, or `PrivateUserInfoUpdated`). If a user wasn't actively using AI features around the time we shipped new models, the list could stay stale until they restarted Zed.

This is the second step toward fixing that, after #57078 made the cloud websocket reconnect on its own. We now treat each successful (re)connect as a hint that the server state may have changed, so new model definitions may be available, and trigger a model list refresh.

The trigger is a new `Client::cloud_connection_id()` watch that bumps a counter each time the websocket handshake completes. `CloudLanguageModelProvider::State` subscribes to it and, on every tick after the initial `0`, schedules a debounced refresh (with jitter, so we don't have all active clients trying to reconnect at the same time after we deploy in cloud).

Closes CLO-713.

Release Notes:

- The list of Zed hosted models is now refreshed automatically, without requiring a restart