fix(desktop): add retry logic with exponential backoff to health check system by herjarsa · Pull Request #24162 · anomalyco/opencode

herjarsa · 2026-04-24T12:39:23Z

Issue for this PR

Type of change

Bug fix
New feature
Refactor / code improvement
Documentation

What does this PR do?

The desktop app's health check (check_health) fires exactly once, immediately after the local server sidecar is spawned. On slower machines, or while the server is still initializing, that single attempt can fail and mark the server as down. This causes cascading instability:

MCP local connections repeatedly disconnect/reconnect
IDE freezes when switching sessions
Sidecar marked as unhealthy even though it is still starting up

The fix: replace the single-shot health check with check_health_with_retry(), which makes up to 6 attempts with exponential backoff between them.

Retry strategy:

Backoff intervals: 500ms -> 1s -> 2s -> 4s -> 4s (capped)
2-second timeout per attempt via tokio::time::timeout
Network errors and timeouts yield false and trigger another attempt instead of short-circuiting early with ?
reqwest::Client built once before the loop to reuse the connection pool
Total max duration ~23.5s, safely within the caller's 30s timeout
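The retry strategy above can be approximated without tokio or reqwest. This is a synchronous, std-only sketch under stated assumptions, not the PR's code: the probe closure stands in for one health-check request (in the PR, a call on a reqwest::Client built once before the loop, wrapped in a 2s tokio::time::timeout), and the sleep closure stands in for the async sleep so the loop can be exercised without waiting. backoff_interval() reuses the helper name from the commit message, but its body and the probe/sleep parameters are hypothetical.

```rust
use std::time::Duration;

const MAX_ATTEMPTS: u32 = 6;

/// Hypothetical body for backoff_interval(): the delay after 0-based attempt
/// `attempt` fails. Yields 500ms, 1s, 2s, 4s, then stays capped at 4s.
fn backoff_interval(attempt: u32) -> Duration {
    let ms = 500u64.saturating_mul(1u64 << attempt.min(3));
    Duration::from_millis(ms.min(4_000))
}

/// Sketch of the retry loop. `probe` performs one attempt and may fail with
/// any error type; `sleep` is injected so tests can record delays instead of waiting.
fn check_health_with_retry<E, P, S>(mut probe: P, mut sleep: S) -> bool
where
    P: FnMut(u32) -> Result<bool, E>,
    S: FnMut(Duration),
{
    for attempt in 0..MAX_ATTEMPTS {
        // Fold errors into `false` with `match` rather than propagating them
        // with `?`, so a transient error leads to another attempt.
        let healthy = match probe(attempt) {
            Ok(ok) => ok,
            Err(_) => false,
        };
        if healthy {
            return true;
        }
        // Back off between attempts, but not after the final one.
        if attempt + 1 < MAX_ATTEMPTS {
            sleep(backoff_interval(attempt));
        }
    }
    false
}

fn main() {
    // Simulated sidecar that refuses connections until the 4th attempt.
    let mut slept = Vec::new();
    let ok = check_health_with_retry(
        |attempt| if attempt < 3 { Err("connection refused") } else { Ok(true) },
        |d| slept.push(d),
    );
    println!("healthy={ok} delays={slept:?}");
}
```

Because errors are folded into false, a refused connection during sidecar startup costs one backoff interval instead of ending the check outright.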

How did you verify your code works?

Rust syntax validated; cleanup removed unused variables, duplicate .no_proxy() calls, and dead, unreachable code
Exponential backoff math verified: 500 + 1000 + 2000 + 4000 + 4000 ms = 11.5s of delays, plus 6 attempts × 2s = 12s of timeouts, for a worst case of 23.5s
Reviewed the logic to confirm that every retry path behaves correctly
Note: cargo check could not run locally because the MSVC linker is missing in this Windows CI environment, so the change has not been checked by the compiler; it has only been reviewed for structural soundness
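The backoff arithmetic can also be checked mechanically. A minimal std-only sketch that recomputes the worst-case budget from the figures stated in this PR (6 attempts, 2s per-attempt timeout, the 500ms to 4s schedule, 30s caller timeout); the constant and function names are illustrative and not taken from the diff:

```rust
use std::time::Duration;

// Figures from the PR description; names are illustrative.
const ATTEMPTS: u32 = 6;
const PER_ATTEMPT_TIMEOUT: Duration = Duration::from_secs(2);
const CALLER_TIMEOUT: Duration = Duration::from_secs(30);
// One delay between each pair of consecutive attempts (ATTEMPTS - 1 entries).
const DELAYS_MS: [u64; 5] = [500, 1_000, 2_000, 4_000, 4_000];

/// Worst case: every attempt runs until its timeout, and every backoff delay is taken.
fn worst_case_duration() -> Duration {
    let delays: Duration = DELAYS_MS.iter().map(|&ms| Duration::from_millis(ms)).sum();
    delays + PER_ATTEMPT_TIMEOUT * ATTEMPTS
}

fn main() {
    let total = worst_case_duration();
    println!(
        "worst case = {:?}, within caller timeout: {}",
        total,
        total < CALLER_TIMEOUT
    );
}
```

This budget assumes attempts only ever end by timing out; attempts that fail fast (for example, a refused connection) finish sooner, so 23.5s is an upper bound.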

Screenshots / recordings

N/A -- backend health check logic, no UI changes.

Checklist

I have tested my changes locally
I have not included unrelated changes in this PR

…k system
- Add check_health_with_retry() with up to 6 attempts
- Implement backoff_interval() helper (500ms -> 1s -> 2s -> 4s -> 4s)
- Add 2s timeout per attempt with tokio::time::timeout
- Build reqwest::Client once before loop (connection pool reuse)
- Retry on transient errors (network/timeout) via match instead of short-circuiting with ?
- Apply no_proxy() to skip proxy for localhost health checks
- Keep max total duration ~23.5s within caller's 30s timeout
- Remove dead unreachable code and unused variables

github-actions · 2026-04-24T12:44:07Z

Thanks for updating your PR! It now meets our contributing guidelines. 👍

herjarsa · 2026-04-24T13:38:57Z

Also closes #23997: the local server sidecar health-check failure cascades into MCP connection drops. When the app marks the sidecar as unhealthy, it tears down dependent MCP connections (Supabase, Desktop Commander, n8n, Universal Brain), while MCPs with their own persistent transport (Telegram, MetaTrader) survive. With retries, the health check can succeed before those dependent MCP connections are torn down.

…ness + thread-store harness
- anomalyco/opencode#24179: session-scoped permission bridge for external providers (merge-after-nits)
- anomalyco/opencode#24162: desktop health-check exponential backoff (request-changes; loopback gate dropped, no_proxy widened)
- openai/codex#19389: npm update prompt readiness gate (merge-after-nits)
- openai/codex#19266: non-local thread-store regression harness (merge-after-nits)
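The review note on #24162 ("loopback gate dropped, no_proxy widened") implies that bypassing the proxy should be gated on the health-check target being a loopback host. The sketch below is an assumption about what such a gate could look like, not code from the PR or the review: is_loopback_host() is a hypothetical helper built on std's IpAddr::is_loopback(), and a caller would apply reqwest's ClientBuilder::no_proxy() only when it returns true. It expects a bare host with no port.

```rust
use std::net::IpAddr;

/// Hypothetical loopback gate: true only when `host` names the local machine.
/// Accepts "localhost", IPv4/IPv6 literals, and bracketed IPv6 such as "[::1]".
fn is_loopback_host(host: &str) -> bool {
    if host.eq_ignore_ascii_case("localhost") {
        return true;
    }
    // Strip URL-style brackets around IPv6 literals before parsing.
    let bare = host.trim_start_matches('[').trim_end_matches(']');
    match bare.parse::<IpAddr>() {
        // Covers 127.0.0.0/8 for IPv4 and ::1 for IPv6.
        Ok(ip) => ip.is_loopback(),
        // Any other hostname is treated as remote and keeps the proxy settings.
        Err(_) => false,
    }
}

fn main() {
    for host in ["127.0.0.1", "localhost", "[::1]", "10.0.0.5", "example.com"] {
        println!("{host}: bypass proxy = {}", is_loopback_host(host));
    }
}
```

Gating this way keeps proxy settings in force for any non-local target while still letting localhost health checks skip a corporate or system proxy.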

github-actions · 2026-05-24T22:22:21Z

Automated PR Cleanup

Thank you for contributing to opencode.

Due to the high volume of PRs from users and AI agents, we periodically close older PRs using automated criteria so maintainers can focus review time on the most active and community-supported contributions.

This PR was closed because it matched the following cleanup criteria:

The PR was created more than 1 month ago
The PR had fewer than 2 positive reactions
Positive reactions are counted as thumbs-up, heart, celebration, or rocket reactions on the PR

PRs created within the last month are not affected by this cleanup.

If you believe this PR was closed incorrectly, or if you are still actively working on it, please leave a comment explaining why it should be reopened. A maintainer can review and reopen it if appropriate.

Thanks again for taking the time to contribute.

herjarsa requested a review from adamdotdevin as a code owner April 24, 2026 12:39

github-actions Bot added and removed the needs:compliance label Apr 24, 2026

github-actions Bot closed this May 24, 2026

github-actions Bot added the automated-pr-cleanup label May 24, 2026

herjarsa wants to merge 1 commit into anomalyco:dev from herjarsa:fix/health-check-retry
