fix(desktop): add retry logic with exponential backoff to health check system#24138

Closed
herjarsa wants to merge 1 commit into anomalyco:dev from herjarsa:fix/health-check-retry

@herjarsa herjarsa commented Apr 24, 2026

Fixes #24142

Add retry logic with exponential backoff to the desktop app health check system to improve stability during server startup.

Why

The local server sidecar sometimes fails health checks during startup, especially on slower systems or when the server is still initializing. This causes IDE stability issues (MCP local connections disconnect/reconnect, IDE freezes when switching sessions).

What

  • check_health_with_retry() retries up to 6x with exponential backoff
  • Backoff timing: 500ms→1s→2s→4s→4s (total delay 11.5s + 12s timeouts = ~23.5s max)
  • 2s timeout per attempt to fail fast and retry
  • match on every request outcome -- network errors/timeouts retry, no short-circuit with ?
  • Reuse reqwest::Client connection pool (built once before loop)
  • no_proxy() for localhost to avoid proxy interference
  • Total max ~23.5s, fits inside caller's 30s timeout
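
The timing arithmetic above can be checked with a small std-only Rust sketch. The name backoff_interval matches the helper named in the commit message, but the body here is an assumption (doubling from 500ms, capped at 4s), not the PR's actual code. worst_case_total is a hypothetical helper added only for illustration.

```rust
use std::time::Duration;

/// Assumed shape of the backoff helper: the delay to wait after the given
/// zero-based failed attempt. Doubles from 500ms and is capped at 4s,
/// which yields 500ms, 1s, 2s, 4s, 4s for attempts 0 through 4.
fn backoff_interval(attempt: u32) -> Duration {
    const BASE_MS: u64 = 500;
    const CAP_MS: u64 = 4_000;
    // Clamp the shift so a large attempt number cannot overflow.
    let doubled = BASE_MS << attempt.min(10);
    Duration::from_millis(doubled.min(CAP_MS))
}

/// Hypothetical helper: worst-case wall time if every attempt times out.
/// A backoff sleep happens between attempts only, so there is one fewer
/// sleep than there are attempts.
fn worst_case_total(max_attempts: u32, per_attempt_timeout: Duration) -> Duration {
    let sleeps: Duration = (0..max_attempts.saturating_sub(1))
        .map(backoff_interval)
        .sum();
    per_attempt_timeout * max_attempts + sleeps
}
```

With 6 attempts and a 2s per-attempt timeout, worst_case_total adds 12s of timeouts to 11.5s of backoff, which matches the ~23.5s figure quoted above and stays under the caller's 30s timeout.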

…k system

- Add check_health_with_retry() with up to 6 attempts
- Implement backoff_interval() helper (500ms -> 1s -> 2s -> 4s -> 4s)
- Add 2s timeout per attempt with tokio::time::timeout
- Build reqwest::Client once before loop (connection pool reuse)
- Retry on transient errors (network/timeout) via match instead of short-circuiting with ?
- Apply no_proxy() to skip proxy for localhost health checks
- Keep max total duration ~23.5s within caller's 30s timeout
- Remove dead unreachable code and unused variables
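
The retry-by-match behaviour described in this commit message can be sketched without tokio or reqwest. This is a synchronous, std-only illustration, not the PR's implementation: the probe and sleep closures are hypothetical stand-ins for the real HTTP request (with its 2s timeout) and the async sleep, and the signature of check_health_with_retry is assumed.

```rust
use std::time::Duration;

/// Delays between attempts, taken from the schedule in the commit message.
/// Five delays separate six attempts.
const BACKOFF_MS: [u64; 5] = [500, 1_000, 2_000, 4_000, 4_000];

/// Sketch of a retry loop. Returns the 1-based number of the attempt that
/// succeeded, or the last error once all attempts are used up.
/// `probe` and `sleep` are injected so the loop can run without a server.
fn check_health_with_retry<P, S>(mut probe: P, mut sleep: S) -> Result<usize, String>
where
    P: FnMut() -> Result<(), String>,
    S: FnMut(Duration),
{
    let max_attempts = BACKOFF_MS.len() + 1;
    let mut last_error = String::from("no attempt made");

    for attempt in 0..max_attempts {
        // Match on every outcome instead of propagating with `?`,
        // so a failed request leads to another attempt.
        match probe() {
            Ok(()) => return Ok(attempt + 1),
            Err(error) => last_error = error,
        }
        // Sleep only between attempts, never after the final one.
        if let Some(ms) = BACKOFF_MS.get(attempt) {
            sleep(Duration::from_millis(*ms));
        }
    }
    Err(last_error)
}
```

Using `?` on the probe result would return on the first network error, which is the short-circuit the commit message says it avoids. Passing the sleep function in as a closure also lets a test record the requested delays without waiting for them.
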
@herjarsa herjarsa requested a review from adamdotdevin as a code owner April 24, 2026 10:01
@github-actions github-actions Bot added the needs:compliance label ("This means the issue will auto-close after 2 hours.") Apr 24, 2026
@github-actions

Contributor

This PR doesn't fully meet our contributing guidelines and PR template.

What needs to be fixed:

  • PR description is missing required template sections. Please use the PR template.

Please edit this PR description to address the above within 2 hours, or it will be automatically closed.

If you believe this was flagged incorrectly, please let a maintainer know.

@github-actions

Contributor

Thanks for your contribution!

This PR doesn't have a linked issue. All PRs must reference an existing issue.

Please:

  1. Open an issue describing the bug/feature (if one doesn't exist)
  2. Add Fixes #<number> or Closes #<number> to this PR description

See CONTRIBUTING.md for details.

@github-actions

Contributor

The following comment was made by an LLM, it may be inaccurate:

Based on my search, here are the potentially related PRs (excluding the current PR #24138):

Related PRs Found

  1. PR #16125 - fix(desktop): pass credentials when checking default server health on startup

    • This PR also addresses desktop health check issues on startup. It may be related since both deal with server health checks during initialization.
  2. PR #12867 - fix(server): exempt /global/health from auth middleware

    • This is a server-side health check fix that could complement the retry logic being added in the current PR.

These appear to be adjacent improvements to the health check system rather than direct duplicates. PR #24138 adds robustness through retry logic, while the others address specific health check issues (credentials and auth middleware).

@github-actions

Contributor

This pull request has been automatically closed because it was not updated to meet our contributing guidelines within the 2-hour window.

Feel free to open a new pull request that follows our guidelines.

@github-actions github-actions Bot removed the needs:compliance label ("This means the issue will auto-close after 2 hours.") Apr 24, 2026
@github-actions github-actions Bot closed this Apr 24, 2026
Development

Successfully merging this pull request may close these issues.

fix(desktop): health check fails during server startup causing IDE instability

1 participant