fix(loop): friendly DeepSeek 5xx error with reachability probe #440

Merged
esengine merged 2 commits into main from fix/loop-deepseek-5xx-friendly
May 8, 2026

esengine commented May 8, 2026


Why

Raw `DeepSeek 503: <body>` was bubbling straight to the UI on every DS-side outage. Users couldn't tell whether Reasonix had crashed or DeepSeek's service was struggling. DS 5xx errors are common in evening/Asia hours, so this hit real users (red-apple feedback this week).

The file header in src/loop/errors.ts even lied about it:

/** Single text-layer DeepSeek-error formatter — 429/5xx never reach here (retry.ts swallows). */

But retry.ts:50 returns the last response when retries are exhausted — it doesn't swallow. So 5xx did reach formatLoopError and just fell through to the passthrough branch with no friendly message.
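To make the "returns, doesn't swallow" point concrete, here is a minimal sketch of a retry wrapper with that behavior. It is not the contents of retry.ts: `HttpResult`, `isRetryable`, `sendWithRetry`, and the attempt count of 4 are hypothetical names and values chosen for illustration, and backoff is omitted for brevity.

```typescript
// Hypothetical sketch, not the real retry.ts: names and policy are assumptions.
interface HttpResult {
  status: number;
  body: string;
}

// Assumed policy: retry on rate limiting and on any server-side error.
const isRetryable = (status: number): boolean => status === 429 || status >= 500;

async function sendWithRetry(
  send: () => Promise<HttpResult>,
  maxAttempts = 4,
): Promise<HttpResult> {
  let last = await send();
  for (let attempt = 1; attempt < maxAttempts && isRetryable(last.status); attempt++) {
    last = await send();
  }
  // When retries are exhausted the last response is returned as-is,
  // so a 503 still reaches the caller and, from there, the error formatter.
  return last;
}
```

With a wrapper shaped like this, a caller that turns non-2xx responses into errors will still see the final 5xx, which is why the formatter needs a dedicated 5xx branch.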

What

formatLoopError(err, probe?) now takes an optional reachability probe. The catch handler in loop.ts:step() detects 5xx via is5xxError(), fires a 1.5s /user/balance probe, and passes the result through. Three message shapes:

  • no probe → "DeepSeek service unavailable (503) — DeepSeek-side problem, not Reasonix. Already retried 4×. Try wait 30s / /preset / status page."
  • reachable → adds "main API answered our health check, but /chat/completions is failing — partial outage on their side."
  • unreachable → adds "DeepSeek API is unreachable from your network — could be DS outage or local network. Check your network."

All three name DeepSeek explicitly and link https://status.deepseek.com so users know where to look.
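The three message shapes can be sketched as follows. This is an illustration under assumptions, not the code in src/loop/errors.ts: the `ProbeResult` type, the `statusOf` helper, the function bodies, and the exact wording are hypothetical; only the function names `formatLoopError` and `is5xxError`, the `DeepSeek 503: <body>` message shape, and the status-page URL come from the description above.

```typescript
// Sketch only: bodies and wording are assumptions modeled on the PR text.
type ProbeResult = "reachable" | "unreachable";

const STATUS_PAGE = "https://status.deepseek.com";

// Extract the HTTP status from messages shaped like "DeepSeek 503: <body>".
function statusOf(err: Error): number | null {
  const match = /^DeepSeek (\d{3}):/.exec(err.message);
  return match ? Number(match[1]) : null;
}

function is5xxError(err: Error): boolean {
  const status = statusOf(err);
  return status !== null && status >= 500 && status <= 599;
}

function formatLoopError(err: Error, probe?: ProbeResult): string {
  // Non-5xx errors keep a plain passthrough in this sketch.
  if (!is5xxError(err)) return err.message;

  const lines = [
    `DeepSeek service unavailable (${statusOf(err)}). DeepSeek-side problem, not Reasonix. Already retried 4×.`,
  ];
  if (probe === "reachable") {
    lines.push("Main API answered our health check, but /chat/completions is failing: partial outage on their side.");
  } else if (probe === "unreachable") {
    lines.push("DeepSeek API is unreachable from your network: could be a DS outage or a local network problem.");
  }
  // Every variant ends with the same hints and the status-page link.
  lines.push(`Wait 30s and retry, try /preset, or check ${STATUS_PAGE}`);
  return lines.join("\n");
}
```

Keeping the probe argument optional means existing callers that pass only the error still get the no-probe variant.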

The lying header comment is gone.

Test plan

  • 26 cases in tests/loop-error.test.ts (was 22) — three probe states × three 5xx codes + 503 with/without probe
  • tsc --noEmit clean
  • biome lint clean
  • full suite (2296 passing, one MCP startup flake unrelated to this change, passes on rerun)

fix(loop): friendly DeepSeek 5xx error with reachability probe

Raw `DeepSeek 503: <body>` was bubbling straight to the UI. Users on
flaky DS days couldn't tell if it was DS-side (their service was
overloaded) or Reasonix-side (we crashed). The header comment in
loop/errors.ts even claimed retry.ts swallowed all 5xx — but retry.ts
returns the last response instead of swallowing, so 5xx did reach the
formatter and just fell through to a passthrough branch.

Now formatLoopError takes an optional probe result. The catch in
step() detects 5xx, fires a 1.5s `/user/balance` probe, and passes
the result back. Three message variants:
  - no probe         → DS-side outage notice + retry hints
  - probe reachable  → "main API answered, /chat/completions failing"
  - probe down       → "DS unreachable from your network — check net"

All three include "this is a DeepSeek-side problem, not Reasonix" and
a status-page link so users know where to look.
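The probe step described in this commit message could look roughly like the sketch below. It is not the code in loop.ts: `probeReachability`, `FetchLike`, the default base URL, the bearer-token header, and the rule that any non-5xx answer counts as reachable are all assumptions. Only the `/user/balance` path and the 1.5s budget come from the text. The fetch function is injected so the sketch can be exercised without network access.

```typescript
// Hypothetical sketch of a time-boxed reachability probe.
type ProbeResult = "reachable" | "unreachable";

// Minimal subset of the fetch signature that this sketch relies on.
type FetchLike = (
  url: string,
  init: { signal: AbortSignal; headers: Record<string, string> },
) => Promise<{ status: number }>;

async function probeReachability(
  fetchFn: FetchLike,
  apiKey: string,
  baseUrl = "https://api.deepseek.com", // assumed default
  timeoutMs = 1500,
): Promise<ProbeResult> {
  const controller = new AbortController();
  // Abort the request if it has not settled within the time budget.
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const res = await fetchFn(`${baseUrl}/user/balance`, {
      signal: controller.signal,
      headers: { Authorization: `Bearer ${apiKey}` },
    });
    // Assumption: any answer below 500 shows that the API host is responding.
    return res.status < 500 ? "reachable" : "unreachable";
  } catch {
    // Network failure or timeout abort.
    return "unreachable";
  } finally {
    clearTimeout(timer);
  }
}
```

In a runtime with a global `fetch` (for example Node 18+), a caller would pass `fetch` as the first argument and hand the result to the error formatter.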

test: avoid CodeQL js/regex/missing-regexp-anchor on URL substring match

Use toContain('status.deepseek.com') instead of /status\.deepseek\.com/
— same assertion semantics, dodges CodeQL's high-severity false positive
on unanchored URL regex.
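
For a string value, the two assertion styles check the same thing. The sketch below shows the equivalence with plain string operations so that it runs without a test framework; the `message` value is made up for illustration.

```typescript
// Illustrative message; the real formatter output may differ.
const message = "DeepSeek service unavailable (503). See https://status.deepseek.com";

// Before: an unanchored regex over a host name, the pattern that CodeQL's
// js/regex/missing-regexp-anchor rule flags. In the test this was written as
// expect(message).toMatch(/status\.deepseek\.com/).
const viaRegex = /status\.deepseek\.com/.test(message);

// After: a plain substring check, which is what
// expect(message).toContain('status.deepseek.com') performs on a string.
const viaSubstring = message.includes("status.deepseek.com");
```

Both expressions evaluate to `true` for this message. The substring form simply has no regex for the scanner to analyze.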
esengine merged commit d837dda into main on May 8, 2026
3 checks passed
esengine deleted the fix/loop-deepseek-5xx-friendly branch on May 8, 2026, 11:23
esengine added a commit that referenced this pull request May 8, 2026
formatLoopError, reasonPrefixFor, and errorLabelFor were all
hardcoded English. A Chinese user hitting a 503 / 401 / context
overflow saw raw English (and on top of that, the new 5xx outage
notice from #440 was also English).

Move the strings to a new errors.* i18n namespace covering:
  - context overflow (with V4/legacy limit mention)
  - 401 auth / 402 balance / 422 param / 400 bad request
  - 5xx head + reachable / unreachable / two action variants
  - reason prefix and label for budget/aborted/context-guard/stuck
  - "(no message)" fallback for empty error bodies

zh-CN translations included. Existing tests still cover EN (vitest
setupFile pins runtime to EN); two new tests flip to zh-CN to confirm
runtime switch actually translates.

Stacked on #440 — merge that first.
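
A minimal sketch of how an `errors.*` namespace with a runtime locale switch might be wired. The key names, the `t` and `setLocale` helpers, the `{status}` placeholder syntax, and the zh-CN strings are hypothetical illustrations, not the project's actual catalog.

```typescript
// Hypothetical i18n sketch: keys, helpers, and translations are assumptions.
type Locale = "en" | "zh-CN";

const catalog: Record<Locale, Record<string, string>> = {
  en: {
    "errors.noMessage": "(no message)",
    "errors.5xx.head": "DeepSeek service unavailable ({status})",
  },
  "zh-CN": {
    "errors.noMessage": "(无消息)",
    "errors.5xx.head": "DeepSeek 服务不可用 ({status})",
  },
};

let currentLocale: Locale = "en";

function setLocale(locale: Locale): void {
  currentLocale = locale;
}

// Look up a key in the active locale, fall back to English, then to the key itself,
// and substitute {name} placeholders from params.
function t(key: string, params: Record<string, string | number> = {}): string {
  const template = catalog[currentLocale][key] ?? catalog.en[key] ?? key;
  return template.replace(/\{(\w+)\}/g, (_match, name: string) =>
    String(params[name] ?? `{${name}}`),
  );
}
```

Pinning the locale to `en` in a test setup file and flipping it inside individual tests, as the commit message describes, works with a module-level switch like `setLocale`.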
ChasLui pushed a commit to ChasLui/DeepSeek-Reasonix that referenced this pull request May 23, 2026
…ine#440)

* fix(loop): friendly DeepSeek 5xx error with reachability probe

* test: avoid CodeQL js/regex/missing-regexp-anchor on URL substring match
ChasLui pushed a commit to ChasLui/DeepSeek-Reasonix that referenced this pull request May 23, 2026