
feat(core): handle TPM throttling errors in stream retries#3

Merged
wenshao merged 4 commits into wenshao:feat/tpm-throttling-retry from QwenLM:feat/tpm-throttling-retry-wenshao
Feb 11, 2026


@yiliang114


Summary

This PR implements handling for TPM (Tokens Per Minute) throttling errors that are returned as stream content instead of HTTP errors. Some OpenAI-compatible providers return throttling errors as SSE chunks with `finish_reason="error_finish"` and the error message in `delta.content`.

Problem

When using Qwen Code, users frequently encounter TPM rate limit errors. The current error handling mechanism cannot properly identify and handle these errors:

  1. Errors in stream content are ignored: Some OpenAI-compatible providers (e.g., certain proxy services) return rate limit errors as regular SSE chunks with `finish_reason="error_finish"` and error messages in `delta.content`, instead of returning HTTP 429 status codes

  2. Retry logic doesn't work: The existing `retryWithBackoff` can only handle HTTP-level errors, not rate limit errors embedded in stream content

  3. Poor user experience: When encountering rate limits, the request fails directly without automatic retry, requiring users to manually wait and resend
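
To make the first point concrete, here is a minimal sketch of what such an in-band error might look like on the wire and how it could be spotted after parsing. The payload text, the `StreamChunk` interface, and the `parseSseLine` helper are illustrative assumptions; only `finish_reason="error_finish"` and the error text in `delta.content` come from the description above.

```typescript
// Assumed minimal chunk shape; only the fields discussed in this PR are modeled.
interface StreamChunk {
  choices?: Array<{
    delta?: { content?: string | null };
    finish_reason?: string | null;
  }>;
}

// Illustrative SSE line. The HTTP response itself would be 200 OK,
// so an HTTP-level retry wrapper never sees a failure.
const sseLine =
  'data: {"choices":[{"delta":{"content":"Throttling: TPM(12231856/10000000)"},"finish_reason":"error_finish"}]}';

// Hypothetical helper: strip the "data: " prefix and parse the JSON body.
function parseSseLine(line: string): StreamChunk | null {
  if (!line.startsWith('data: ')) return null;
  const body = line.slice('data: '.length).trim();
  if (body === '[DONE]') return null; // end-of-stream sentinel carries no chunk
  return JSON.parse(body) as StreamChunk;
}

const parsedChunk = parseSseLine(sseLine);
const isInBandError =
  parsedChunk?.choices?.[0]?.finish_reason === 'error_finish';
```

Because the error arrives as ordinary stream data, it has to be detected while consuming chunks rather than while checking the response status.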

Solution

1. Error Detection (pipeline.ts)

Added `StreamContentError` class to detect errors returned as stream content:

```typescript
// Detect chunks with finish_reason="error_finish"
if ((chunk.choices?.[0]?.finish_reason as string) === 'error_finish') {
  const errorContent = chunk.choices?.[0]?.delta?.content?.trim();
  throw new StreamContentError(errorContent);
}
```
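
The PR names `StreamContentError` but does not show its definition. The sketch below is one plausible shape, assuming it is a plain `Error` subclass; the `Chunk` interface, the `throwIfErrorChunk` helper, and the fallback message are hypothetical and only illustrate the detection step.

```typescript
// Assumed definition: a simple Error subclass carrying the in-band message.
class StreamContentError extends Error {
  constructor(message: string) {
    super(message);
    this.name = 'StreamContentError';
    // Keeps instanceof working when compiled to older JS targets.
    Object.setPrototypeOf(this, StreamContentError.prototype);
  }
}

// Assumed minimal chunk shape for illustration.
interface Chunk {
  choices?: Array<{
    delta?: { content?: string | null };
    finish_reason?: string | null;
  }>;
}

// Hypothetical helper: throws if the chunk carries an in-band error,
// otherwise returns normally so the caller can keep processing the chunk.
function throwIfErrorChunk(chunk: Chunk): void {
  const choice = chunk.choices?.[0];
  if (choice?.finish_reason === 'error_finish') {
    // Fallback text is an assumption for chunks with empty content.
    const message = choice.delta?.content?.trim() || 'Unknown stream error';
    throw new StreamContentError(message);
  }
}
```

Throwing a dedicated error type lets callers further up the stack tell in-band provider errors apart from parsing or network failures with a single `instanceof` check.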

2. Stream Retry (geminiChat.ts)

Added TPM throttling-specific retry logic in `sendMessageStream`:

  • Independent counter: TPM retries use a separate `tpmRetryCount` that doesn't consume normal content retry attempts
  • Fixed delay: Uses a 60-second fixed delay (TPM quotas reset on a 1-minute rolling window)
  • User feedback: Retries notify the UI via `StreamEventType.RETRY` events to display retry status

3. Error Recognition (retry.ts)

Refactored `isTPMThrottlingError` function to reuse existing type guard utilities:

```typescript
export function isTPMThrottlingError(error: unknown): boolean {
  const checkMessage = (msg: string) => msg.includes('Throttling: TPM(');

  if (typeof error === 'string') return checkMessage(error);
  if (isStructuredError(error)) return checkMessage(error.message);
  if (isApiError(error)) return checkMessage(error.error.message);

  return false;
}
```
```
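
To show which error shapes each branch matches, here is a self-contained sketch. The `isStructuredError` and `isApiError` guards below are simplified stand-ins written for this example; the real utilities live in the `quotaErrorDetection` module and may check more fields. The sample messages are illustrative.

```typescript
// Simplified stand-ins for the real type guards (assumptions, not the originals).
interface StructuredError { message: string }
interface ApiError { error: { message: string } }

function isObject(value: unknown): value is Record<string, unknown> {
  return typeof value === 'object' && value !== null;
}

function isStructuredError(error: unknown): error is StructuredError {
  return isObject(error) && typeof error['message'] === 'string';
}

function isApiError(error: unknown): error is ApiError {
  if (!isObject(error)) return false;
  const inner = error['error'];
  return isObject(inner) && typeof inner['message'] === 'string';
}

// Same logic as the function shown in the PR description.
function isTPMThrottlingError(error: unknown): boolean {
  const checkMessage = (msg: string) => msg.includes('Throttling: TPM(');

  if (typeof error === 'string') return checkMessage(error);
  if (isStructuredError(error)) return checkMessage(error.message);
  if (isApiError(error)) return checkMessage(error.error.message);

  return false;
}

const samples: unknown[] = [
  'Throttling: TPM(12231856/10000000)', // plain string
  new Error('Throttling: TPM(12231856/10000000)'), // Error object
  { error: { message: 'Throttling: TPM(12231856/10000000)' } }, // nested, no status
  { status: 429, message: 'Too Many Requests' }, // 429 without the TPM marker
];

const results = samples.map(isTPMThrottlingError); // [true, true, true, false]
```

Note that detection depends only on the message text, so a TPM error is recognized whether or not a `status` field is present.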

4. Bug Fix: Prioritized TPM Check

Fixed a critical issue where TPM errors without `status=429` would be incorrectly rejected by `shouldRetryOnError`. Moved the TPM check to occur before the `shouldRetryOnError` check.
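
The effect of the reordering can be sketched with a small classifier. The `shouldRetryOnError` and `isTpm` functions here are hypothetical stand-ins (the first assumes a status-based policy of retrying 429 and 5xx responses), and `classify` and `Decision` exist only for this illustration.

```typescript
type Decision = 'retry-tpm' | 'retry-backoff' | 'fail';

// Hypothetical status-based policy: retry on 429 and 5xx only.
function shouldRetryOnError(error: unknown): boolean {
  const status = (error as { status?: number } | null | undefined)?.status;
  return status === 429 || (status !== undefined && status >= 500 && status < 600);
}

// Hypothetical message-based TPM check, mirroring the marker used in retry.ts.
function isTpm(error: unknown): boolean {
  const message =
    typeof error === 'string'
      ? error
      : (error as { message?: unknown } | null | undefined)?.message;
  return typeof message === 'string' && message.includes('Throttling: TPM(');
}

function classify(error: unknown): Decision {
  // Checked first: a stream-content TPM error usually has no status at all,
  // so a status-based check alone would reject it.
  if (isTpm(error)) return 'retry-tpm';
  if (shouldRetryOnError(error)) return 'retry-backoff';
  return 'fail';
}
```

With the checks in the opposite order, `classify(new Error('Throttling: TPM(...)'))` would fall through to `'fail'`, which is the behavior this fix removes.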

Retry Flow

```
User sends message
  ↓
API returns TPM rate limit error (finish_reason="error_finish")
  ↓
StreamContentError is thrown
  ↓
isTPMThrottlingError identifies error type
  ↓
Wait 60 seconds (UI shows "Retrying...")
  ↓
Reset delay counter, resend request
  ↓
Successfully get response
```

Technical Details

  • Why 60-second delay: TPM quotas are calculated over a 1-minute rolling window, so waiting 60 seconds lets the window rotate fully and the quota is released
  • Why independent counter: Prevents TPM retries from consuming retry attempts meant for normal content errors (e.g., format errors)
  • Backward compatible: Does not affect existing HTTP-level error handling logic

Test Coverage

Unit Tests (retry.test.ts)

  • TPM error detection from string, Error object, and nested error object
  • 1-minute wait for TPM throttling errors
  • Exponential backoff reset after TPM error
  • TPM error without status property (edge case)
  • Nested TPM error object without top-level status (edge case)
  • Consecutive TPM throttling errors
  • Max attempts exhaustion for TPM errors

Unit Tests (pipeline.test.ts)

  • `StreamContentError` thrown when stream chunk contains `error_finish`

Unit Tests (geminiChat.test.ts)

  • TPM throttling `StreamContentError` retry with fixed delay

Changed Files

| File | Changes |
| --- | --- |
| `packages/core/src/core/geminiChat.ts` | Added TPM throttling retry logic in stream |
| `packages/core/src/core/openaiContentGenerator/pipeline.ts` | Added `StreamContentError` class |
| `packages/core/src/utils/retry.ts` | Refactored `isTPMThrottlingError`, prioritized TPM check |
| `packages/core/src/utils/retry.test.ts` | Added comprehensive unit tests |
| `packages/core/src/core/geminiChat.test.ts` | Added TPM stream retry test |
| `packages/core/src/core/openaiContentGenerator/pipeline.test.ts` | Added `StreamContentError` test |

- Remove redundant error checking logic in isTPMThrottlingError function
- Reuse isStructuredError and isApiError utilities from quotaErrorDetection module
- Clean up duplicate import statements
- Move TPM throttling check before shouldRetryOnError to ensure TPM errors
  without standard HTTP status codes are still retried
- Add comprehensive unit tests for edge cases:
  - TPM error without status property
  - Nested TPM error object without top-level status
  - Consecutive TPM throttling errors
  - Max attempts exhaustion for TPM errors
- Change 'as' to 'as unknown as' for proper type casting
@yiliang114

Author

During local simulation of a throttling event (TPM 12231856/10000000, HTTP 429), the error is gracefully handled in the background. End users in the TUI will not experience immediate disruption or error notifications. With debug logging enabled, these throttling events are recorded in the log files for operational visibility and diagnostics.

The left side shows the local proxy tool simulating a TPM throttling error. The right side shows the output from the local Qwen Code CLI. Below is the debug log file.

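[Screenshots: local proxy tool simulating a TPM throttling error (left), Qwen Code CLI output (right), and the debug log file (below)]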

@wenshao wenshao merged commit f8d914b into wenshao:feat/tpm-throttling-retry Feb 11, 2026
wenshao added a commit that referenced this pull request Mar 11, 2026
- Add constant-time token comparison via crypto.timingSafeEqual (QwenLM#6)
- Validate lock file fields before trusting parsed JSON (#4)
- Verify daemon identity via /health API before sending SIGTERM (QwenLM#5)
- Add session idle timeout (30min) to auto-cleanup unused sessions (#1)
- Reject concurrent prompts on same session instead of overwriting (QwenLM#8)
- Add max session limit (50) to prevent resource exhaustion (QwenLM#7)
- Use server.closeAllConnections() for prompt stop() resolution (QwenLM#15)
- Register onStop callback in foreground mode (QwenLM#10)
- Fix unhandled promise in onStop callback with void (#3)
- Respect encoding parameter in captureWrite (QwenLM#14)
- Remove unnecessary env spread in fork options (QwenLM#9)
- Add tests for lock file validation and session page serving

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
wenshao added a commit that referenced this pull request May 7, 2026
…e context flag

Five review findings on PR QwenLM#3919:

1. **Compact mode bypassed the scrollback summary** (gpt-5.5 via
   /qreview, ToolGroupMessage:324). `ToolGroupMessage` returns
   `CompactToolGroupDisplay` before the ToolMessage path when
   `compactMode === true`, so the new `isPending` gate on
   `SubagentExecutionRenderer` only protected the expanded path —
   committed terminal subagents in compact mode never reached
   `SubagentScrollbackSummary` and the LiveAgentPanel → committed-
   summary handoff broke for users who turned compact mode on.

   Force-expand the group when `!isPending` AND any tool call has a
   terminal `task_execution` resultDisplay. Stay compact while the
   parent turn is still live (`isPending`) — the panel below the
   composer owns that surface and an inline summary would
   duplicate it. Coverage: 4 new ToolGroupMessage cases (compact +
   completed-committed expands; compact + running-live stays compact;
   compact + completed-live stays compact; compact + failed-committed
   expands).

2. **Snapshot-coupled comment in `packages/core`** (Copilot,
   background-tasks.ts:292). The comment named CLI/UI consumers
   (`useBackgroundTaskView`, `BackgroundTasksDialog`) and asserted
   React batching guarantees from a core file. Reword to
   "snapshot-style consumers that re-pull `getAll()` from inside
   the callback" and drop the framework-specific batching claim.

3. **Two-phase emit needed an explicit signal** (Copilot,
   background-tasks.ts:283). Emitting `statusChange` twice without
   distinguishing the phases forced consumers to either do
   duplicate work or risk persisting a stale `entry` from the
   second callback. Add an optional second arg
   `context?: { removed?: boolean }` to
   `BackgroundStatusChangeCallback`; the post-delete emit passes
   `{ removed: true }` so consumers can disambiguate without
   re-querying the registry. Backwards compatible — existing
   callbacks ignore the new arg. Tests updated to assert both
   `mock.calls[0][1] === undefined` and
   `mock.calls[1][1] === { removed: true }`.

4. **`isPending` doc clarified** (Copilot, ToolMessage.tsx:507).
   Made the default semantics explicit: omitted/undefined is
   treated as committed (not pending); live-area renderers MUST
   pass `true` explicitly to suppress the scrollback summary.

5. (4 of the threads were duplicate Copilot fires of #2 + #3.)

Coverage: 219 test files / 3369 passing across cli/ui + core/agents.