feat(core): handle TPM throttling errors in stream retries #3
Merged
wenshao merged 4 commits on Feb 11, 2026
- Remove redundant error checking logic in isTPMThrottlingError function
- Reuse isStructuredError and isApiError utilities from quotaErrorDetection module
- Clean up duplicate import statements

- Move TPM throttling check before shouldRetryOnError to ensure TPM errors without standard HTTP status codes are still retried
- Add comprehensive unit tests for edge cases:
  - TPM error without status property
  - Nested TPM error object without top-level status
  - Consecutive TPM throttling errors
  - Max attempts exhaustion for TPM errors
- Change 'as' to 'as unknown as' for proper type casting
wenshao added a commit that referenced this pull request on Mar 11, 2026
- Add constant-time token comparison via crypto.timingSafeEqual (QwenLM#6)
- Validate lock file fields before trusting parsed JSON (#4)
- Verify daemon identity via /health API before sending SIGTERM (QwenLM#5)
- Add session idle timeout (30min) to auto-cleanup unused sessions (#1)
- Reject concurrent prompts on same session instead of overwriting (QwenLM#8)
- Add max session limit (50) to prevent resource exhaustion (QwenLM#7)
- Use server.closeAllConnections() for prompt stop() resolution (QwenLM#15)
- Register onStop callback in foreground mode (QwenLM#10)
- Fix unhandled promise in onStop callback with void (#3)
- Respect encoding parameter in captureWrite (QwenLM#14)
- Remove unnecessary env spread in fork options (QwenLM#9)
- Add tests for lock file validation and session page serving

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
wenshao added a commit that referenced this pull request on May 7, 2026
…e context flag

Five review findings on PR QwenLM#3919:

1. **Compact mode bypassed the scrollback summary** (gpt-5.5 via /qreview, ToolGroupMessage:324). `ToolGroupMessage` returns `CompactToolGroupDisplay` before the ToolMessage path when `compactMode === true`, so the new `isPending` gate on `SubagentExecutionRenderer` only protected the expanded path; committed terminal subagents in compact mode never reached `SubagentScrollbackSummary` and the LiveAgentPanel → committed-summary handoff broke for users who turned compact mode on. Force-expand the group when `!isPending` AND any tool call has a terminal `task_execution` resultDisplay. Stay compact while the parent turn is still live (`isPending`): the panel below the composer owns that surface and an inline summary would duplicate it. Coverage: 4 new ToolGroupMessage cases (compact + completed-committed expands; compact + running-live stays compact; compact + completed-live stays compact; compact + failed-committed expands).
2. **Snapshot-coupled comment in `packages/core`** (Copilot, background-tasks.ts:292). The comment named CLI/UI consumers (`useBackgroundTaskView`, `BackgroundTasksDialog`) and asserted React batching guarantees from a core file. Reword to "snapshot-style consumers that re-pull `getAll()` from inside the callback" and drop the framework-specific batching claim.
3. **Two-phase emit needed an explicit signal** (Copilot, background-tasks.ts:283). Emitting `statusChange` twice without distinguishing the phases forced consumers to either do duplicate work or risk persisting a stale `entry` from the second callback. Add an optional second arg `context?: { removed?: boolean }` to `BackgroundStatusChangeCallback`; the post-delete emit passes `{ removed: true }` so consumers can disambiguate without re-querying the registry. Backwards compatible: existing callbacks ignore the new arg. Tests updated to assert both `mock.calls[0][1] === undefined` and `mock.calls[1][1] === { removed: true }`.
4. **`isPending` doc clarified** (Copilot, ToolMessage.tsx:507). Made the default semantics explicit: omitted/undefined is treated as committed (not pending); live-area renderers MUST pass `true` explicitly to suppress the scrollback summary.
5. (4 of the threads were duplicate Copilot fires of #2 + #3.)

Coverage: 219 test files / 3369 passing across cli/ui + core/agents.


Summary
This PR implements handling for TPM (Tokens Per Minute) throttling errors that are returned as stream content instead of HTTP errors. Some OpenAI-compatible providers return throttling errors as SSE chunks with `finish_reason="error_finish"` and the error message in `delta.content`.
Problem
When using Qwen Code, users frequently encounter TPM rate limit errors. The current error handling mechanism cannot properly identify and handle these errors:
- Errors in stream content are ignored: Some OpenAI-compatible providers (e.g., certain proxy services) return rate limit errors as regular SSE chunks with `finish_reason="error_finish"` and error messages in `delta.content`, instead of returning HTTP 429 status codes.
- Retry logic doesn't work: The existing `retryWithBackoff` can only handle HTTP-level errors, not rate limit errors embedded in stream content.
- Poor user experience: When encountering rate limits, the request fails directly without automatic retry, requiring users to manually wait and resend.
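To make the failure mode concrete, here is a rough sketch of how a normal chunk and a throttling chunk might differ. The `StreamChunk` interface, the `isErrorFinish` helper, and the exact message text are assumptions for illustration; only `finish_reason="error_finish"`, `delta.content`, and the `Throttling: TPM(` prefix come from this PR.

```typescript
// Hypothetical, simplified chunk shape; real provider payloads carry more fields.
interface StreamChunk {
  choices?: Array<{
    finish_reason?: string | null;
    delta?: { content?: string | null };
  }>;
}

// A normal content chunk.
const normalChunk: StreamChunk = {
  choices: [{ finish_reason: null, delta: { content: 'Hello' } }],
};

// A throttling error delivered as stream content (message text is illustrative).
const throttledChunk: StreamChunk = {
  choices: [
    {
      finish_reason: 'error_finish',
      delta: { content: 'Throttling: TPM(limit reached)' },
    },
  ],
};

// Returns true when the first choice signals an in-stream error.
function isErrorFinish(chunk: StreamChunk): boolean {
  return chunk.choices?.[0]?.finish_reason === 'error_finish';
}

console.log(isErrorFinish(normalChunk)); // false
console.log(isErrorFinish(throttledChunk)); // true
```

Because the HTTP response itself succeeds, a check that only looks at status codes never sees the second chunk as a failure.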
Solution
1. Error Detection (pipeline.ts)
Added `StreamContentError` class to detect errors returned as stream content:
```typescript
// Detect chunks with finish_reason="error_finish"
if ((chunk.choices?.[0]?.finish_reason as string) === 'error_finish') {
  const errorContent = chunk.choices?.[0]?.delta?.content?.trim();
  throw new StreamContentError(errorContent);
}
```
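For context, the check above could sit inside a stream wrapper along these lines. This is a minimal sketch, not the code in pipeline.ts: the `Chunk` interface, the `guardStream` name, the fallback message, and the body of `StreamContentError` are all assumptions.

```typescript
// Hypothetical minimal error class; the PR's actual StreamContentError may differ.
class StreamContentError extends Error {
  constructor(message?: string) {
    super(message ?? 'Unknown stream content error');
    this.name = 'StreamContentError';
  }
}

// Simplified chunk shape, assumed for illustration.
interface Chunk {
  choices?: Array<{
    finish_reason?: string | null;
    delta?: { content?: string | null };
  }>;
}

// Passes chunks through and throws as soon as an error_finish chunk appears.
async function* guardStream(
  source: AsyncIterable<Chunk>,
): AsyncGenerator<Chunk> {
  for await (const chunk of source) {
    const choice = chunk.choices?.[0];
    if (choice?.finish_reason === 'error_finish') {
      throw new StreamContentError(choice.delta?.content?.trim());
    }
    yield chunk;
  }
}
```

Throwing from inside the generator turns an in-band error into an ordinary exception, so callers can handle it with the same `try`/`catch` they use for HTTP-level failures.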
2. Stream Retry (geminiChat.ts)
Added TPM throttling-specific retry logic in `sendMessageStream`: on a TPM throttling error, the stream waits 60 seconds and then resends the request.
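The retry behavior might look roughly like the following sketch. It is not the code in geminiChat.ts: `sendWithTpmRetry`, `TpmRetryOptions`, and the injectable `sleep` are hypothetical names, and the inline message check stands in for `isTPMThrottlingError`.

```typescript
// Message prefix taken from the isTPMThrottlingError check in this PR.
const TPM_MARKER = 'Throttling: TPM(';

// Hypothetical options; names and shape are assumptions for illustration.
interface TpmRetryOptions {
  maxAttempts: number;
  delayMs: number; // the flow in this PR waits 60 seconds
  sleep?: (ms: number) => Promise<void>; // injectable so tests need not wait
}

const defaultSleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Runs `send` and retries only when the failure looks like TPM throttling.
async function sendWithTpmRetry<T>(
  send: () => Promise<T>,
  { maxAttempts, delayMs, sleep = defaultSleep }: TpmRetryOptions,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await send();
    } catch (error) {
      lastError = error;
      const isTpm =
        error instanceof Error && error.message.includes(TPM_MARKER);
      // Non-TPM errors and the final attempt are surfaced to the caller.
      if (!isTpm || attempt === maxAttempts) throw error;
      await sleep(delayMs);
    }
  }
  throw lastError; // only reached when maxAttempts < 1
}
```

A caller would pass `delayMs: 60_000` to match the wait described in the retry flow; a fixed delay is used here instead of exponential backoff because a per-minute quota resets on a known interval.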
3. Error Recognition (retry.ts)
Refactored `isTPMThrottlingError` function to reuse existing type guard utilities:
```typescript
export function isTPMThrottlingError(error: unknown): boolean {
  const checkMessage = (msg: string) => msg.includes('Throttling: TPM(');
  if (typeof error === 'string') return checkMessage(error);
  if (isStructuredError(error)) return checkMessage(error.message);
  if (isApiError(error)) return checkMessage(error.error.message);
  return false;
}
```
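The three branches can be exercised with a self-contained sketch. The two type guards below are stand-ins written for this example; the real `isStructuredError` and `isApiError` live in the quotaErrorDetection module and may accept different shapes. The sample messages are illustrative.

```typescript
// Stand-in shapes and guards, assumed for illustration only.
interface StructuredError {
  message: string;
  status?: number;
}
interface ApiError {
  error: { message: string };
}

function isStructuredError(e: unknown): e is StructuredError {
  return (
    typeof e === 'object' &&
    e !== null &&
    typeof (e as { message?: unknown }).message === 'string'
  );
}

function isApiError(e: unknown): e is ApiError {
  if (typeof e !== 'object' || e === null) return false;
  const inner = (e as { error?: unknown }).error;
  return (
    typeof inner === 'object' &&
    inner !== null &&
    typeof (inner as { message?: unknown }).message === 'string'
  );
}

// Same logic as the function in this PR, repeated so the sketch runs alone.
function isTPMThrottlingError(error: unknown): boolean {
  const checkMessage = (msg: string) => msg.includes('Throttling: TPM(');
  if (typeof error === 'string') return checkMessage(error);
  if (isStructuredError(error)) return checkMessage(error.message);
  if (isApiError(error)) return checkMessage(error.error.message);
  return false;
}

const tpm = 'Throttling: TPM(limit reached)';
console.log(isTPMThrottlingError(tpm)); // true (plain string)
console.log(isTPMThrottlingError(new Error(tpm))); // true (top-level message)
console.log(isTPMThrottlingError({ error: { message: tpm } })); // true (nested)
console.log(isTPMThrottlingError(new Error('socket hang up'))); // false
console.log(isTPMThrottlingError({ status: 429 })); // false (no message)
```

Note that the match is purely on message text, so a TPM error is recognized whether or not a `status` property is present.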
4. Bug Fix: Prioritized TPM Check
Fixed a critical issue where TPM errors without `status=429` would be incorrectly rejected by `shouldRetryOnError`. Moved the TPM check to occur before the `shouldRetryOnError` check.
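The effect of the reordering can be shown with a small sketch. Everything here is a stand-in: `MaybeStatusError`, `decideBefore`, and `decideAfter` are hypothetical, and the rule inside this `shouldRetryOnError` (retry on 429 and 5xx) is an assumption about the real function in retry.ts.

```typescript
// Hypothetical error shape for illustration.
type MaybeStatusError = { message?: string; status?: number };

// Stand-in for isTPMThrottlingError, matching on the message prefix.
const isTpm = (e: MaybeStatusError): boolean =>
  (e.message ?? '').includes('Throttling: TPM(');

// Assumed rule: retry only on HTTP 429 and 5xx statuses.
const shouldRetryOnError = (e: MaybeStatusError): boolean =>
  e.status === 429 ||
  (e.status !== undefined && e.status >= 500 && e.status < 600);

// Old order: the status-based check runs first and rejects status-less errors.
function decideBefore(e: MaybeStatusError): 'retry' | 'fail' {
  if (!shouldRetryOnError(e)) return 'fail';
  return 'retry';
}

// New order: the TPM check runs first, so it cannot be short-circuited.
function decideAfter(e: MaybeStatusError): 'retry' | 'fail' {
  if (isTpm(e)) return 'retry';
  return shouldRetryOnError(e) ? 'retry' : 'fail';
}

const tpmWithoutStatus: MaybeStatusError = {
  message: 'Throttling: TPM(limit reached)',
};
console.log(decideBefore(tpmWithoutStatus)); // "fail"
console.log(decideAfter(tpmWithoutStatus)); // "retry"
```

Errors thrown from stream content carry no HTTP status at all, which is why the status-first ordering dropped them.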
Retry Flow
```
User sends message
↓
API returns TPM rate limit error (finish_reason="error_finish")
↓
StreamContentError is thrown
↓
isTPMThrottlingError identifies error type
↓
Wait 60 seconds (UI shows "Retrying...")
↓
Reset delay counter, resend request
↓
Successfully get response
```
Technical Details
Test Coverage
Unit Tests (retry.test.ts)
Unit Tests (pipeline.test.ts)
Unit Tests (geminiChat.test.ts)
Changed Files