Scope: follow-up to #4345 (auto-compaction three-tier ladder).
The hard-rescue mechanism in sendMessageStream fires when effectiveTokens >= hard to force compaction before the API would reject the prompt. Without a retry bound, a session whose history can't shrink (model consistently produces unusable summaries, network broken, compressible slice too small to split, etc.) fires hard-rescue on every send forever — each send burns one compression side-query.
Reactive overflow at the API layer catches the actual failure, so this is an operational cost (~2-5s latency per failed rescue, 1 side-query per send), not a correctness bug. That cost, however, is bounded only by reactive overflow's own retries, not by the rescue itself.
Proposed fix
Add a hardRescueFailureCount field to GeminiChat, using a pessimistic-increment pattern:
// pre-call
this.consecutiveFailures = 0; // rescue overrides the cheap-gate breaker
this.hardRescueFailureCount += 1; // pessimistic strike
compressionInfo = await this.tryCompress(prompt_id, model, true, ...);
// post-call: refund NOOP (history-too-small ≠ broken mechanism)
if (compressionInfo.compressionStatus === CompressionStatus.NOOP) {
  this.hardRescueFailureCount = Math.max(0, this.hardRescueFailureCount - 1);
}
// COMPRESSED success resets the counter via tryCompress's success branch
Gate the rescue trigger on hardRescueFailureCount < MAX_CONSECUTIVE_FAILURES (3). After 3 strikes the rescue stops firing, and reactive overflow becomes the sole defence layer.
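For illustration, a minimal self-contained sketch of the gate plus the pessimistic counter described above. This is not the GeminiChat code: RescueSketch, maybeHardRescue, the Compressor callback and the FAILED status are hypothetical stand-ins, the compressor is synchronous for brevity (the real tryCompress is awaited), and the COMPRESSED reset is inlined here although the issue places it in tryCompress's success branch.

```typescript
// Names taken from the issue: hardRescueFailureCount, consecutiveFailures,
// MAX_CONSECUTIVE_FAILURES, CompressionStatus.NOOP / COMPRESSED.
// Everything else is a hypothetical stand-in for illustration.
const CompressionStatus = {
  COMPRESSED: "COMPRESSED",
  NOOP: "NOOP",
  FAILED: "FAILED", // assumed catch-all failure status
} as const;
type CompressionStatus = (typeof CompressionStatus)[keyof typeof CompressionStatus];

const MAX_CONSECUTIVE_FAILURES = 3;

// Stand-in for tryCompress; may return a status or throw.
type Compressor = () => CompressionStatus;

class RescueSketch {
  consecutiveFailures = 0;
  hardRescueFailureCount = 0;
  compress: Compressor;

  constructor(compress: Compressor) {
    this.compress = compress;
  }

  /** Returns true if a rescue was attempted on this send. */
  maybeHardRescue(effectiveTokens: number, hard: number): boolean {
    if (effectiveTokens < hard) return false;
    // Retry bound: after MAX_CONSECUTIVE_FAILURES strikes, stop rescuing.
    if (this.hardRescueFailureCount >= MAX_CONSECUTIVE_FAILURES) return false;

    this.consecutiveFailures = 0; // rescue overrides the cheap-gate breaker
    this.hardRescueFailureCount += 1; // pessimistic strike, before the call

    const status = this.compress(); // if this throws, the strike stays
    if (status === CompressionStatus.NOOP) {
      // Refund: history-too-small is not a broken mechanism.
      this.hardRescueFailureCount = Math.max(0, this.hardRescueFailureCount - 1);
    } else if (status === CompressionStatus.COMPRESSED) {
      this.hardRescueFailureCount = 0; // success resets the counter
    }
    return true;
  }
}
```

With a compressor that always fails or always throws, the sketch attempts the rescue on three sends and then stops; with one that always returns NOOP, the refund keeps the counter at zero and the rescue keeps firing.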
Why pessimistic, not post-call increment
Incrementing only after the call, and only on a failure status, silently leaks two failure shapes:
- throws (provider 5xx / abort) → post-handler unreachable → strike not recorded
- NOOP (curated history empty / MIN_COMPRESSION_FRACTION undercut) → neither success nor failure-status branch matches → strike not recorded
Incrementing pessimistically guarantees the strike sticks for every non-COMPRESSED outcome; NOOP is then the only outcome that gets refunded (because NOOP isn't a mechanism failure).
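To make the difference concrete, here is a rough side-by-side of the two bookkeeping shapes, replayed over a sequence of simulated outcomes. Everything in it is hypothetical (the Outcome labels including FAILED and THROW, simulateCompress, and both tally functions); it models only the counter, not the real tryCompress.

```typescript
// Hypothetical outcome labels; THROW models a provider 5xx / abort.
type Outcome = "COMPRESSED" | "NOOP" | "FAILED" | "THROW";

// Stand-in for the compression call: returns a status or throws.
function simulateCompress(outcome: Outcome): Exclude<Outcome, "THROW"> {
  if (outcome === "THROW") throw new Error("simulated provider 5xx / abort");
  return outcome;
}

// Post-call shape: the strike is recorded only if a failure status comes back.
function strikesPostCall(outcomes: Outcome[]): number {
  let strikes = 0;
  for (const o of outcomes) {
    try {
      const status = simulateCompress(o);
      if (status === "FAILED") strikes += 1;
      else if (status === "COMPRESSED") strikes = 0;
      // NOOP matches neither branch and falls through untouched.
    } catch {
      // The increment above was never reached: the throw leaves no strike.
    }
  }
  return strikes;
}

// Pessimistic shape: strike first, then refund NOOP or reset on success.
function strikesPessimistic(outcomes: Outcome[]): number {
  let strikes = 0;
  for (const o of outcomes) {
    strikes += 1; // recorded before the call
    try {
      const status = simulateCompress(o);
      if (status === "NOOP") strikes = Math.max(0, strikes - 1);
      else if (status === "COMPRESSED") strikes = 0;
    } catch {
      // Nothing to do: the strike was already recorded.
    }
  }
  return strikes;
}
```

Three consecutive throws leave the post-call tally at 0 and the pessimistic tally at 3, so only the latter would ever trip the bound. NOOP nets zero under both, but in the pessimistic shape that is an explicit refund decision rather than a branch nobody wrote.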
Related
pr-4168-archive-pre-revert). The pessimistic-increment + NOOP-refund pattern is the post-R9 settled shape.