
[BUG] 5-Hour Rate Limit Exhaustion Accelerating Despite Comparable Workload #38330

@ghost

Description

Preflight Checklist

  • I have searched existing issues and this hasn't been reported yet
  • This is a single bug report (please file separate reports for different bugs)
  • I am using the latest version of Claude Code

What's Wrong?

The 5-hour rate limit is being exhausted 3x faster today compared to previous days with comparable workloads. This represents a service degradation that violates expected usage patterns and contractual rate limit behavior.
Evidence of Acceleration
| Date | Total Requests | Input Tokens | 5h Exhaustion Events | Time to First Exhaustion |
|---|---|---|---|---|
| 2026-03-21 | ~900 | ~3M | 0 | N/A |
| 2026-03-22 | ~900 | ~3M | 0 | N/A |
| 2026-03-23 | 908 | 3,402,759 | 0 | N/A |
| 2026-03-24 | 936 (+3.1%) | 4,169,220 (+22.5%) | 3 | ~4.5 hours |

Key Finding: A 3.1% increase in requests (22.5% in input tokens) should not shrink the time to first exhaustion from never (0 exhaustions) to about 4.5 hours.
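The percentage changes in the table can be reproduced from the raw daily totals. A minimal Python sketch; its only inputs are the 2026-03-23 and 2026-03-24 figures reported above.

```python
# Day-over-day change in workload, using the totals reported for
# 2026-03-23 (baseline day) and 2026-03-24 (degraded day).
BASELINE = {"requests": 908, "input_tokens": 3_402_759}
DEGRADED = {"requests": 936, "input_tokens": 4_169_220}


def pct_change(old: float, new: float) -> float:
    """Return the percentage change from old to new."""
    return (new - old) / old * 100.0


for metric in BASELINE:
    change = pct_change(BASELINE[metric], DEGRADED[metric])
    print(f"{metric}: {change:+.1f}%")
```

Running it prints `requests: +3.1%` and `input_tokens: +22.5%`, matching the table.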
Rate Limit Exhaustion Log

2026/03/24 13:39:46 5h=100.0% 7d=15.0% → 429 Too Many Requests
2026/03/24 16:02:48 5h=100.0% 7d=24.0% → 429 Too Many Requests
2026/03/24 16:34:37 5h=100.0% 7d=24.0% → 429 Too Many Requests
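To count exhaustion events from proxy logs like the ones above, a small parser is enough. This is a sketch, not the proxy's actual tooling: the line format is inferred only from the three lines shown, and the names `parse_line` and `exhaustion_events` are hypothetical.

```python
import re
from datetime import datetime

# Line format inferred from the three proxy log lines above (assumption:
# "<date> <time> 5h=<pct>% 7d=<pct>%" optionally followed by "→ <status> ...").
LINE_RE = re.compile(
    r"(?P<ts>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) "
    r"5h=(?P<five_h>[\d.]+)% 7d=(?P<seven_d>[\d.]+)%"
    r"(?: → (?P<status>\d{3}))?"
)


def parse_line(line: str):
    """Parse one log line into a dict, or return None if it does not match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    return {
        "ts": datetime.strptime(m["ts"], "%Y/%m/%d %H:%M:%S"),
        "five_h": float(m["five_h"]),
        "seven_d": float(m["seven_d"]),
        "status": int(m["status"]) if m["status"] else None,
    }


def exhaustion_events(lines):
    """Keep entries where the 5h window reached 100% or the request got a 429."""
    events = []
    for line in lines:
        entry = parse_line(line)
        if entry is not None and (entry["five_h"] >= 100.0 or entry["status"] == 429):
            events.append(entry)
    return events


LOG = [
    "2026/03/24 13:39:46 5h=100.0% 7d=15.0% → 429 Too Many Requests",
    "2026/03/24 16:02:48 5h=100.0% 7d=24.0% → 429 Too Many Requests",
    "2026/03/24 16:34:37 5h=100.0% 7d=24.0% → 429 Too Many Requests",
]

for event in exhaustion_events(LOG):
    print(event["ts"], event["five_h"], event["seven_d"], event["status"])
```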

5h Window Progression (2026-03-24)
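| Time (UTC) | 5h % | Elapsed |
|---|---|---|
| 09:24:19 | 1.0% | N/A (first sample) |
| 10:09:42 | 11.0% | N/A |
| 13:39:46 | 100.0% | 4h 15m (since 09:24:19) |
| 15:11:09 | 57.0% (reset) | N/A |
| 16:02:48 | 100.0% | 51m (since 15:11:09 reset) |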

After the reset at 15:11:09, the 5h window was exhausted again 51 minutes later, at 16:02:48.
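The elapsed times in the table (4h 15m and 51m) follow directly from the timestamps. A short Python check, assuming all samples fall on the same calendar day, as they do here.

```python
from datetime import datetime, timedelta

CLOCK = "%H:%M:%S"


def elapsed(start: str, end: str) -> timedelta:
    """Elapsed time between two HH:MM:SS timestamps on the same day."""
    return datetime.strptime(end, CLOCK) - datetime.strptime(start, CLOCK)


# First exhaustion, measured from the 09:24:19 sample (1.0%).
print(elapsed("09:24:19", "13:39:46"))  # 4:15:27
# Re-exhaustion after the 15:11:09 reset.
print(elapsed("15:11:09", "16:02:48"))  # 0:51:39
```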
Cache Efficiency Degradation
| Date | Cache Creation Tokens | Ratio to Input Tokens |
|---|---|---|
| 2026-03-23 | 5,698,495 | 1.67x |
| 2026-03-24 | 7,287,796 | 1.75x |
Cache creation tokens rose 28% (5.7M to 7.3M) on a comparable workload, outpacing the 22.5% rise in input tokens and accelerating rate limit exhaustion.
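The ratio column and the 28% figure can be recomputed from the raw token counts. A minimal Python sketch using only the daily totals reported in this issue.

```python
# Daily totals reported in this issue (input tokens from the first table).
DAILY = {
    "2026-03-23": {"input": 3_402_759, "cache_creation": 5_698_495},
    "2026-03-24": {"input": 4_169_220, "cache_creation": 7_287_796},
}


def cache_ratio(day: str) -> float:
    """Cache creation tokens per input token for one day."""
    totals = DAILY[day]
    return totals["cache_creation"] / totals["input"]


for day in DAILY:
    print(f"{day}: {cache_ratio(day):.2f}x")

# Day-over-day growth in absolute cache creation tokens.
growth = DAILY["2026-03-24"]["cache_creation"] / DAILY["2026-03-23"]["cache_creation"] - 1
print(f"cache creation growth: {growth:+.1%}")
```

Recomputed this way, the ratios come out to 1.67x and 1.75x, and cache creation growth to +27.9%.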
Breach of Service Agreement
Issue: Unilateral Service Degradation

The acceleration of rate limit exhaustion represents a material change to service behavior without:

  • Advance notice to users
  • Documentation updates
  • A corresponding workload increase to justify the change

Quantified Impact
| Metric | 2026-03-23 | 2026-03-24 | Delta |
|---|---|---|---|
| 5h exhaustions | 0 | 3 | +3 (from zero) |
| Hours of service | ~24 | ~7 (cumulative) | -71% |
| Cache creation tokens | 5.7M | 7.3M | +28% |
| Cost efficiency | Baseline | Degraded | Unknown excess cost |
Request for Compensation

We request the following:

Credit for service downtime: 3 complete 5h window exhaustions = ~15 hours of unavailable service capacity

Refund for excess token burn: 7.3M vs expected ~6M cache creation tokens = ~1.3M excess tokens at published rates

Investigation commitment: Acknowledgment of this issue and timeline for fix

Transparency: Explanation of what changed between 2026-03-23 and 2026-03-24 to cause this acceleration

What Should Happen?

Expected Behavior (Contractual)

Consistent Rate Limit Behavior: The 5h rate limit should exhibit consistent exhaustion timing for comparable workloads, absent documented service changes.

Documented Thresholds: Rate limit behavior should match published documentation. A ~28% rise in cache creation tokens against only a ~3% rise in requests suggests a service-side issue.

Graceful Degradation: If rate limits change, users should receive advance notice or documentation updates.

Actual Behavior (Breach)
| Aspect | Expected | Actual |
|---|---|---|
| Exhaustion timing | ~12-24h for this workload | 4.5h, then 51m |
| Day-over-day consistency | Similar exhaustion pattern | 0 → 3 exhaustions |
| Cache efficiency | Stable ratio to input tokens | +28% cache creation tokens |

Error Messages/Logs

2026/03/24 13:39:46. [UPSTREAM] /v1/messages 429 Too Many Requests
{"type":"error","error":{"type":"rate_limit_error","message":"This request would exceed your account's rate limit. Please try again later."},"request_id":"req_"}


2026/03/24 16:02:48. [UPSTREAM] /v1/messages 429 Too Many Requests
{"type":"error","error":{"type":"rate_limit_error","message":"This request would exceed your account's rate limit. Please try again later."},"request_id":"req_"}


2026/03/24 16:34:37. [UPSTREAM] /v1/messages 429 Too Many Requests
{"type":"error","error":{"type":"rate_limit_error","message":"This request would exceed your account's rate limit. Please try again later."},"request_id":"req_"}
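When counting exhaustions from logs, it helps to separate rate-limit 429s from other failures. A minimal Python sketch; the JSON shape is taken from the three error bodies above, and `is_rate_limit_error` is a hypothetical helper name.

```python
import json


def is_rate_limit_error(status: int, body: str) -> bool:
    """True if a response is HTTP 429 with a JSON error of type "rate_limit_error"."""
    if status != 429:
        return False
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return False
    if not isinstance(payload, dict):
        return False
    error = payload.get("error")
    return (
        payload.get("type") == "error"
        and isinstance(error, dict)
        and error.get("type") == "rate_limit_error"
    )


# Error body copied from the log above (request_id elided as in the original).
BODY = (
    '{"type":"error","error":{"type":"rate_limit_error",'
    '"message":"This request would exceed your account\'s rate limit. '
    'Please try again later."},"request_id":"req_"}'
)
print(is_rate_limit_error(429, BODY))  # True
```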

Steps to Reproduce

1. Run Claude Code with a standard workload (~900 requests, ~3-4M input tokens per day).
2. Observe the 5h rate limit progression via response headers or proxy logging.
3. Compare exhaustion timing day over day.
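Observing the 5h progression via response headers (the second step above) amounts to snapshotting rate-limit headers on every response. A minimal Python sketch, assuming the `anthropic-ratelimit-` prefix that the Anthropic API documentation uses for its rate limit headers, plus the standard `retry-after` header. This report does not say which header carries the 5h/7d utilization figures, so the filter keeps every header with that prefix rather than naming one; `rate_limit_snapshot` is a hypothetical helper.

```python
from typing import Dict, Mapping


def rate_limit_snapshot(headers: Mapping[str, str]) -> Dict[str, str]:
    """Return the rate-limit-related headers from one response, keyed in lowercase.

    Assumption: rate limit headers use the "anthropic-ratelimit-" prefix; which
    of them (if any) expose the 5h/7d percentages is not stated in this report.
    """
    snapshot = {}
    for name, value in headers.items():
        key = name.lower()  # HTTP header names are case-insensitive
        if key.startswith("anthropic-ratelimit-") or key == "retry-after":
            snapshot[key] = value
    return snapshot


# Illustrative headers only; the values are made up.
example = {
    "Content-Type": "application/json",
    "anthropic-ratelimit-requests-remaining": "42",
    "Retry-After": "30",
}
print(rate_limit_snapshot(example))
```

Logging one snapshot per request, with a timestamp, gives the per-day series needed for the day-over-day comparison.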

Baseline Day (2026-03-23):

908 requests
3.4M input tokens
5.7M cache creation tokens
Result: 0 exhaustions

Degraded Day (2026-03-24):

936 requests (+3%)
4.2M input tokens (+22%)
7.3M cache creation tokens (+28%)
Result: 3 exhaustions in 7 hours

Claude Model

Not sure / Multiple models

Is this a regression?

Yes, this worked in a previous version

Last Working Version

No response

Claude Code Version

666

Platform

Anthropic API

Operating System

Ubuntu/Debian Linux

Terminal/Shell

Xterm

Additional Information

This is systemic and currently in progress. We, the users, demand compensation.

Metadata

Assignees

No one assigned

Labels

invalid (Issue doesn't seem to be related to Claude Code), stale (Issue is inactive)
