[BUG] 5-Hour Rate Limit Exhaustion Accelerating Despite Comparable Workload #38330

### Preflight Checklist

- [x] I have searched [existing issues](https://github.com/anthropics/claude-code/issues?q=is%3Aissue%20state%3Aopen%20label%3Abug) and this hasn't been reported yet
- [x] This is a single bug report (please file separate reports for different bugs)
- [x] I am using the latest version of Claude Code

### What's Wrong?

On 2026-03-24 the 5-hour rate limit was exhausted three times, while previous days with comparable workloads never exhausted it. This is a service degradation that violates expected usage patterns and contractual rate limit behavior.

#### Evidence of Acceleration

| Date | Total Requests | Input Tokens | 5h Exhaustion Events | Time to First Exhaustion |
|---|---|---|---|---|
| 2026-03-21 | ~900 | ~3M | 0 | N/A |
| 2026-03-22 | ~900 | ~3M | 0 | N/A |
| 2026-03-23 | 908 | 3,402,759 | 0 | N/A |
| 2026-03-24 | 936 (+3.1%) | 4,169,220 (+22.5%) | 3 | ~4.5 hours |

**Key Finding:** A 3.1% increase in requests and a 22.5% increase in input tokens should not take the 5h window from never exhausting to exhausting in about 4.5 hours.
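
The day-over-day percentages can be rechecked with a few lines of Python. This is a minimal sketch using only the 2026-03-23 and 2026-03-24 figures from the table above; the helper name `pct_change` is illustrative and not part of any tool.

```python
# Figures are copied from the evidence table; nothing here is measured anew.
def pct_change(before: float, after: float) -> float:
    """Return the relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


baseline = {"requests": 908, "input_tokens": 3_402_759}  # 2026-03-23
degraded = {"requests": 936, "input_tokens": 4_169_220}  # 2026-03-24

# Round to one decimal place to match the precision used in the table.
deltas = {key: round(pct_change(baseline[key], degraded[key]), 1) for key in baseline}
print(deltas)  # {'requests': 3.1, 'input_tokens': 22.5}
```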

#### Rate Limit Exhaustion Log

    2026/03/24 13:39:46  5h=100.0%  7d=15.0%  → 429 Too Many Requests
    2026/03/24 16:02:48  5h=100.0%  7d=24.0%  → 429 Too Many Requests
    2026/03/24 16:34:37  5h=100.0%  7d=24.0%  → 429 Too Many Requests
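
For anyone who wants to count exhaustion events from a similar log, the following is a rough sketch of a parser. It assumes the exact line layout shown above (timestamp, `5h=`, `7d=`, arrow, status code), which is the format of our proxy log and not an official one; the names `LINE_RE` and `parse_line` are hypothetical.

```python
import re
from datetime import datetime

# Assumed layout: "YYYY/MM/DD HH:MM:SS  5h=<pct>%  7d=<pct>%  → <status> <reason>"
LINE_RE = re.compile(
    r"(?P<ts>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2})\s+"
    r"5h=(?P<h5>[\d.]+)%\s+7d=(?P<d7>[\d.]+)%\s+→\s+(?P<status>\d{3})"
)


def parse_line(line: str):
    """Parse one log line into a dict, or return None if it does not match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    return {
        "time": datetime.strptime(m["ts"], "%Y/%m/%d %H:%M:%S"),
        "five_hour_pct": float(m["h5"]),
        "seven_day_pct": float(m["d7"]),
        "status": int(m["status"]),
    }


log = """\
2026/03/24 13:39:46  5h=100.0%  7d=15.0%  → 429 Too Many Requests
2026/03/24 16:02:48  5h=100.0%  7d=24.0%  → 429 Too Many Requests
2026/03/24 16:34:37  5h=100.0%  7d=24.0%  → 429 Too Many Requests
"""

events = [e for e in map(parse_line, log.splitlines()) if e is not None]
# An "exhaustion event" here means a 429 logged while the 5h window reads 100%.
exhausted = [e for e in events if e["status"] == 429 and e["five_hour_pct"] >= 100.0]
print(len(exhausted))  # 3
```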

#### 5h Window Progression (2026-03-24)

| Time (UTC) | 5h % | Time Since Reset |
|---|---|---|
| 09:24:19 | 1.0% | - |
| 10:09:42 | 11.0% | - |
| 13:39:46 | 100.0% | 4h 15m |
| 15:11:09 | 57.0% (reset) | - |
| 16:02:48 | 100.0% | 51m |

After the reset observed at 15:11:09, the 5h window was exhausted again within 51 minutes, at 16:02:48.
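
The elapsed times can be reproduced from the timestamps alone. This sketch assumes "time since reset" is measured from the first observation of each window (09:24:19 for the first, 15:11:09 for the second); the actual window start on the server side is not visible to us.

```python
from datetime import datetime, timedelta

FMT = "%H:%M:%S"


def elapsed(start: str, end: str) -> timedelta:
    """Return the time between two same-day UTC clock readings."""
    return datetime.strptime(end, FMT) - datetime.strptime(start, FMT)


first_window = elapsed("09:24:19", "13:39:46")   # first observation to first exhaustion
second_window = elapsed("15:11:09", "16:02:48")  # observed reset to second exhaustion

print(first_window)   # 4:15:27
print(second_window)  # 0:51:39
```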
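
#### Cache Efficiency Degradation

| Date | Cache Creation Tokens | Ratio to Input Tokens |
|---|---|---|
| 2026-03-23 | 5,698,495 | 1.67x |
| 2026-03-24 | 7,287,796 | 1.75x |

Cache creation tokens rose 28% day over day against a 22.5% rise in input tokens, so the ratio of cache creation to input tokens climbed from 1.67x to 1.75x (about 4%), accelerating rate limit exhaustion.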
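
As a cross-check on the cache figures, the sketch below recomputes both the cache-creation-to-input ratio and the day-over-day growth from the token counts reported in this issue. Variable names are illustrative only.

```python
# Token counts as reported for each day in this issue.
days = {
    "2026-03-23": {"input": 3_402_759, "cache_creation": 5_698_495},
    "2026-03-24": {"input": 4_169_220, "cache_creation": 7_287_796},
}

# Ratio of cache creation tokens to input tokens, per day.
ratios = {day: d["cache_creation"] / d["input"] for day, d in days.items()}

# Day-over-day growth of the absolute cache creation count, in percent.
cache_growth = (
    days["2026-03-24"]["cache_creation"] / days["2026-03-23"]["cache_creation"] - 1
) * 100

# Day-over-day growth of the ratio itself, in percent.
ratio_growth = (ratios["2026-03-24"] / ratios["2026-03-23"] - 1) * 100

for day, ratio in ratios.items():
    print(f"{day}: {ratio:.2f}x")
print(f"cache creation tokens: +{cache_growth:.1f}%")
print(f"cache/input ratio:     +{ratio_growth:.1f}%")
```

Separating the two growth figures matters here: the absolute cache creation count grows by roughly 28%, while the ratio to input tokens grows by roughly 4%, so most of the absolute increase tracks the larger input volume.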

#### Breach of Service Agreement

**Issue: Unilateral Service Degradation**

The acceleration of rate limit exhaustion represents a material change to service behavior without:

- Advance notice to users
- Documentation updates
- A corresponding workload increase to justify the change

#### Quantified Impact

| Metric | 2026-03-23 | 2026-03-24 | Delta |
|---|---|---|---|
| 5h exhaustions | 0 | 3 | +3 (from 0) |
| Hours of service | ~24 | ~7 (cumulative) | -71% |
| Cache creation tokens | 5.7M | 7.3M | +28% |
| Cost efficiency | Baseline | Degraded | Unknown excess cost |

#### Request for Compensation

We request the following:

1. **Credit for service downtime:** 3 complete 5h window exhaustions = ~15 hours of unavailable service capacity
2. **Refund for excess token burn:** 7.3M vs. expected ~6M cache creation tokens = ~1.3M excess tokens at published rates
3. **Investigation commitment:** Acknowledgment of this issue and a timeline for a fix
4. **Transparency:** An explanation of what changed between 2026-03-23 and 2026-03-24 to cause this acceleration


### What Should Happen?

#### Expected Behavior (Contractual)

1. **Consistent Rate Limit Behavior:** The 5h rate limit should exhibit consistent exhaustion timing for comparable workloads, absent documented service changes.
2. **Documented Thresholds:** Rate limit behavior should match published documentation. A ~28% increase in cache creation tokens, against a 3.1% increase in requests and a 22.5% increase in input tokens, suggests a service-side issue.
3. **Graceful Degradation:** If rate limits change, users should receive advance notice or documentation updates.

#### Actual Behavior (Breach)

| Aspect | Expected | Actual |
|---|---|---|
| Exhaustion timing | ~12-24h for this workload | ~4.5h, then 51m |
| Consistency day-over-day | Similar exhaustion pattern | 0 → 3 exhaustions |
| Cache efficiency | Stable ratio to input tokens | Ratio up from 1.67x to 1.75x; cache creation tokens +28% |

### Error Messages/Logs

```shell
2026/03/24 13:39:46. [UPSTREAM] /v1/messages 429 Too Many Requests
{"type":"error","error":{"type":"rate_limit_error","message":"This request would exceed your account's rate limit. Please try again later."},"request_id":"req_"}


2026/03/24 16:02:48. [UPSTREAM] /v1/messages 429 Too Many Requests
{"type":"error","error":{"type":"rate_limit_error","message":"This request would exceed your account's rate limit. Please try again later."},"request_id":"req_"}


2026/03/24 16:34:37. [UPSTREAM] /v1/messages 429 Too Many Requests
{"type":"error","error":{"type":"rate_limit_error","message":"This request would exceed your account's rate limit. Please try again later."},"request_id":"req_"}
```
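
To tell these rate limit rejections apart from other upstream failures in a proxy log, the JSON error body can be inspected directly. The sketch below uses the body exactly as logged above (including the truncated `request_id`); the function name `is_rate_limit_error` is hypothetical.

```python
import json

# Error body copied verbatim from the log above; request_id is truncated in our logs.
body = '''{"type":"error","error":{"type":"rate_limit_error","message":"This request would exceed your account's rate limit. Please try again later."},"request_id":"req_"}'''


def is_rate_limit_error(raw: str) -> bool:
    """Return True if the JSON error body reports a rate_limit_error."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError:
        return False
    if not isinstance(payload, dict):
        return False
    error = payload.get("error")
    return isinstance(error, dict) and error.get("type") == "rate_limit_error"


print(is_rate_limit_error(body))  # True
```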

### Steps to Reproduce

1. Run Claude Code with a standard workload (~900 requests, ~3-4M input tokens per day).
2. Observe the 5h rate limit progression via response headers or proxy logging.
3. Compare day-over-day exhaustion timing.

**Baseline Day (2026-03-23):**

- 908 requests
- 3.4M input tokens
- 5.7M cache creation tokens
- Result: 0 exhaustions

**Degraded Day (2026-03-24):**

- 936 requests (+3%)
- 4.2M input tokens (+22%)
- 7.3M cache creation tokens (+28%)
- Result: 3 exhaustions in 7 hours

### Claude Model

Not sure / Multiple models

### Is this a regression?

Yes, this worked in a previous version

### Last Working Version

_No response_

### Claude Code Version

666

### Platform

Anthropic API

### Operating System

Ubuntu/Debian Linux

### Terminal/Shell

Xterm

### Additional Information

This is systemic and currently in progress. We the users demand compensation.
