Preflight Checklist
What's Wrong?
The 5-hour rate limit is being exhausted 3x faster today (2026-03-24) than on previous days with comparable workloads. This represents a service degradation that violates expected usage patterns and contractual rate limit behavior.
Evidence of Acceleration
| Date | Total Requests | Input Tokens | 5h Exhaustion Events | Time to First Exhaustion |
|---|---|---|---|---|
| 2026-03-21 | ~900 | ~3M | 0 | N/A |
| 2026-03-22 | ~900 | ~3M | 0 | N/A |
| 2026-03-23 | 908 | 3,402,759 | 0 | N/A |
| 2026-03-24 | 936 (+3.1%) | 4,169,220 (+22.5%) | 3 | ~4.5 hours |
Key Finding: A 3-23% workload increase should not take time to first exhaustion from never (no exhaustions on 2026-03-21 through 2026-03-23) to ~4.5 hours.
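The percentage changes in the table can be rechecked from the raw 2026-03-23 and 2026-03-24 counts. A minimal Python sketch using only the figures reported above (the dictionary and helper names are illustrative):

```python
# Recompute the day-over-day deltas from the counts reported in the table.
# The figures are copied from the report; nothing here queries the API.
baseline = {"requests": 908, "input_tokens": 3_402_759}  # 2026-03-23
degraded = {"requests": 936, "input_tokens": 4_169_220}  # 2026-03-24


def pct_change(old: int, new: int) -> float:
    """Percentage change from old to new, rounded to one decimal place."""
    return round((new - old) / old * 100, 1)


deltas = {key: pct_change(baseline[key], degraded[key]) for key in baseline}
print(deltas)  # {'requests': 3.1, 'input_tokens': 22.5}
```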
Rate Limit Exhaustion Log
2026/03/24 13:39:46 5h=100.0% 7d=15.0% → 429 Too Many Requests
2026/03/24 16:02:48 5h=100.0% 7d=24.0% → 429 Too Many Requests
2026/03/24 16:34:37 5h=100.0% 7d=24.0% → 429 Too Many Requests
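Lines in this format can be parsed to follow 5h and 7d utilization over time. The sketch below assumes the exact layout of the proxy log above (timestamp, `5h=`, `7d=`), which is the reporter's own logging format rather than a documented Claude Code or API log format; `LINE_RE` and `parse_line` are hypothetical names.

```python
import re
from datetime import datetime

# Assumed format (taken from the proxy log above):
# "2026/03/24 13:39:46 5h=100.0% 7d=15.0% → 429 Too Many Requests"
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) "
    r"5h=(?P<five_h>[\d.]+)% 7d=(?P<seven_d>[\d.]+)%"
)


def parse_line(line: str):
    """Return (timestamp, 5h %, 7d %), or None if the line does not match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    ts = datetime.strptime(m["ts"], "%Y/%m/%d %H:%M:%S")
    return ts, float(m["five_h"]), float(m["seven_d"])


LOG = [
    "2026/03/24 13:39:46 5h=100.0% 7d=15.0% → 429 Too Many Requests",
    "2026/03/24 16:02:48 5h=100.0% 7d=24.0% → 429 Too Many Requests",
    "2026/03/24 16:34:37 5h=100.0% 7d=24.0% → 429 Too Many Requests",
]

# Keep only lines that matched, then print the utilization at each event.
records = [r for r in map(parse_line, LOG) if r is not None]
for ts, five_h, seven_d in records:
    print(ts.time(), f"5h={five_h}%", f"7d={seven_d}%")
```

Tracking the 7d column this way shows it rising from 15.0% to 24.0% between the first and second exhaustion.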
5h Window Progression (2026-03-24)
| Time (UTC) | 5h % | Time Since Reset |
|---|---|---|
| 09:24:19 | 1.0% | N/A |
| 10:09:42 | 11.0% | N/A |
| 13:39:46 | 100.0% | 4h 15m |
| 15:11:09 | 57.0% (reset) | N/A |
| 16:02:48 | 100.0% | 51m |
The 5h window reset and re-exhausted in 51 minutes.
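The two intervals can be recomputed from the sampled timestamps. This sketch assumes, as the table appears to, that the first sample (09:24:19, 1.0%) marks the start of the first window and that all times fall on the same UTC day; `elapsed` is an illustrative helper.

```python
from datetime import datetime

FMT = "%H:%M:%S"


def elapsed(start: str, end: str) -> str:
    """Elapsed time between two same-day UTC timestamps, as 'Hh MMm SSs'."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    total = int(delta.total_seconds())
    hours, rem = divmod(total, 3600)
    minutes, seconds = divmod(rem, 60)
    return f"{hours}h {minutes:02d}m {seconds:02d}s"


print(elapsed("09:24:19", "13:39:46"))  # 4h 15m 27s (first sample to first exhaustion)
print(elapsed("15:11:09", "16:02:48"))  # 0h 51m 39s (reset to re-exhaustion)
```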
Cache Efficiency Degradation
| Date | Cache Creation Tokens | Ratio vs Input Tokens |
|---|---|---|
| 2026-03-23 | 5,698,495 | 1.67x |
| 2026-03-24 | 7,287,796 | 1.75x |
Cache creation tokens rose 28% against a 22.5% rise in input tokens, so the ratio of cache creation to input tokens also rose, accelerating rate limit exhaustion.
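The ratios and growth figures follow directly from the reported token counts (the 2026-03-23 ratio comes out at 1.67x when rounded to two decimals). A minimal sketch; the variable names are illustrative:

```python
# Token counts copied from the report.
cache_creation = {"2026-03-23": 5_698_495, "2026-03-24": 7_287_796}
input_tokens = {"2026-03-23": 3_402_759, "2026-03-24": 4_169_220}

# Cache creation tokens per input token, per day.
ratios = {day: cache_creation[day] / input_tokens[day] for day in cache_creation}

# Day-over-day growth of cache creation tokens and of the ratio itself, in percent.
cache_growth = (cache_creation["2026-03-24"] / cache_creation["2026-03-23"] - 1) * 100
ratio_growth = (ratios["2026-03-24"] / ratios["2026-03-23"] - 1) * 100

for day, ratio in ratios.items():
    print(day, f"{ratio:.2f}x")  # 1.67x, then 1.75x
print(f"cache creation: +{cache_growth:.1f}%")  # +27.9%
print(f"ratio: +{ratio_growth:.1f}%")  # +4.4%
```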
Breach of Service Agreement
Issue: Unilateral Service Degradation
The acceleration of rate limit exhaustion represents a material change to service behavior without:
Advance notice to users
Documentation updates
Corresponding workload increase to justify the change
Quantified Impact
| Metric | 2026-03-23 | 2026-03-24 | Delta |
|---|---|---|---|
| 5h exhaustions | 0 | 3 | +3 (from zero) |
| Hours of service | ~24 | ~7 (cumulative) | -71% |
| Cache creation tokens | 5.7M | 7.3M | +28% |
| Cost efficiency | Baseline | Degraded | Unknown excess cost |
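The Delta column can be reproduced with a small helper. A percentage change is undefined when the baseline is zero, so the exhaustion row is better reported as an absolute change; this sketch returns `None` in that case. `pct_change` is a hypothetical helper, and the inputs are the rounded figures from the table.

```python
from typing import Optional


def pct_change(old: float, new: float) -> Optional[int]:
    """Whole-number percentage change, or None when the baseline is zero."""
    if old == 0:
        return None  # 0 -> 3 exhaustions has no finite percentage change
    return round((new - old) / old * 100)


impact = {
    "5h exhaustions": pct_change(0, 3),  # None
    "hours of service": pct_change(24, 7),  # -71
    "cache creation tokens (M)": pct_change(5.7, 7.3),  # 28
}
print(impact)
```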
Request for Compensation
We request the following:
Credit for service downtime: 3 complete 5h window exhaustions = ~15 hours of unavailable service capacity
Refund for excess token burn: 7.3M vs expected ~6M cache creation tokens = ~1.3M excess tokens at published rates
Investigation commitment: Acknowledgment of this issue and a timeline for a fix
Transparency: Explanation of what changed between 2026-03-23 and 2026-03-24 to cause this acceleration
What Should Happen?
Expected Behavior (Contractual)
Consistent Rate Limit Behavior: The 5h rate limit should exhibit consistent exhaustion timing for comparable workloads, absent documented service changes.
Documented Thresholds: Rate limit behavior should match published documentation. A ~28% increase in cache creation token burn against only a 3.1% increase in requests suggests a service-side issue.
Graceful Degradation: If rate limits change, users should receive advance notice or documentation updates.
Actual Behavior (Breach)
| Aspect | Expected | Actual |
|---|---|---|
| Exhaustion timing | ~12-24h for this workload | 4.5h, then 51m |
| Day-over-day consistency | Similar exhaustion pattern | 0 → 3 exhaustions |
| Cache efficiency | Stable ratio to input tokens | +28% cache creation tokens |
Error Messages/Logs
2026/03/24 13:39:46. [UPSTREAM] /v1/messages 429 Too Many Requests
{"type":"error","error":{"type":"rate_limit_error","message":"This request would exceed your account's rate limit. Please try again later."},"request_id":"req_"}
2026/03/24 16:02:48. [UPSTREAM] /v1/messages 429 Too Many Requests
{"type":"error","error":{"type":"rate_limit_error","message":"This request would exceed your account's rate limit. Please try again later."},"request_id":"req_"}
2026/03/24 16:34:37. [UPSTREAM] /v1/messages 429 Too Many Requests
{"type":"error","error":{"type":"rate_limit_error","message":"This request would exceed your account's rate limit. Please try again later."},"request_id":"req_"}
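When logging through a proxy, 429 responses can be classified by the error body shown above. A minimal sketch, assuming only the JSON shape visible in these logs (top-level `type` of `error`, nested `error.type` of `rate_limit_error`); `is_rate_limit_error` is an illustrative name, not part of any SDK.

```python
import json

# Error body as logged above (request_id is elided in the report).
BODY = """{"type":"error","error":{"type":"rate_limit_error","message":"This request would exceed your account's rate limit. Please try again later."},"request_id":"req_"}"""


def is_rate_limit_error(status: int, body: str) -> bool:
    """True when an HTTP 429 response carries an error of type rate_limit_error."""
    if status != 429:
        return False
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return False  # non-JSON body: cannot confirm the error type
    if not isinstance(payload, dict):
        return False
    error = payload.get("error")
    return (
        payload.get("type") == "error"
        and isinstance(error, dict)
        and error.get("type") == "rate_limit_error"
    )


print(is_rate_limit_error(429, BODY))  # True
```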
Steps to Reproduce
Run Claude Code with standard workload (~900 requests, ~3-4M input tokens per day)
Observe 5h rate limit progression via response headers or proxy logging
Compare day-over-day exhaustion timing
Baseline Day (2026-03-23):
908 requests
3.4M input tokens
5.7M cache creation tokens
Result: 0 exhaustions
Degraded Day (2026-03-24):
936 requests (+3%)
4.2M input tokens (+22%)
7.3M cache creation tokens (+28%)
Result: 3 exhaustions in 7 hours
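The day-over-day comparison in step 3 can be automated by counting exhaustion lines per date in the proxy log. This sketch assumes the proxy log format shown under Rate Limit Exhaustion Log; the report does not include the 2026-03-23 log, so that day simply has no matching lines and counts as 0. It counts logged 429 events, not distinct 5h windows, and `exhaustions_per_day` is a hypothetical helper.

```python
import re
from collections import Counter

# Assumed proxy log format: "YYYY/MM/DD HH:MM:SS 5h=NN.N% 7d=NN.N% → ..."
LINE_RE = re.compile(
    r"^(?P<date>\d{4}/\d{2}/\d{2}) \d{2}:\d{2}:\d{2} 5h=(?P<five_h>[\d.]+)%"
)


def exhaustions_per_day(lines):
    """Count log lines per date where 5h utilization reached 100%."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if m and float(m["five_h"]) >= 100.0:
            counts[m["date"]] += 1
    return counts


LOG = [
    "2026/03/24 13:39:46 5h=100.0% 7d=15.0% → 429 Too Many Requests",
    "2026/03/24 16:02:48 5h=100.0% 7d=24.0% → 429 Too Many Requests",
    "2026/03/24 16:34:37 5h=100.0% 7d=24.0% → 429 Too Many Requests",
]

per_day = exhaustions_per_day(LOG)
for day in ("2026/03/23", "2026/03/24"):
    print(day, per_day[day])  # Counter returns 0 for dates with no matches
```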
Claude Model
Not sure / Multiple models
Is this a regression?
Yes, this worked in a previous version
Last Working Version
No response
Claude Code Version
666
Platform
Anthropic API
Operating System
Ubuntu/Debian Linux
Terminal/Shell
Xterm
Additional Information
This is systemic and currently in progress. We the users demand compensation.