feat(quota): add provider quota enforcement with per-limit status tracking #1518
- Add QuotaEnforcementSettings to system service with enabled/mode config
- Implement ProviderQuotaSelector to filter exhausted channels at selection time
- Add QuotaAwareStrategy applying -10000 penalty for exhausted channels
- Wire quota-aware components into all load balancers via fx
- Add GraphQL query/mutation for quota enforcement settings control
- Return HTTP 503 with quota_exhausted error code in API handlers
- Add in-memory quota status cache using sync.Map for O(1) lookups
- Add comprehensive tests for quota selector and scoring strategy
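As an illustration of the sync.Map-backed status cache mentioned above, here is a minimal sketch. The names `quotaCache`, `Set`, `Get`, and the local `Status` type are hypothetical stand-ins, not the PR's actual identifiers; the only thing taken from the description is that lookups go through a `sync.Map`.

```go
package main

import (
	"fmt"
	"sync"
)

// Status is a hypothetical stand-in for the PR's providerquotastatus.Status.
type Status string

const (
	StatusAvailable Status = "available"
	StatusWarning   Status = "warning"
	StatusExhausted Status = "exhausted"
	StatusUnknown   Status = "unknown"
)

// quotaCache maps a channel ID to its last known quota status.
// sync.Map makes concurrent reads from the routing path safe without a lock.
type quotaCache struct {
	m sync.Map // channel ID (int) -> Status
}

// Set records the latest status reported by a quota checker.
func (c *quotaCache) Set(channelID int, s Status) {
	c.m.Store(channelID, s)
}

// Get returns StatusUnknown for channels that have never been checked.
func (c *quotaCache) Get(channelID int) Status {
	v, ok := c.m.Load(channelID)
	if !ok {
		return StatusUnknown
	}
	return v.(Status)
}

func main() {
	var c quotaCache
	c.Set(1, StatusExhausted)
	fmt.Println(c.Get(1), c.Get(2))
}
```

Returning an explicit unknown status for a cache miss keeps the routing code from mistaking "never checked" for "available".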
- [R1] Only return QuotaExhaustedError when quota filtering caused empty candidates (ProviderQuotaSelector.FilteredCount > 0), preventing wrong HTTP 503 for non-quota failures like model-not-found.
- [D1] Extract writeQuotaExhaustedResponse helper, replacing 5 identical copy-pasted quota error blocks across chat.go, doubao.go, openai.go.
- [D2] Remove unreachable quota-exhausted checks after ReadHTTPRequest in doubao.go and openai.go (ReadHTTPRequest only returns io.ReadAll errors, never QuotaExhaustedError).
- [R5] Rewrite quota test to exercise actual production code (writeQuotaExhaustedResponse) instead of manually reimplementing handler logic that would pass even if handler broke.
- [S1] Define QuotaEnforcementMode typed constants in backend (ExhaustedOnly, DePrioritize), replacing 40+ raw string literals across system.go, orchestrator, resolvers, and tests.
- [S1] Type QuotaChannelStatus.Status as providerquotastatus.Status instead of bare string, reusing existing typed enum.
- [S2] Extract hardcoded 0.8 warning usage ratio into named constant (warningUsageRatio) in lb_strategy_quota.go.
- [S1] Define QuotaEnforcementMode union type in frontend (TypeScript), applied to QuotaEnforcementSettings and UpdateQuotaEnforcementSettingsInput.
- [S3] Remove dead 'exhausted_only' fallback and 'value &&' guard in quota-settings.tsx Select component.
Negate de-prioritize score so warning channels rank below exhausted ones. Add validation rejecting invalid quota enforcement modes. Fix playground error handling: return 503 for quota-exhausted errors instead of falling through to generic HTTP error handling. Update quota enforcement settings resolver to use the non-default variant so errors surface instead of silently returning zero values. Clean up stale .playwright-cli/ test artifacts and add to .gitignore.
Populate per-limit QuotaLimitStatus in all 7 checkers based on request modality (req.Image != nil for image vs token types). Add NextCheckAt = checkInterval/4 for warning channels. Propagate quota limit type via context keys. Score and filter providers per-limit using per-limit effective status evaluation.
QuotaEnforcementMode was missing MarshalGQL causing gqlgen runtime type assertion error when serializing the mode field. Converted mode from GraphQL String to proper enum type with MarshalGQL/UnmarshalGQL methods and updated frontend to use SCREAMING_SNAKE_CASE values.
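A minimal sketch of what such an enum type might look like, following the method shapes gqlgen expects for custom marshaling (`MarshalGQL(io.Writer)` and `UnmarshalGQL(interface{}) error`). The constant identifiers and the `IsValid` helper are assumptions; the enum values follow the SCREAMING_SNAKE_CASE names mentioned elsewhere in this PR.

```go
package main

import (
	"fmt"
	"io"
	"os"
	"strconv"
)

// QuotaEnforcementMode is a typed enum; the Go constant names are hypothetical.
type QuotaEnforcementMode string

const (
	ModeExhaustedOnly QuotaEnforcementMode = "EXHAUSTED_ONLY"
	ModeDePrioritize  QuotaEnforcementMode = "DE_PRIORITIZE"
)

// IsValid reports whether m is one of the known modes.
func (m QuotaEnforcementMode) IsValid() bool {
	return m == ModeExhaustedOnly || m == ModeDePrioritize
}

// MarshalGQL writes the mode as a quoted string.
func (m QuotaEnforcementMode) MarshalGQL(w io.Writer) {
	io.WriteString(w, strconv.Quote(string(m)))
}

// UnmarshalGQL parses the decoded GraphQL input and rejects unknown modes.
func (m *QuotaEnforcementMode) UnmarshalGQL(v interface{}) error {
	s, ok := v.(string)
	if !ok {
		return fmt.Errorf("quota enforcement mode must be a string, got %T", v)
	}
	mode := QuotaEnforcementMode(s)
	if !mode.IsValid() {
		return fmt.Errorf("invalid quota enforcement mode %q", s)
	}
	*m = mode
	return nil
}

func main() {
	var m QuotaEnforcementMode
	if err := m.UnmarshalGQL("DE_PRIORITIZE"); err != nil {
		fmt.Println("error:", err)
		return
	}
	m.MarshalGQL(os.Stdout)
	fmt.Println()
}
```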
Code Review
This pull request introduces a system-wide quota enforcement mechanism, allowing AxonHub to manage channels with exhausted provider quotas through either strict filtering or de-prioritization. Key additions include a new "Quota" settings UI, modality-specific (image vs. token) quota tracking across various provider checkers, and an in-memory status cache to optimize routing decisions. Feedback focuses on refining the DE_PRIORITIZE mode to ensure exhausted channels remain available as a last resort, correcting status ranking logic, optimizing service startup by loading the quota cache asynchronously, and maintaining modality-specific limits during error scenarios.
…istency
- Fix inverted warning penalty formula in QuotaAwareStrategy (1-usageRatio → usageRatio)
- Fix inverted UsageRatio in NeuralWatt checker (remaining → used fraction)
- Fix usageRatio=0.0 when exhausted in GitHub Copilot checker (default to 1.0)
- Fix PercentUsed≥1.0 mapping to warning instead of exhausted in NanoGPT checker
- Fix UsageRatio=0 when Codex limit exhausted without UsedPercent
- Fix saveQuotaError wiping per-limit cache on transient failures
- Wire warning_check_interval_ratio config through DI instead of hardcoding /4
- Fix EffectiveStatus returning Available for all-Unknown limits
- Return error from UnmarshalJSON on invalid QuotaEnforcementMode
- Replace string literals with typed status constants in orchestrator
- Remove dead IsNotFound check in QuotaEnforcementSettingsOrDefault
- Add per-limit exhausted/edge-case tests for NanoGPT
- Add multi-ratio coverage and penalty-increase test for LB strategy
- Add cache round-trip test for Limits and EffectiveStatus unknown test
…t EffectiveStatus init
- ProviderQuotaSelector: skip filtering in DePrioritize mode so QuotaAwareStrategy can penalize exhausted channels instead of removing them, preserving the soft enforcement semantic
- EffectiveStatus: initialize worstStatus to StatusUnknown and worstReady to false so that Unknown-ranked limits can be overridden by any known status instead of returning Available incorrectly
- Start: run loadQuotaCache async to avoid blocking startup on large quota datasets
- Update tests to expect all candidates in DePrioritize mode
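The EffectiveStatus fix can be pictured as a worst-status fold that starts from Unknown. The sketch below is an assumption-laden simplification: the `rank` ordering and function names are hypothetical, and the `worstReady` flag from the commit message is omitted.

```go
package main

import "fmt"

// Status is a hypothetical stand-in for providerquotastatus.Status.
type Status string

const (
	StatusUnknown   Status = "unknown"
	StatusAvailable Status = "available"
	StatusWarning   Status = "warning"
	StatusExhausted Status = "exhausted"
)

// rank orders statuses by severity; Unknown ranks lowest so that any known
// status overrides it. The ordering itself is an assumption.
func rank(s Status) int {
	switch s {
	case StatusAvailable:
		return 1
	case StatusWarning:
		return 2
	case StatusExhausted:
		return 3
	default:
		return 0
	}
}

// effectiveStatus returns the most severe status across all limits.
// Starting from Unknown means an all-Unknown (or empty) input stays Unknown
// instead of being reported as Available.
func effectiveStatus(limits []Status) Status {
	worst := StatusUnknown
	for _, s := range limits {
		if rank(s) > rank(worst) {
			worst = s
		}
	}
	return worst
}

func main() {
	fmt.Println(effectiveStatus([]Status{StatusUnknown, StatusUnknown}))
	fmt.Println(effectiveStatus([]Status{StatusAvailable, StatusExhausted}))
}
```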
Reduce the blind spot between quota checks so that channels in warning state are detected sooner. At 20m, a Codex channel at 90% usage on the 3h primary window could fully exhaust before the next check. At 5m, the worst-case blind spot is under 30% of remaining quota even for the shortest provider windows. The adaptive warning interval (base/4) now checks every ~75s instead of 5m, catching rapid exhaustion quickly once warning is first detected. Also enable refetchIntervalInBackground on the frontend quota query so the UI stays current when the browser tab is backgrounded.
- Add channel-level EffectiveStatus floor when base is exhausted
- Fallback to "unknown" for non-standard status values for consistency
- Lower synthetic warning boundary to prevent false 429 selector picks
- Refactor quotaSelector for correct composition order
- Add UnmarshalGQL for EffectiveStatus type consistency
- Add round-trip test for quota selection logic
- Add config documentation for quota warning interval ratio
Refactor API-specific error formatting for quota exhausted errors (extract wrapQuotaExhaustedAsResponseError shared helper). Fix provider_quota warning interval math (multiply vs divide ratio). Add DE_PRIORITIZE exhaustion detection in candidate selection. Propagate DB errors in UpdateQuotaEnforcementSettings resolver. Add usage ratio to wafer quota limits. Extend synthetic checker test coverage for limits array. Fix comment: de_prioritize description.
Replace magic numbers (0.8, quota_exhausted) with named constants. Extract NewTokenLimitStatus() and IsReadyStatus() helpers to deduplicate quota data construction across 6 provider checkers. Tighten nextCheckIntervalForStatus() to use providerquotastatus.Status instead of string. Simplify quota selector wiring in selectCandidates().
Restore doc comments and inline field comments on QuotaChecker interface, QuotaData struct fields, and NanoGPT checker types/functions that existed on unstable but were dropped during the provider_quota package restructure.
Restore doc comments on buildNanoGPTQuotaURL and findEarliestResetAt, plus inline comments on warning state check, nextResetAt calculation, and grace period fallback that were lost during the provider_quota restructure.
Greptile Summary
This PR adds a comprehensive provider quota enforcement system that monitors per-limit quota status (token vs. image) across 7 provider checkers and integrates it into the request-routing pipeline via two enforcement modes:
Confidence Score: 5/5
Safe to merge; all remaining findings are P2 style/polish issues with no impact on correctness or data integrity. The feature is well-architected and thoroughly tested (1500+ new test lines). All P0/P1 concerns raised in prior threads have been addressed. The two new findings are P2: a JSON tag/map-key inconsistency that creates no runtime bug under current usage, and an edge-case penalty floor in the warning-scoring path that is unlikely to matter in practice. No data loss, security, or routing correctness issues were found.
internal/server/biz/provider_quota/types.go (JSON tag alignment), internal/server/orchestrator/lb_strategy_quota.go (zero-ratio warning penalty)
Important Files Changed
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[Incoming Request] --> B[selectCandidates middleware]
B --> C[Base selector: model/stream/policy filters]
C --> D{ProviderQuotaSelector}
D --> E{Enforcement enabled?}
E -- No --> F[All candidates pass through]
E -- Yes --> G{Mode?}
G -- ExhaustedOnly --> H[Filter: EffectiveStatus == Exhausted for request limit type]
H --> I{candidates empty?}
I -- Yes + FilteredCount > 0 --> J[503 QuotaExhaustedError]
I -- No --> K[LoadBalancedSelector]
G -- DePrioritize --> L[All candidates pass through]
L --> K
K --> M[QuotaAwareStrategy scores each channel]
M --> N{effectiveStatus?}
N -- Available --> O[score: 0]
N -- Unknown --> P[score: 0]
N -- Warning + DePrioritize --> Q[score: -scaleScore x usageRatio]
N -- Exhausted --> R[score: -10000]
O & P & Q & R --> S[Combined sorted candidates]
S --> T{DePrioritize mode? All channels exhausted?}
T -- Yes --> J
T -- No --> U[Store candidates, proceed to routing]
Reviews (3): Last reviewed commit: "fix: use request context instead of Back..."
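The scoring branch of the flowchart can be sketched in Go as below. The -10000 exhausted penalty comes from the PR description; `warningScale` is a hypothetical stand-in for the `scaleScore` value, which is not given here, and scoring a warning channel as 0 outside de-prioritize mode is an assumption since the flowchart only shows the "Warning + DePrioritize" case.

```go
package main

import "fmt"

// Status is a hypothetical stand-in for providerquotastatus.Status.
type Status string

const (
	StatusUnknown   Status = "unknown"
	StatusAvailable Status = "available"
	StatusWarning   Status = "warning"
	StatusExhausted Status = "exhausted"
)

const (
	exhaustedPenalty = -10000.0 // from the PR description
	warningScale     = 1000.0   // hypothetical; the real scaleScore is not shown
)

// quotaScore returns the quota component of a channel's load-balancing score.
// Higher usage in warning state yields a larger penalty.
func quotaScore(s Status, usageRatio float64, dePrioritize bool) float64 {
	switch s {
	case StatusExhausted:
		return exhaustedPenalty
	case StatusWarning:
		if dePrioritize {
			return -warningScale * usageRatio
		}
		return 0 // assumption: no warning penalty outside de-prioritize mode
	default:
		// Available and Unknown are both neutral.
		return 0
	}
}

func main() {
	fmt.Println(quotaScore(StatusAvailable, 0.1, true))
	fmt.Println(quotaScore(StatusWarning, 0.5, true))
	fmt.Println(quotaScore(StatusExhausted, 1.0, true))
}
```

With these assumed constants, any warning penalty stays well above the exhausted penalty, so exhausted channels sort last but remain selectable in de-prioritize mode.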
- Use context.Background() in Start() goroutine to prevent cancellation
- Extract weeklyTokenLimitStatus() for self-documenting non-exhaustion logic
- Replace dead Ready=exhausted check with Ready=true for weekly limits
- Hoist QuotaEnforcementSettingsOrDefault() outside conditional block
- Remove unnecessary quotaSelector nil guard
- Fix struct field alignment in QuotaLimitStatus and QuotaData
…dges
Rename EXHAUSTED_ONLY enforcement mode to Block Exhausted in UI and locale files for clarity. Add enforcement effect badges (Blocked/Deprioritized) to quota dialog rows when a channel's quota is exhausted.
Changes:
- QuotaRow now receives enforcementMode prop and renders Blocked or Deprioritized badges based on quota status and enforcement setting
- Update EXHAUSTED_ONLY label in en/zh-CN locale files
- Add quota.status.blocked and quota.status.deprioritized keys
- Minor formatting and import cleanup in quota-badges.tsx
Summary
Add quota enforcement infrastructure that monitors provider quota status and integrates it into the request routing pipeline. Channels whose providers report exhausted quotas are filtered from candidate selection (exhausted-only mode) or deprioritized by the load balancer (de-prioritize mode). Per-limit-type status tracking enables granular enforcement — e.g., routing token requests to a channel whose image limit is exhausted.
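To illustrate the per-limit-type idea, here is a minimal sketch in which a channel with an exhausted image limit still serves token requests. All names (`LimitType`, `channelQuota`, `limitTypeFor`, `usableFor`) are hypothetical, and treating a missing limit entry as usable is an assumption.

```go
package main

import "fmt"

// Status is a hypothetical stand-in for providerquotastatus.Status.
type Status string

const (
	StatusAvailable Status = "available"
	StatusExhausted Status = "exhausted"
)

// LimitType distinguishes the quota a request consumes.
type LimitType string

const (
	LimitToken LimitType = "token"
	LimitImage LimitType = "image"
)

// channelQuota holds one status per limit type for a channel.
type channelQuota struct {
	Limits map[LimitType]Status
}

// limitTypeFor picks the limit a request counts against, based on whether it
// carries an image payload.
func limitTypeFor(hasImage bool) LimitType {
	if hasImage {
		return LimitImage
	}
	return LimitToken
}

// usableFor reports whether the channel can serve a request of the given limit
// type. A missing entry is treated as usable (assumption).
func (q channelQuota) usableFor(lt LimitType) bool {
	return q.Limits[lt] != StatusExhausted
}

func main() {
	q := channelQuota{Limits: map[LimitType]Status{
		LimitToken: StatusAvailable,
		LimitImage: StatusExhausted,
	}}
	fmt.Println(q.usableFor(limitTypeFor(false))) // token request
	fmt.Println(q.usableFor(limitTypeFor(true)))  // image request
}
```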
Spirit/Intent
Prevent wasted requests and improve reliability by automatically detecting provider quota exhaustion and steering traffic away from exhausted channels before errors reach the user.
Key Changes
provider_quota.warning_check_interval_ratio controls how often warning-state channels are rechecked (default: 4, i.e. four times as often as the normal check interval)
Risks