feat(quota): add provider quota enforcement with per-limit status tracking by djdembeck · Pull Request #1518 · looplj/axonhub

djdembeck · 2026-04-28T00:52:31Z

Summary

Add quota enforcement infrastructure that monitors provider quota status and integrates it into the request routing pipeline. Channels whose providers report exhausted quotas are either filtered out of candidate selection (exhausted-only mode) or deprioritized by the load balancer (de-prioritize mode). Per-limit-type status tracking enables granular enforcement: for example, a token request can still be routed to a channel whose image limit is exhausted.

Spirit/Intent

Prevent wasted requests and improve reliability by automatically detecting provider quota exhaustion and steering traffic away from exhausted channels before errors reach the user.

Key Changes

- Per-limit quota status: Each provider checker now reports per-limit-type entries (token, image) with individual status, usage ratio, and readiness, enabling modality-aware routing decisions
- Enforcement modes: Exhausted-only mode filters exhausted channels from candidates; de-prioritize mode keeps them as candidates but penalizes them in the load balancer, failing the request only if every channel is exhausted
- Quota-aware load balancing: The load balancer scores channels based on per-limit effective status, with a penalty that scales in proportion to the usage ratio
- Candidate filtering: Exhausted channels are removed from candidates in exhausted-only mode; in de-prioritize mode they remain candidates but receive heavy penalties
- Request modality propagation: The request type (image vs. token) is carried through the routing pipeline so quota decisions use the correct per-limit status
- API error responses: When all channels for a model are quota-depleted, the API returns HTTP 503 with a structured quota-exhausted error; this applies to the chat completion, streaming, doubao, and playground endpoints
- Frontend settings: Quota enforcement toggle and mode selector in System Settings, backed by a GraphQL schema and resolvers
- Config: provider_quota.warning_check_interval_ratio controls how often warning-state channels are rechecked (default: 4× as often as normal, i.e. at one quarter of the normal interval)
- Checker updates: All 7 provider checkers (ClaudeCode, Codex, GitHub Copilot, NanoGPT, NeuralWatt, Synthetic, Wafer) updated to emit per-limit statuses with correct usage ratios
- Tests: 1500+ lines of new tests covering quota cache, channel status, candidate filtering, load balancer scoring, API error handling, and per-limit routing

Risks

Cache initialization loads all quota records synchronously on startup, which may delay service start when the dataset is large.

- Add QuotaEnforcementSettings to system service with enabled/mode config
- Implement ProviderQuotaSelector to filter exhausted channels at selection time
- Add QuotaAwareStrategy applying -10000 penalty for exhausted channels
- Wire quota-aware components into all load balancers via fx
- Add GraphQL query/mutation for quota enforcement settings control
- Return HTTP 503 with quota_exhausted error code in API handlers
- Add in-memory quota status cache using sync.Map for O(1) lookups
- Add comprehensive tests for quota selector and scoring strategy

- [R1] Only return QuotaExhaustedError when quota filtering caused empty candidates (ProviderQuotaSelector.FilteredCount > 0), preventing a wrong HTTP 503 for non-quota failures such as model-not-found.
- [D1] Extract writeQuotaExhaustedResponse helper, replacing 5 identical copy-pasted quota error blocks across chat.go, doubao.go, openai.go.
- [D2] Remove unreachable quota-exhausted checks after ReadHTTPRequest in doubao.go and openai.go (ReadHTTPRequest only returns io.ReadAll errors, never QuotaExhaustedError).
- [R5] Rewrite quota test to exercise actual production code (writeQuotaExhaustedResponse) instead of manually reimplementing handler logic that would pass even if the handler broke.
- [S1] Define QuotaEnforcementMode typed constants in backend (ExhaustedOnly, DePrioritize), replacing 40+ raw string literals across system.go, orchestrator, resolvers, and tests.
- [S1] Type QuotaChannelStatus.Status as providerquotastatus.Status instead of bare string, reusing the existing typed enum.
- [S2] Extract hardcoded 0.8 warning usage ratio into named constant (warningUsageRatio) in lb_strategy_quota.go.
- [S1] Define QuotaEnforcementMode union type in frontend (TypeScript), applied to QuotaEnforcementSettings and UpdateQuotaEnforcementSettingsInput.
- [S3] Remove dead 'exhausted_only' fallback and 'value &&' guard in quota-settings.tsx Select component.

Negate de-prioritize score so warning channels rank below exhausted ones. Add validation rejecting invalid quota enforcement modes. Fix playground error handling: return 503 for quota-exhausted errors instead of falling through to generic HTTP error handling. Update quota enforcement settings resolver to use the non-default variant so errors surface instead of silently returning zero values. Clean up stale .playwright-cli/ test artifacts and add to .gitignore.

Populate per-limit QuotaLimitStatus in all 7 checkers based on request modality (req.Image != nil for image vs token types). Add NextCheckAt = checkInterval/4 for warning channels. Propagate quota limit type via context keys. Score and filter providers per-limit using per-limit effective status evaluation.
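Per-limit status lookup, sketched under assumptions: each channel's status holds one entry per limit type, and the router reads the entry that matches the request's modality. The struct shapes and the fallback to the channel-level status when a limit type has no entry are guesses, not the PR's actual `QuotaLimitStatus` definition. The example reproduces the summary's case of a token request reaching a channel whose image limit is exhausted.

```go
package main

import "fmt"

// Status values are assumed stand-ins for providerquotastatus.Status.
type Status string

const (
	StatusAvailable Status = "available"
	StatusWarning   Status = "warning"
	StatusExhausted Status = "exhausted"
)

// QuotaLimitType values are hypothetical.
type QuotaLimitType string

const (
	LimitTypeToken QuotaLimitType = "token"
	LimitTypeImage QuotaLimitType = "image"
)

// QuotaLimitStatus is one per-limit entry as a checker might emit it.
// UsageRatio is the fraction already used (0 = untouched, 1 = exhausted).
type QuotaLimitStatus struct {
	Type       QuotaLimitType
	Status     Status
	UsageRatio float64
}

// QuotaChannelStatus pairs a channel-level status with per-limit entries.
type QuotaChannelStatus struct {
	Status Status
	Limits []QuotaLimitStatus
}

// StatusFor picks the entry that governs a request of type lt. Falling back to
// the channel-level status when no entry exists is an assumption of this sketch.
func (s QuotaChannelStatus) StatusFor(lt QuotaLimitType) QuotaLimitStatus {
	for _, l := range s.Limits {
		if l.Type == lt {
			return l
		}
	}
	return QuotaLimitStatus{Type: lt, Status: s.Status}
}

func main() {
	ch := QuotaChannelStatus{
		Status: StatusWarning,
		Limits: []QuotaLimitStatus{
			{Type: LimitTypeToken, Status: StatusAvailable, UsageRatio: 0.3},
			{Type: LimitTypeImage, Status: StatusExhausted, UsageRatio: 1},
		},
	}
	// A token request can still use this channel; an image request cannot.
	fmt.Println(ch.StatusFor(LimitTypeToken).Status) // available
	fmt.Println(ch.StatusFor(LimitTypeImage).Status) // exhausted
}
```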

QuotaEnforcementMode was missing MarshalGQL, causing a gqlgen runtime type-assertion error when serializing the mode field. Converted mode from a GraphQL String to a proper enum type with MarshalGQL/UnmarshalGQL methods and updated the frontend to use SCREAMING_SNAKE_CASE values.


gemini-code-assist

Code Review

This pull request introduces a system-wide quota enforcement mechanism, allowing AxonHub to manage channels with exhausted provider quotas through either strict filtering or de-prioritization. Key additions include a new "Quota" settings UI, modality-specific (image vs. token) quota tracking across various provider checkers, and an in-memory status cache to optimize routing decisions. Feedback focuses on refining the DE_PRIORITIZE mode to ensure exhausted channels remain available as a last resort, correcting status ranking logic, optimizing service startup by loading the quota cache asynchronously, and maintaining modality-specific limits during error scenarios.

…istency
- Fix inverted warning penalty formula in QuotaAwareStrategy (1-usageRatio → usageRatio)
- Fix inverted UsageRatio in NeuralWatt checker (remaining → used fraction)
- Fix usageRatio=0.0 when exhausted in GitHub Copilot checker (default to 1.0)
- Fix PercentUsed≥1.0 mapping to warning instead of exhausted in NanoGPT checker
- Fix UsageRatio=0 when Codex limit exhausted without UsedPercent
- Fix saveQuotaError wiping per-limit cache on transient failures
- Wire warning_check_interval_ratio config through DI instead of hardcoding /4
- Fix EffectiveStatus returning Available for all-Unknown limits
- Return error from UnmarshalJSON on invalid QuotaEnforcementMode
- Replace string literals with typed status constants in orchestrator
- Remove dead IsNotFound check in QuotaEnforcementSettingsOrDefault
- Add per-limit exhausted/edge-case tests for NanoGPT
- Add multi-ratio coverage and penalty-increase test for LB strategy
- Add cache round-trip test for Limits and EffectiveStatus unknown test
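The scoring rules shown in this PR's flowchart (exhausted: -10000; warning in de-prioritize mode: -scaleScore × usageRatio; otherwise 0) can be sketched as a standalone function. With the fix above, the penalty uses usageRatio itself (the used fraction) rather than 1-usageRatio, so fuller channels sink further. `quotaScore`, `rank`, passing `scaleScore` as a plain parameter, and scoring warning channels as 0 in exhausted-only mode are assumptions; the real QuotaAwareStrategy also combines its score with other load-balancer strategies, which this sketch omits.

```go
package main

import (
	"fmt"
	"sort"
)

// Status values are assumed stand-ins for providerquotastatus.Status.
type Status string

const (
	StatusAvailable Status = "available"
	StatusWarning   Status = "warning"
	StatusExhausted Status = "exhausted"
)

type QuotaEnforcementMode string

const (
	ExhaustedOnly QuotaEnforcementMode = "exhausted_only"
	DePrioritize  QuotaEnforcementMode = "de_prioritize"
)

// exhaustedPenalty matches the -10000 figure quoted in the PR.
const exhaustedPenalty = -10000.0

// quotaScore returns the quota component of a channel's score.
func quotaScore(mode QuotaEnforcementMode, status Status, usageRatio, scaleScore float64) float64 {
	switch status {
	case StatusExhausted:
		return exhaustedPenalty
	case StatusWarning:
		if mode == DePrioritize {
			// usageRatio is the used fraction, so fuller channels lose more.
			return -scaleScore * usageRatio
		}
	}
	return 0
}

// Channel is a hypothetical candidate with its effective per-limit status.
type Channel struct {
	ID         int
	Status     Status
	UsageRatio float64
}

// rank returns a copy ordered best-first by quota score; ties keep input order.
func rank(mode QuotaEnforcementMode, chs []Channel, scaleScore float64) []Channel {
	out := append([]Channel(nil), chs...)
	sort.SliceStable(out, func(i, j int) bool {
		return quotaScore(mode, out[i].Status, out[i].UsageRatio, scaleScore) >
			quotaScore(mode, out[j].Status, out[j].UsageRatio, scaleScore)
	})
	return out
}

func main() {
	chs := []Channel{
		{ID: 1, Status: StatusExhausted, UsageRatio: 1},
		{ID: 2, Status: StatusWarning, UsageRatio: 0.9},
		{ID: 3, Status: StatusAvailable, UsageRatio: 0.1},
	}
	for _, ch := range rank(DePrioritize, chs, 100) {
		fmt.Println(ch.ID, ch.Status)
	}
	// 3 available
	// 2 warning
	// 1 exhausted
}
```

Note that a warning channel reporting a usageRatio of 0 gets no penalty under this formula, which is the zero-ratio edge case raised in Greptile's summary.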

…t EffectiveStatus init
- ProviderQuotaSelector: skip filtering in DePrioritize mode so QuotaAwareStrategy can penalize exhausted channels instead of removing them, preserving the soft enforcement semantic
- EffectiveStatus: initialize worstStatus to StatusUnknown and worstReady to false so that Unknown-ranked limits can be overridden by any known status instead of returning Available incorrectly
- Start: run loadQuotaCache async to avoid blocking startup on large quota datasets
- Update tests to expect all candidates in DePrioritize mode
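One plausible reading of the EffectiveStatus fix, as a sketch: statuses are ranked by severity, aggregation starts from (Unknown, not ready) so that any known status overrides Unknown, and an all-Unknown set stays Unknown instead of becoming Available. The severity order, the meaning of the `Ready` field, and treating an exhausted channel-level status as a floor (a later commit mentions such a floor) are assumptions, not the PR's implementation.

```go
package main

import "fmt"

// Status values are assumed stand-ins for providerquotastatus.Status.
type Status string

const (
	StatusUnknown   Status = "unknown"
	StatusAvailable Status = "available"
	StatusWarning   Status = "warning"
	StatusExhausted Status = "exhausted"
)

// severity orders statuses from least to most severe. Unknown and any
// non-standard value rank lowest, so any known status overrides them.
func severity(s Status) int {
	switch s {
	case StatusAvailable:
		return 1
	case StatusWarning:
		return 2
	case StatusExhausted:
		return 3
	default:
		return 0
	}
}

// QuotaLimitStatus is a hypothetical per-limit entry.
type QuotaLimitStatus struct {
	Status Status
	Ready  bool
}

// EffectiveStatus returns the worst known per-limit status and its readiness.
func EffectiveStatus(base Status, limits []QuotaLimitStatus) (Status, bool) {
	if base == StatusExhausted {
		return StatusExhausted, false // channel-level floor
	}
	// Start from (Unknown, false) rather than (Available, true), so that
	// all-Unknown limits do not report Available.
	worst, worstReady := StatusUnknown, false
	for _, l := range limits {
		if severity(l.Status) > severity(worst) {
			worst, worstReady = l.Status, l.Ready
		}
	}
	return worst, worstReady
}

func main() {
	fmt.Println(EffectiveStatus(StatusAvailable, []QuotaLimitStatus{
		{Status: StatusUnknown}, {Status: StatusUnknown},
	})) // unknown false
	fmt.Println(EffectiveStatus(StatusAvailable, []QuotaLimitStatus{
		{Status: StatusAvailable, Ready: true}, {Status: StatusWarning, Ready: true},
	})) // warning true
}
```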

Lower the base quota check interval from 20m to 5m to shrink the blind spot between checks, so that channels entering the warning state are detected sooner. At 20m, a Codex channel at 90% usage on the 3h primary window could fully exhaust before the next check. At 5m, the worst-case blind spot is under 30% of remaining quota even for the shortest provider windows. The adaptive warning interval (base/4) now fires every ~75s instead of every 5m, catching rapid exhaustion quickly once a warning is first detected. Also enable refetchIntervalInBackground on the frontend quota query so the UI stays current when the browser tab is in the background.

- Add channel-level EffectiveStatus floor when base is exhausted
- Fall back to "unknown" for non-standard status values for consistency
- Lower synthetic warning boundary to prevent false 429 selector picks
- Refactor quotaSelector for correct composition order
- Add UnmarshalGQL for EffectiveStatus type consistency
- Add round-trip test for quota selection logic
- Add config documentation for quota warning interval ratio

Refactor API-specific error formatting for quota exhausted errors (extract wrapQuotaExhaustedAsResponseError shared helper). Fix provider_quota warning interval math (multiply vs divide ratio). Add DE_PRIORITIZE exhaustion detection in candidate selection. Propagate DB errors in UpdateQuotaEnforcementSettings resolver. Add usage ratio to wafer quota limits. Extend synthetic checker test coverage for limits array. Fix comment: de_prioritize description.

Replace magic numbers (0.8, quota_exhausted) with named constants. Extract NewTokenLimitStatus() and IsReadyStatus() helpers to deduplicate quota data construction across 6 provider checkers. Tighten nextCheckIntervalForStatus() to use providerquotastatus.Status instead of string. Simplify quota selector wiring in selectCandidates().

Restore doc comments and inline field comments on QuotaChecker interface, QuotaData struct fields, and NanoGPT checker types/functions that existed on unstable but were dropped during the provider_quota package restructure.

Restore doc comments on buildNanoGPTQuotaURL and findEarliestResetAt, plus inline comments on warning state check, nextResetAt calculation, and grace period fallback that were lost during the provider_quota restructure.


greptile-apps · 2026-04-28T06:14:09Z

Greptile Summary

This PR adds a comprehensive provider quota enforcement system that monitors per-limit quota status (token vs. image) across 7 provider checkers and integrates it into the request-routing pipeline via two enforcement modes: ExhaustedOnly (hard-filter exhausted channels) and DePrioritize (score penalty in the load balancer). The implementation is large but well-structured, with 1500+ lines of tests covering quota cache, candidate filtering, load balancer scoring, and API error responses.

Confidence Score: 5/5

Safe to merge; all remaining findings are P2 style/polish issues with no impact on correctness or data integrity.

The feature is well-architected and thoroughly tested (1500+ new test lines). All P0/P1 concerns raised in prior threads have been addressed. The two new findings are P2: a JSON tag/map-key inconsistency that creates no runtime bug under current usage, and an edge-case penalty floor in the warning-scoring path that is unlikely to matter in practice. No data loss, security, or routing correctness issues were found.

Files needing attention: internal/server/biz/provider_quota/types.go (JSON tag alignment), internal/server/orchestrator/lb_strategy_quota.go (zero-ratio warning penalty)

Important Files Changed

| Filename | Overview |
|---|---|
| internal/server/biz/provider_quota.go | Core quota cache service with `QuotaChannelStatus`, `EffectiveStatus`, in-memory sync.Map cache, and round-trip merge/extract helpers for per-limit data; startup cache load uses the lifecycle context (pre-existing concern noted in threads) |
| internal/server/orchestrator/candidates_quota.go | New `ProviderQuotaSelector` wraps the candidate chain and filters exhausted channels in ExhaustedOnly mode; correctly passes through all candidates in DePrioritize mode and exposes `FilteredCount` for the outer error check |
| internal/server/orchestrator/lb_strategy_quota.go | New `QuotaAwareStrategy` scores channels using `EffectiveStatus` and `scaleScore`; exhausted gets a -10000 penalty, warning gets a negative penalty proportional to usage ratio; correctly reads limit type from context |
| internal/server/orchestrator/select_candidates.go | Wires `ProviderQuotaSelector` into the candidate chain and adds post-selection checks for both ExhaustedOnly (`FilteredCount`) and DePrioritize (`areAllChannelsExhausted`) modes to emit `QuotaExhaustedError` |
| internal/server/biz/provider_quota/types.go | Adds `QuotaLimitStatus`, `QuotaLimitType`, `RequestModality`, and helper constants; JSON struct tags are snake_case but the DB serialisation path uses camelCase map keys (minor inconsistency) |
| internal/server/api/chat.go | Adds `wrapQuotaExhaustedAsResponseError` for non-streaming paths and handles `QuotaExhaustedError` in `FormatStreamError` for streaming paths; 503 response code is correct |
| internal/server/biz/system.go | Adds `QuotaEnforcementSettings` with proper GQL and JSON bidirectional serialisation for both uppercase (GQL) and lowercase (JSON) mode values; default is disabled with ExhaustedOnly mode |
| internal/server/orchestrator/orchestrator.go | Threads `quotaProvider` into `ChatCompletionOrchestrator` and adds `QuotaAwareStrategy` to all three load balancers (adaptive, failover, circuit-breaker) |

Flowchart

```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Incoming Request] --> B[selectCandidates middleware]
    B --> C[Base selector: model/stream/policy filters]
    C --> D{ProviderQuotaSelector}
    D --> E{Enforcement enabled?}
    E -- No --> F[All candidates pass through]
    E -- Yes --> G{Mode?}
    G -- ExhaustedOnly --> H[Filter: EffectiveStatus == Exhausted for request limit type]
    H --> I{candidates empty?}
    I -- Yes + FilteredCount > 0 --> J[503 QuotaExhaustedError]
    I -- No --> K[LoadBalancedSelector]
    G -- DePrioritize --> L[All candidates pass through]
    L --> K
    K --> M[QuotaAwareStrategy scores each channel]
    M --> N{effectiveStatus?}
    N -- Available --> O[score: 0]
    N -- Unknown --> P[score: 0]
    N -- Warning + DePrioritize --> Q[score: -scaleScore x usageRatio]
    N -- Exhausted --> R[score: -10000]
    O & P & Q & R --> S[Combined sorted candidates]
    S --> T{DePrioritize mode? All channels exhausted?}
    T -- Yes --> J
    T -- No --> U[Store candidates, proceed to routing]
```

Reviews (3): Last reviewed commit: "fix: use request context instead of Back..."

- Use context.Background() in Start() goroutine to prevent cancellation
- Extract weeklyTokenLimitStatus() for self-documenting non-exhaustion logic
- Replace dead Ready=exhausted check with Ready=true for weekly limits
- Hoist QuotaEnforcementSettingsOrDefault() outside conditional block
- Remove unnecessary quotaSelector nil guard
- Fix struct field alignment in QuotaLimitStatus and QuotaData

…dges
Rename EXHAUSTED_ONLY enforcement mode to "Block Exhausted" in UI and locale files for clarity. Add enforcement effect badges (Blocked/Deprioritized) to quota dialog rows when a channel's quota is exhausted.

Changes:
- QuotaRow now receives enforcementMode prop and renders Blocked or Deprioritized badges based on quota status and enforcement setting
- Update EXHAUSTED_ONLY label in en/zh-CN locale files
- Add quota.status.blocked and quota.status.deprioritized keys
- Minor formatting and import cleanup in quota-badges.tsx

djdembeck added 5 commits April 27, 2026 15:48

greptile-apps Bot reviewed Apr 28, 2026


gemini-code-assist Bot reviewed Apr 28, 2026


Comment thread internal/server/orchestrator/candidates_quota.go Outdated

Comment thread internal/server/biz/provider_quota.go Outdated

Comment thread internal/server/biz/provider_quota.go Outdated

Comment thread internal/server/biz/provider_quota.go Outdated

djdembeck marked this pull request as draft April 28, 2026 00:58

djdembeck changed the title ~~fix: resolve 14 correctness bugs in quota enforcement feature~~ feat(quota): add provider quota enforcement with per-limit status tracking Apr 28, 2026

djdembeck added 10 commits April 27, 2026 21:52

fix: use request-scoped context in goroutine (G118)

663c0ef

fix: restore comments lost during branch restructure

cd40256


fix: restore remaining doc comments in nanogpt_checker.go

9db67e3


fix: restore 'Build raw data map' and 'Convert millisecond epoch' comments

887cd9e

djdembeck marked this pull request as ready for review April 28, 2026 06:05

greptile-apps Bot reviewed Apr 28, 2026


djdembeck added 3 commits April 28, 2026 02:33

fix: use request context instead of Background in quota goroutine

7c0640a

looplj merged commit 1687c56 into looplj:unstable Apr 28, 2026
4 checks passed

djdembeck deleted the feature/quota-enforcement branch April 28, 2026 17:50

looplj mentioned this pull request May 1, 2026

[Feature/功能]: Support channels may control scheduling based on either the total quota or the periodic quota. #895

Closed

2 tasks


looplj merged 18 commits into looplj:unstable from djdembeck:feature/quota-enforcement
