feat: channel concurrency settings, close #1130#1322
Conversation
There was a problem hiding this comment.
Code Review
This pull request introduces a concurrency limit feature for channels, adding a maxConcurrent field to the rate limit settings across the frontend UI, GraphQL API, and backend scoring logic. The RateLimitAwareStrategy has been updated to utilize a connectionTracker for enforcing these limits. Feedback highlights an inconsistency where a concurrency limit of zero is treated as unlimited rather than a strict block, and notes a potential integer overflow risk in the GraphQL schema where the Int type is used for values that are int64 in the backend.
| } | ||
|
|
||
| // Check concurrent requests | ||
| if rl.MaxConcurrent != nil && *rl.MaxConcurrent > 0 && s.connectionTracker != nil { |
There was a problem hiding this comment.
The check *rl.MaxConcurrent > 0 implies that a concurrency limit of 0 is treated as unlimited (since it skips the limit check). However, the documentation in internal/objects/channel.go states that nil = unlimited. Usually, a limit of 0 should mean 'no requests allowed'. While the frontend validation ensures positive values, the backend should ideally handle 0 as a strict block to be consistent with the nil vs non-nil logic.
| } | ||
|
|
||
| // Check concurrent requests | ||
| if rl.MaxConcurrent != nil && *rl.MaxConcurrent > 0 && s.connectionTracker != nil { |
| input ChannelRateLimitInput { | ||
| rpm: Int | ||
| tpm: Int | ||
| maxConcurrent: Int |
There was a problem hiding this comment.
The maxConcurrent field is defined as Int in GraphQL, which is a signed 32-bit integer. In the Go backend, it is defined as int64. While 2 billion concurrent requests is likely sufficient, using Int for rate limits (like TPM) can sometimes lead to overflows if the values are very large. Consider if this consistency with rpm and tpm is desired or if a larger type should be used.
* feat: per-channel concurrency queue + admission control, close #1130 PR #1322 only added scoring-layer down-ranking, which was a no-op when a model only had one channel. This change adds true admission enforcement. Backend: - ChannelLimiter: soft mode (count only) and hard mode (FIFO queue + capacity ceiling + per-channel timeout), with slot-handoff Release for FIFO fairness. - ChannelLimiterManager: per-channel limiter cache keyed by config struct equality; exposes Snapshot for observability. - ChannelQueueError: synthetic 429 + structured body (channel_queue_full / channel_queue_timeout). No Retry-After to avoid the cooldown middleware misclassifying local rejections as upstream 429s. - New ChannelRateLimit fields: QueueSize, QueueTimeoutMs. - Middleware order: channel admission runs before rate-limit tracking so locally rejected requests do not consume RPM budget. - LB scoring composes RPM/TPM and concurrency sub-scores via min(); hard- mode queueing capped at 30% so idle peers keep a clear advantage. - Removed legacy ConnectionTracker; ChannelLimiterManager is the single source of truth for in-flight stats. - OpenTelemetry: 5 channel-level instruments (inflight / queue_waiting gauges, queue_full / queue_timeout counters, queue_wait_seconds histogram). - GraphQL Channel.liveLimiterStats field for real-time UI display. - ValidateRateLimit guards against negatives and queueSize without maxConcurrent. Frontend: - channels-rate-limit-dialog: queueSize + queueTimeoutMs inputs with cross-field validation. - channel-limiter-cell: text-only inFlight/cap and Q waiting/queueSize display, color-coded by utilization, polled every 5s. - i18n strings for new fields and tooltips (en + zh-CN). * feat(channels): warn when MaxConcurrent is set without a queue When a user configures only MaxConcurrent and leaves Queue Size empty, the limiter runs in soft mode — the cap only down-ranks the channel in load-balancer scoring, it does NOT block excess upstream requests. This is easy to miss; surface an inline amber advisory under the Queue Size field so users know to set Queue Size for hard enforcement. * fix(channels): drop stale limiter entry, defensive scoring, monotonic hard-mode score Address PR #1503 review findings from copilot/gemini-code-assist: - ChannelLimiterManager.GetOrCreate now drops the cached entry when capacity transitions to 0, so Stats/Snapshot stop reporting a channel that no longer has a concurrency limit and downstream code can no longer dereference now-nil rate-limit pointers. - RateLimitAwareStrategy.scoreConcurrency adds a defensive nil check on channel.Settings.RateLimit / MaxConcurrent so a brief stale Stats hit before the next GetOrCreate cannot panic. - Hard-mode below-capacity score is shifted to [queueCeiling, maxScore] so a 9/10 channel always outranks a 10/10 channel with an empty queue (previously 10 vs 30 — load balancer preferred the saturated channel). - zh-CN softModeWarning drops literal **markers** that the dialog renders as plain text. - channel_queue_error.go docstring refreshed (connectionTracking middleware was renamed to channelLimiter). * fix(channels): per-attempt limiter slot to plug retry-induced leak The channel concurrency limiter middleware held a struct-scoped sync.Once for release. Pipeline.Process re-enters OnOutboundRawRequest on same-channel retry and channel switch, calling Acquire on the same middleware instance again -- but the struct-scoped Once short-circuits every release after the first, permanently leaking the slot. Under sustained upstream 429s with retry enabled, MaxConcurrent slots leak one-by-one until the limiter is fully exhausted and every new request times out in the queue. Replace acquired+releaseOnce with atomic.Pointer[limiterSlot] where each Acquire mints a fresh slot owning its own Once. Stream wrappers capture the slot pointer at wrap time so they release the slot they were minted with, not whatever's current at Close time. * test(channels): mark rate-limit subtests parallel to satisfy tparallel Parent tests already call t.Parallel(); subtests must too or the linter flags the asymmetry.
Uh oh!
There was an error while loading. Please reload this page.