Skip to content

feat: channel concurrency settings, close #1130#1322

Merged
looplj merged 1 commit into
release/v0.9.xfrom
dev-tmp
Apr 8, 2026
Merged

feat: channel concurrency settings, close #1130#1322
looplj merged 1 commit into
release/v0.9.xfrom
dev-tmp

Conversation

@looplj

@looplj looplj commented Apr 8, 2026

Copy link
Copy Markdown
Owner

@devin-ai-integration devin-ai-integration Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 2 additional findings.

Open in Devin Review

@looplj looplj merged commit 4a48f26 into release/v0.9.x Apr 8, 2026
5 checks passed

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a concurrency limit feature for channels, adding a maxConcurrent field to the rate limit settings across the frontend UI, GraphQL API, and backend scoring logic. The RateLimitAwareStrategy has been updated to utilize a connectionTracker for enforcing these limits. Feedback highlights an inconsistency where a concurrency limit of zero is treated as unlimited rather than a strict block, and notes a potential integer overflow risk in the GraphQL schema where the Int type is used for values that are int64 in the backend.

}

// Check concurrent requests
if rl.MaxConcurrent != nil && *rl.MaxConcurrent > 0 && s.connectionTracker != nil {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The check *rl.MaxConcurrent > 0 implies that a concurrency limit of 0 is treated as unlimited (since it skips the limit check). However, the documentation in internal/objects/channel.go states that nil = unlimited. Usually, a limit of 0 should mean 'no requests allowed'. While the frontend validation ensures positive values, the backend should ideally handle 0 as a strict block to be consistent with the nil vs non-nil logic.

}

// Check concurrent requests
if rl.MaxConcurrent != nil && *rl.MaxConcurrent > 0 && s.connectionTracker != nil {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Similar to the Score function, a concurrency limit of 0 will be ignored here due to the *rl.MaxConcurrent > 0 check. If nil is the designated value for 'unlimited', then 0 should likely be treated as 'exhausted'.

input ChannelRateLimitInput {
rpm: Int
tpm: Int
maxConcurrent: Int

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The maxConcurrent field is defined as Int in GraphQL, which is a signed 32-bit integer. In the Go backend, it is defined as int64. While 2 billion concurrent requests is likely sufficient, using Int for rate limits (like TPM) can sometimes lead to overflows if the values are very large. Consider if this consistency with rpm and tpm is desired or if a larger type should be used.

looplj pushed a commit that referenced this pull request Apr 27, 2026
* feat: per-channel concurrency queue + admission control, close #1130

PR #1322 only added scoring-layer down-ranking, which was a no-op when a
model only had one channel. This change adds true admission enforcement.

Backend:
- ChannelLimiter: soft mode (count only) and hard mode (FIFO queue +
  capacity ceiling + per-channel timeout), with slot-handoff Release for
  FIFO fairness.
- ChannelLimiterManager: per-channel limiter cache keyed by config struct
  equality; exposes Snapshot for observability.
- ChannelQueueError: synthetic 429 + structured body
  (channel_queue_full / channel_queue_timeout). No Retry-After to avoid the
  cooldown middleware misclassifying local rejections as upstream 429s.
- New ChannelRateLimit fields: QueueSize, QueueTimeoutMs.
- Middleware order: channel admission runs before rate-limit tracking so
  locally rejected requests do not consume RPM budget.
- LB scoring composes RPM/TPM and concurrency sub-scores via min(); hard-
  mode queueing capped at 30% so idle peers keep a clear advantage.
- Removed legacy ConnectionTracker; ChannelLimiterManager is the single
  source of truth for in-flight stats.
- OpenTelemetry: 5 channel-level instruments (inflight / queue_waiting
  gauges, queue_full / queue_timeout counters, queue_wait_seconds
  histogram).
- GraphQL Channel.liveLimiterStats field for real-time UI display.
- ValidateRateLimit guards against negatives and queueSize without
  maxConcurrent.

Frontend:
- channels-rate-limit-dialog: queueSize + queueTimeoutMs inputs with
  cross-field validation.
- channel-limiter-cell: text-only inFlight/cap and Q waiting/queueSize
  display, color-coded by utilization, polled every 5s.
- i18n strings for new fields and tooltips (en + zh-CN).

* feat(channels): warn when MaxConcurrent is set without a queue

When a user configures only MaxConcurrent and leaves Queue Size empty,
the limiter runs in soft mode — the cap only down-ranks the channel in
load-balancer scoring, it does NOT block excess upstream requests. This
is easy to miss; surface an inline amber advisory under the Queue Size
field so users know to set Queue Size for hard enforcement.

* fix(channels): drop stale limiter entry, defensive scoring, monotonic hard-mode score

Address PR #1503 review findings from copilot/gemini-code-assist:

- ChannelLimiterManager.GetOrCreate now drops the cached entry when
  capacity transitions to 0, so Stats/Snapshot stop reporting a channel
  that no longer has a concurrency limit and downstream code can no
  longer dereference now-nil rate-limit pointers.
- RateLimitAwareStrategy.scoreConcurrency adds a defensive nil check on
  channel.Settings.RateLimit / MaxConcurrent so a brief stale Stats hit
  before the next GetOrCreate cannot panic.
- Hard-mode below-capacity score is shifted to [queueCeiling, maxScore]
  so a 9/10 channel always outranks a 10/10 channel with an empty queue
  (previously 10 vs 30 — load balancer preferred the saturated channel).
- zh-CN softModeWarning drops literal **markers** that the dialog renders
  as plain text.
- channel_queue_error.go docstring refreshed (connectionTracking middleware
  was renamed to channelLimiter).

* fix(channels): per-attempt limiter slot to plug retry-induced leak

The channel concurrency limiter middleware held a struct-scoped sync.Once for
release. Pipeline.Process re-enters OnOutboundRawRequest on same-channel retry
and channel switch, calling Acquire on the same middleware instance again --
but the struct-scoped Once short-circuits every release after the first,
permanently leaking the slot. Under sustained upstream 429s with retry
enabled, MaxConcurrent slots leak one-by-one until the limiter is fully
exhausted and every new request times out in the queue.

Replace acquired+releaseOnce with atomic.Pointer[limiterSlot] where each
Acquire mints a fresh slot owning its own Once. Stream wrappers capture the
slot pointer at wrap time so they release the slot they were minted with,
not whatever's current at Close time.

* test(channels): mark rate-limit subtests parallel to satisfy tparallel

Parent tests already call t.Parallel(); subtests must too or the linter
flags the asymmetry.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant