feat(serve): add per-tier HTTP rate limiting for daemon (issue #4514 T3.4)#4861
Conversation
…T3.4) Token-bucket rate limiter with continuous drip refill, opt-in via --rate-limit flag. Three tiers: prompt (10/min), mutation (30/min), read (120/min). Health, heartbeat, SSE, and /acp endpoints are exempt. - rateLimit.ts: core middleware with fail-open, bucket cap (10k), GC sweep (timer + request-count), sampled logging, graceful shutdown - types.ts: 5 new ServeOptions fields - capabilities.ts: rate_limit conditional feature tag - server.ts: middleware wiring between bearerAuth and express.json, deep health hit counts, app.locals lifecycle exposure - runQwenServe.ts: shutdown dispose + setDraining - serve.ts: CLI flags, env var fallbacks, boot validation - server.test.ts: capability fixture update for rate_limit - rateLimit.test.ts: 25 unit tests covering bucket mechanics, tier resolution, key extraction, fail-open, draining, reset, callbacks 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
qwen-code-ci-bot
left a comment
There was a problem hiding this comment.
Hi @doudouOUC — thanks for the PR! The rate-limiting feature itself looks interesting, but the PR body doesn't follow our PR template. A few required sections are missing or renamed:
- "What this PR does" → currently "Summary"
- "Why it's needed" → absent. What's the motivation? Who asked for this? The title references issue #4514 T3.4 — please link it and explain the user-facing problem.
- "Reviewer Test Plan" → absent. The checkbox list under "Test plan" isn't a reviewer test plan — we need concrete steps a reviewer can follow to confirm the change works (e.g.,
qwen serve --rate-limit --rate-limit-prompt 2, then send 3 requests and confirm the 3rd gets 429). - "Risk & Scope" → absent. The "Known Limitations" section touches on this but doesn't address breaking changes or out-of-scope items.
- "Linked Issues" → absent. The title says "issue #4514 T3.4" but there's no
Closes #4514or similar link.
Could you update the PR body to match the template? Happy to re-review once that's done.
中文说明
感谢提交 PR!限流功能本身看起来不错,但 PR 描述没有按照我们的 PR 模板 填写。部分必填章节缺失或改名了:
- "What this PR does" → 当前写的是 "Summary"
- "Why it's needed" → 缺失。动机是什么?标题引用了 issue #4514 T3.4,请链接并说明用户侧的问题。
- "Reviewer Test Plan" → 缺失。"Test plan" 下的复选框不是 reviewer 测试计划——需要给出 reviewer 可以跟随操作的具体步骤(例如
qwen serve --rate-limit --rate-limit-prompt 2,发送 3 个请求确认第 3 个返回 429)。 - "Risk & Scope" → 缺失。"Known Limitations" 部分涉及了相关内容,但没有涵盖 breaking changes 或超出范围的事项。
- "Linked Issues" → 缺失。标题提到 "issue #4514 T3.4" 但没有
Closes #4514等链接。
请按模板更新 PR 描述,更新后我会重新审查。
— Qwen Code · qwen3.7-max
There was a problem hiding this comment.
Pull request overview
Adds an opt-in, per-tier HTTP rate limiter to the qwen serve daemon, intended to protect high-cost endpoints (especially prompts) while keeping compatibility by defaulting to “off”.
Changes:
- Introduces a new
rateLimitmiddleware (token bucket w/ continuous refill), plus unit tests. - Wires the limiter into the HTTP server stack (after
bearerAuth, beforeexpress.json) and exposes hit counters on deep health checks. - Adds CLI flags/env fallbacks and advertises a conditional
rate_limitcapability tag when enabled.
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| packages/cli/src/serve/types.ts | Extends ServeOptions with rate limit configuration fields. |
| packages/cli/src/serve/server.ts | Installs rate limiting middleware, surfaces hit counts on deep /health, and advertises the feature when enabled. |
| packages/cli/src/serve/server.test.ts | Updates expected capability feature lists and adds predicate coverage for the new conditional tag. |
| packages/cli/src/serve/runQwenServe.ts | Disposes/drains the rate limiter during shutdown. |
| packages/cli/src/serve/rateLimit.ts | New rate limiter implementation (tiers, keying, GC, sampled logging, fail-open behavior). |
| packages/cli/src/serve/rateLimit.test.ts | New unit test suite for limiter mechanics and edge cases. |
| packages/cli/src/serve/capabilities.ts | Registers rate_limit and adds a conditional predicate toggle. |
| packages/cli/src/commands/serve.ts | Adds CLI flags + env var fallbacks and validates rate-limit inputs at boot. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
wenshao
left a comment
There was a problem hiding this comment.
Additional findings (Suggestion):
server.ts:3019+runQwenServe.ts:1136— Rate limiter stored via raw string keyapp.locals['_rateLimiter']with manual type cast. Codebase has typed-accessor pattern (setDeviceFlowRegistry/getDeviceFlowRegistry). A typo in either string silently breaks shutdown disposal, leaking the GCsetInterval. ConsidersetRateLimiter(app, limiter)/getRateLimiter(app)accessors. (Cannot post as inline — line not in diff.)
Needs Human Review (low confidence):
- No integration test creates
createServeAppwithrateLimit: trueto verify middleware wiring through the real Express stack - Test gaps for config validation throws (lines 170-181), fail-open catch behavior (line 308), and sampled logger suppression counting
MAX_CLIENT_ID_LENGTH,CLIENT_ID_RE, and header name'x-qwen-client-id'duplicated fromserver.tswith only a "keep in sync" comment — no compile-time safety- No startup log when rate limiter is enabled — operators can't confirm active config without tracing from CLI args to env vars
getHitCounts()returns cumulative counters that never reset — health data becomes less useful over time- 429 response headers don't include
X-RateLimit-Tier— clients and monitoring can't identify which tier was exceeded from headers alone
— qwen3.7-max via Qwen Code /review
- Add onError callback for fail-open observability (catch block + bucket overflow) - Fix env var priority: CLI --no-rate-limit now overrides QWEN_SERVE_RATE_LIMIT - Remove sampledLog.clear() from sweep to preserve suppressed counts - Add sampledLog.clear() to dispose() for shutdown cleanup - Add typed accessors setRateLimiter/getRateLimiter (replace raw string key) - Wire onError callback through server.ts daemonLog 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
Review round 1 — fix summary (8cc8d8e)
Also addressed wenshao's top-level suggestion: replaced raw |
Remove default:false from yargs so argv['rate-limit'] is undefined when neither flag is passed. Use ?? for env var fallback so --no-rate-limit (explicit false) wins over QWEN_SERVE_RATE_LIMIT=1. 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
🔍 Local verification report — real build, unit tests & live runtime
Verdict: ✅ The PR is clean. Its own tests pass (rateLimit 25/25 + the Environment
✅ The PR's own tests — green
✅ Live runtime test — real
|
| Run | Result |
|---|---|
src/serve on PR (merge result) |
16 failed / 931 passed |
The same 5 failing files on base (daemon_mode_b_main, no PR) |
16 failed / 471 passed — same files, same assertions |
None are rate-limit related:
- 3× capability-registry tests — the
EXPECTED_REGISTERED_FEATURESfixture is out of sync, missing 3 features (workspace_agent_generate,session_stats,workspace_mcp_manage) that this PR does not touch. (The PR did correctly addrate_limitto both the registry and the fixture.) - 8×
runQwenServeboot tests +workspaceAgents/workspaceMemory/acpHttp/transport/envSnapshot— server-boot / env integration tests failing in this sandbox withAcpSessionBridge initialize timed out after 10000ms.
→ Base-branch concerns, not introduced by #4861. Worth flagging separately: the capability-fixture drift is deterministic (3 features need adding to the fixture) regardless of environment.
Source & security assessment
A clean token-bucket limiter with solid defensive properties:
- Placement — after
bearerAuth(only authenticated requests count) and beforeexpress.json(reject before body parse). Correct. - Fail-open everywhere — draining → pass; bucket map at 10k cap → pass; any internal error →
onError+next(). A limiter fault can't take the daemon down. - Lifecycle — GC timer is
unref()'d (won't block process exit) and cleared ondispose(); shutdown path callssetDraining(true)+dispose(). - Keying — loopback → client-id (else
anonymous); non-loopback → IP (+client-id) with IPv6 normalization and a^[A-Za-z0-9._:-]{1,128}$client-id guard. - 429 carries
Retry-After+X-RateLimit-*; off by default (backward compatible). The known limitations (reverse-proxy shared IP,0.0.0.0collapse, client-id rotation, boot-time-only) are accurately documented in the PR.
Build / typecheck
npm install + npm run build → exit 0. The PR source also compiles and runs under tsx (the live test) and under vitest/esbuild (25/25), so its TypeScript is sound.
Scope: build + unit + live HTTP runtime on Linux against the real merge result. Not exercised: reverse-proxy / non-loopback multi-client keying (documented limitations), and the pre-existing base-branch failures noted above.
chiga0
left a comment
There was a problem hiding this comment.
Code Review Overview (AI Generated)
PR: #4861 — feat(serve): add per-tier HTTP rate limiting for daemon
Type: New Feature (production code — security-relevant middleware)
Change size: +1051/-1 across 8 files
HEAD: ef3dac490e
Findings Summary
- Critical/Major: 0 items
- Minor: 0 items
- Nit: 0 items
Assessment
Rate limiter design: Token bucket with continuous drip refill is the correct choice for HTTP rate limiting — avoids bursty behavior of fixed-window counters while keeping O(1) per-request cost. Three-tier system (prompt 10/min, mutation 30/min, read 120/min) maps well to the relative cost of each operation class. Fail-open on bucket overflow (10k cap) and internal errors is the right availability-first decision for a daemon.
Middleware wiring: Placed after bearerAuth (only count authenticated requests) and before express.json() (reject early without burning JSON parse CPU) — optimal ordering. Exempt routes (health, heartbeat, SSE, /acp, OPTIONS, demo) are correctly identified.
Graceful shutdown: setDraining(true) + dispose() in the shutdown path via typed getRateLimiter accessor. GC timer is properly unref()'d to avoid blocking process exit. sampledLog.clear() moved to dispose() (not sweep) — commit 2 fix is correct.
wenshao CR audit (6 items, all addressed at HEAD):
- TS2367 typecheck:
rate_limitadded toEXPECTED_REGISTERED_FEATURES+ dedicated test branch in fixture ✓ - Silent catch-all:
onErrorcallback added, wired throughdaemonLog✓ - Bucket overflow observability:
onErrorcalled with overflow error message ✓ - Sweep clearing sampledLog:
sampledLog?.clear()removed fromsweep(), added todispose()+reset()✓ - CLI flag vs env var priority: removed
default: falsefrom yargs,??correctly handles--no-rate-limit→falseoverriding env var ✓ - Loopback anonymous key collision: documented as known limitation ✓
CLI validation: --rate-limit-window-ms minimum 1000ms prevents degenerate configurations. All numeric flags validated as positive integers before passing to serve options. Env var fallbacks follow existing pattern (?? chain).
Test coverage: 25 unit tests covering bucket mechanics, tier resolution, key extraction (including IPv6 normalization), fail-open, draining, reset, and callbacks. Server test fixture properly updated for conditional rate_limit capability tag. No integration test for middleware wiring in server.test.ts — acceptable since the limiter is opt-in and the unit tests are thorough.
Key Observations
Clean, well-structured feature addition. The defense-in-depth approach (CLI validation + createRateLimiter validation) is appropriate. The documented limitations (reverse proxy IP, 0.0.0.0 bind, client-id rotation on loopback) are realistic for a boot-time-only local daemon and don't require code changes. Approved.
This review was generated by QoderWork AI
Already have 2 approved, 3ks.
| // Fail-open if bucket map is at capacity | ||
| let tierMap = buckets.get(key); | ||
| if (!tierMap) { | ||
| if (buckets.size >= MAX_BUCKETS) { |
There was a problem hiding this comment.
[Critical] Client-id rotation + global fail-open defeats rate limiting entirely
The key extractor (createKeyExtractor) uses the self-declared x-qwen-client-id header as part of the bucket key. An attacker can rotate through >10,000 unique client-id values (the regex [A-Za-z0-9._:-]+ with 128-char max provides a near-infinite keyspace) to fill the bucket map past MAX_BUCKETS. Once buckets.size >= 10000, this fail-open branch calls next() unconditionally — disabling rate limiting for all clients globally, not just the attacker's.
The GC sweep (staleness threshold = windowMs * 2 = 120s at default) cannot keep pace; an attacker sending ~83 req/s sustains 10,001 fresh buckets. On non-loopback binds behind a reverse proxy (where all traffic shares one remoteAddress), a single remote client can trigger this.
The PR documents client-id bypass as a known limitation, but the global impact (disabling protection for every client, not just the attacker) goes beyond per-client bypass.
Suggested fixes (pick one):
- Add a per-IP aggregate bucket that caps total requests from one IP regardless of client-id
- Key by IP only on non-loopback (ignore client-id for bucketing)
- Fail-closed for unknown keys on overflow (reject new keys while preserving existing buckets)
— qwen3.7-max via Qwen Code /review
Summary
qwen servevia--rate-limitflag/acpendpoints exemptsetDraining(true)+dispose()in shutdown pathDetails
rateLimit.tsrateLimit.test.tstypes.tsServeOptionsfieldscapabilities.tsrate_limitconditional feature tag (3 coordinated changes)server.tsbearerAuthandexpress.json, capabilities toggle, deep health hit counts,app.localslifecycle exposurerunQwenServe.tssetDraining+disposeserve.ts--rate-limit,--rate-limit-prompt/mutation/read/window-ms), env var fallbacks, boot validationserver.test.tsrate_limitconditional tagCLI Flags
--rate-limitQWEN_SERVE_RATE_LIMIT--rate-limit-promptQWEN_SERVE_RATE_LIMIT_PROMPT--rate-limit-mutationQWEN_SERVE_RATE_LIMIT_MUTATION--rate-limit-readQWEN_SERVE_RATE_LIMIT_READ--rate-limit-window-msQWEN_SERVE_RATE_LIMIT_WINDOW_MSKnown Limitations (documented)
req.socket.remoteAddressis the proxy IP; all clients share one bucket0.0.0.0bind: local clients all appear as127.0.0.1Test plan
rate_limitconditional tagqwen serve --rate-limit --rate-limit-prompt 2→ 3rd prompt returns 429🤖 Generated with Qwen Code