feat(serve): add per-tier HTTP rate limiting for daemon (issue #4514 T3.4) by doudouOUC · Pull Request #4861 · QwenLM/qwen-code

doudouOUC · 2026-06-08T13:04:48Z

Summary

Adds opt-in per-tier HTTP rate limiting to qwen serve via --rate-limit flag
Token bucket with continuous drip refill, default off to preserve backward compatibility
Three tiers: prompt (10/min), mutation (30/min), read (120/min) per client key
Health, heartbeat, SSE, and /acp endpoints exempt
Fail-open on bucket map capacity (10k keys) and internal errors
Graceful shutdown: setDraining(true) + dispose() in shutdown path

Details

File	Change
`rateLimit.ts`	NEW — core middleware: token bucket, key extractor (loopback=client-id, non-loopback=IP+client-id), tier resolver, sampled logging, GC (timer + request-count), config validation
`rateLimit.test.ts`	NEW — 25 unit tests: bucket mechanics, continuous drip refill, tier assignment, key extraction (IPv6 normalization), exempt routes, fail-open, draining, reset, callbacks
`types.ts`	5 new `ServeOptions` fields
`capabilities.ts`	`rate_limit` conditional feature tag (3 coordinated changes)
`server.ts`	Middleware wiring between `bearerAuth` and `express.json`, capabilities toggle, deep health hit counts, `app.locals` lifecycle exposure
`runQwenServe.ts`	Shutdown `setDraining` + `dispose`
`serve.ts`	CLI flags (`--rate-limit`, `--rate-limit-prompt/mutation/read/window-ms`), env var fallbacks, boot validation
`server.test.ts`	Capability fixture update for `rate_limit` conditional tag

CLI Flags

Flag	Default	Env Var
`--rate-limit`	false	`QWEN_SERVE_RATE_LIMIT`
`--rate-limit-prompt`	10	`QWEN_SERVE_RATE_LIMIT_PROMPT`
`--rate-limit-mutation`	30	`QWEN_SERVE_RATE_LIMIT_MUTATION`
`--rate-limit-read`	120	`QWEN_SERVE_RATE_LIMIT_READ`
`--rate-limit-window-ms`	60000	`QWEN_SERVE_RATE_LIMIT_WINDOW_MS`

Known Limitations (documented)

Reverse proxy: req.socket.remoteAddress is the proxy IP; all clients share one bucket
0.0.0.0 bind: local clients all appear as 127.0.0.1
Client-ID bypass: loopback clients can rotate client-id for fresh buckets (bounded by 10k cap → fail-open)
Boot-time only: limits require daemon restart to change

Test plan

25 unit tests pass (bucket, tier, key, exempt, fail-open, draining, reset, callbacks)
TypeScript typecheck passes (no rate-limit related errors)
ESLint + prettier pass (pre-commit hook)
Capability test fixture updated for rate_limit conditional tag
Manual: qwen serve --rate-limit --rate-limit-prompt 2 → 3rd prompt returns 429

🤖 Generated with Qwen Code

…T3.4) Token-bucket rate limiter with continuous drip refill, opt-in via --rate-limit flag. Three tiers: prompt (10/min), mutation (30/min), read (120/min). Health, heartbeat, SSE, and /acp endpoints are exempt. - rateLimit.ts: core middleware with fail-open, bucket cap (10k), GC sweep (timer + request-count), sampled logging, graceful shutdown - types.ts: 5 new ServeOptions fields - capabilities.ts: rate_limit conditional feature tag - server.ts: middleware wiring between bearerAuth and express.json, deep health hit counts, app.locals lifecycle exposure - runQwenServe.ts: shutdown dispose + setDraining - serve.ts: CLI flags, env var fallbacks, boot validation - server.test.ts: capability fixture update for rate_limit - rateLimit.test.ts: 25 unit tests covering bucket mechanics, tier resolution, key extraction, fail-open, draining, reset, callbacks 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)

qwen-code-ci-bot

Hi @doudouOUC — thanks for the PR! The rate-limiting feature itself looks interesting, but the PR body doesn't follow our PR template. A few required sections are missing or renamed:

"What this PR does" → currently "Summary"
"Why it's needed" → absent. What's the motivation? Who asked for this? The title references issue #4514 T3.4 — please link it and explain the user-facing problem.
"Reviewer Test Plan" → absent. The checkbox list under "Test plan" isn't a reviewer test plan — we need concrete steps a reviewer can follow to confirm the change works (e.g., qwen serve --rate-limit --rate-limit-prompt 2, then send 3 requests and confirm the 3rd gets 429).
"Risk & Scope" → absent. The "Known Limitations" section touches on this but doesn't address breaking changes or out-of-scope items.
"Linked Issues" → absent. The title says "issue #4514 T3.4" but there's no Closes #4514 or similar link.

Could you update the PR body to match the template? Happy to re-review once that's done.

中文说明

感谢提交 PR！限流功能本身看起来不错，但 PR 描述没有按照我们的 PR 模板填写。部分必填章节缺失或改名了：

"What this PR does" → 当前写的是 "Summary"
"Why it's needed" → 缺失。动机是什么？标题引用了 issue #4514 T3.4，请链接并说明用户侧的问题。
"Reviewer Test Plan" → 缺失。"Test plan" 下的复选框不是 reviewer 测试计划——需要给出 reviewer 可以跟随操作的具体步骤（例如 qwen serve --rate-limit --rate-limit-prompt 2，发送 3 个请求确认第 3 个返回 429）。
"Risk & Scope" → 缺失。"Known Limitations" 部分涉及了相关内容，但没有涵盖 breaking changes 或超出范围的事项。
"Linked Issues" → 缺失。标题提到 "issue #4514 T3.4" 但没有 Closes #4514 等链接。

请按模板更新 PR 描述，更新后我会重新审查。

— Qwen Code · qwen3.7-max

Copilot

Pull request overview

Adds an opt-in, per-tier HTTP rate limiter to the qwen serve daemon, intended to protect high-cost endpoints (especially prompts) while keeping compatibility by defaulting to “off”.

Changes:

Introduces a new rateLimit middleware (token bucket w/ continuous refill), plus unit tests.
Wires the limiter into the HTTP server stack (after bearerAuth, before express.json) and exposes hit counters on deep health checks.
Adds CLI flags/env fallbacks and advertises a conditional rate_limit capability tag when enabled.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.

Show a summary per file

File	Description
packages/cli/src/serve/types.ts	Extends `ServeOptions` with rate limit configuration fields.
packages/cli/src/serve/server.ts	Installs rate limiting middleware, surfaces hit counts on deep `/health`, and advertises the feature when enabled.
packages/cli/src/serve/server.test.ts	Updates expected capability feature lists and adds predicate coverage for the new conditional tag.
packages/cli/src/serve/runQwenServe.ts	Disposes/drains the rate limiter during shutdown.
packages/cli/src/serve/rateLimit.ts	New rate limiter implementation (tiers, keying, GC, sampled logging, fail-open behavior).
packages/cli/src/serve/rateLimit.test.ts	New unit test suite for limiter mechanics and edge cases.
packages/cli/src/serve/capabilities.ts	Registers `rate_limit` and adds a conditional predicate toggle.
packages/cli/src/commands/serve.ts	Adds CLI flags + env var fallbacks and validates rate-limit inputs at boot.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

wenshao

Additional findings (Suggestion):

server.ts:3019 + runQwenServe.ts:1136 — Rate limiter stored via raw string key app.locals['_rateLimiter'] with manual type cast. Codebase has typed-accessor pattern (setDeviceFlowRegistry/getDeviceFlowRegistry). A typo in either string silently breaks shutdown disposal, leaking the GC setInterval. Consider setRateLimiter(app, limiter) / getRateLimiter(app) accessors. (Cannot post as inline — line not in diff.)

Needs Human Review (low confidence):

No integration test creates createServeApp with rateLimit: true to verify middleware wiring through the real Express stack
Test gaps for config validation throws (lines 170-181), fail-open catch behavior (line 308), and sampled logger suppression counting
MAX_CLIENT_ID_LENGTH, CLIENT_ID_RE, and header name 'x-qwen-client-id' duplicated from server.ts with only a "keep in sync" comment — no compile-time safety
No startup log when rate limiter is enabled — operators can't confirm active config without tracing from CLI args to env vars
getHitCounts() returns cumulative counters that never reset — health data becomes less useful over time
429 response headers don't include X-RateLimit-Tier — clients and monitoring can't identify which tier was exceeded from headers alone

— qwen3.7-max via Qwen Code /review

- Add onError callback for fail-open observability (catch block + bucket overflow) - Fix env var priority: CLI --no-rate-limit now overrides QWEN_SERVE_RATE_LIMIT - Remove sampledLog.clear() from sweep to preserve suppressed counts - Add sampledLog.clear() to dispose() for shutdown cleanup - Add typed accessors setRateLimiter/getRateLimiter (replace raw string key) - Wire onError callback through server.ts daemonLog 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)

doudouOUC · 2026-06-08T13:33:32Z

Review round 1 — fix summary (`8cc8d8e`)

Thread	Author	Verdict	Action
`serve.ts:309` env var override CLI	Copilot	Agree	Fixed — CLI flag takes precedence
`rateLimit.ts:335` dispose missing logger clear	Copilot	Agree	Fixed — dispose() now calls sampledLog?.clear()
`server.ts:921` integration test gap	Copilot	Defer	server.test.ts blocked by pre-existing undici import failure
`server.test.ts:215` TS2367 type error	wenshao	Push back	Verified no error with tsc --noEmit; likely bot false positive
`rateLimit.ts:308` silent catch {}	wenshao	Agree	Fixed — added onError callback, wired through daemonLog
`rateLimit.ts:262` fail-open no observability	wenshao	Agree	Fixed — bucket overflow now calls onError()
`rateLimit.ts:115` anonymous bucket DoS	wenshao	Push back	remotePort changes per connection; documented known limitation
`rateLimit.ts:187` sweep clears suppressed counts	wenshao	Agree	Fixed — removed sampledLog.clear() from sweep

Also addressed wenshao's top-level suggestion: replaced raw app.locals['_rateLimiter'] string key with typed setRateLimiter()/getRateLimiter() accessors.

Remove default:false from yargs so argv['rate-limit'] is undefined when neither flag is passed. Use ?? for env var fallback so --no-rate-limit (explicit false) wins over QWEN_SERVE_RATE_LIMIT=1. 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)

wenshao · 2026-06-08T14:04:09Z

🔍 Local verification report — real build, unit tests & live runtime

中文 TL;DR: 在本地用 tmux，基于 PR 的真实合并结果（PR 已是 base 分支 daemon_mode_b_main 的最新态，0 落后）做了「构建 + 单测 + 真实 HTTP 运行时」三层验证。结论：PR 自身测试全绿、限流在真实 qwen serve 进程里行为完全正确、未引入任何回归，建议合并。 作者 Test plan 里那条没勾的手动测试（--rate-limit 超限返回 429）我已实测通过。补充：整个 src/serve 目录有 16 个失败，但在不含本 PR 的 base 分支上逐条一模一样地复现——是 base 分支既有问题（capability fixture 漂移 + 启动类集成测试在沙箱里 ACP 超时），与本 PR 无关。

Verdict: ✅ The PR is clean. Its own tests pass (rateLimit 25/25 + the rate_limit capability test), the limiter behaves correctly in a live process, and it adds zero regressions. The unchecked manual test in the PR's plan (429 on over-limit) is now verified end-to-end.

Environment

Base branch is daemon_mode_b_main (not main). PR 8cc8d8e65 is 0 commits behind base 82879b300 → the PR head is the merge result.
Platform Linux (kernel 6.12), Node v22. Built CI-style: npm install + npm run build (exit 0). Driven through tmux.

✅ The PR's own tests — green

rateLimit.test.ts: 25 / 25 passed — bucket mechanics, continuous drip refill, tier assignment, key extraction (incl. IPv6 normalization), exempt routes, fail-open, draining, reset, callbacks.
The new rate_limit conditional-capability test in server.test.ts passes ({rateLimit:true}→advertised, {}→not).

✅ Live runtime test — real `qwen serve` process (the highlight)

Started qwen serve --rate-limit --rate-limit-read 3 --rate-limit-window-ms 600000 on loopback (auth-free dev default), then drove it with curl:

READ tier (limit 3):    GET  /capabilities ×5  →  200 200 200 429 429        ✅ exact
Per-tier isolation:     POST /capabilities (mutation, fresh bucket) → 404, NOT 429   ✅
Exempt route:           GET  /health ×6        →  200 200 200 200 200 200    ✅ never limited
Capability advertised:  GET  /capabilities body contains "rate_limit"        ✅

The 429 response is well-formed:

HTTP/1.1 429 Too Many Requests
Retry-After: 200
X-RateLimit-Limit: 3
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1780927000

{"error":"Rate limit exceeded","code":"rate_limit_exceeded","tier":"read","retryAfterMs":199882}

This confirms the unchecked item in the Test plan (--rate-limit … → 429) end-to-end, and additionally demonstrates per-tier bucket isolation (exhausting read does not affect mutation) and route exemption (/health).

✅ No regressions — the 16 `src/serve` failures are pre-existing on base

The full src/serve directory shows 16 failures, but they reproduce identically on the clean base branch without this PR:

Run	Result
`src/serve` on PR (merge result)	16 failed / 931 passed
The same 5 failing files on base (`daemon_mode_b_main`, no PR)	16 failed / 471 passed — same files, same assertions

None are rate-limit related:

3× capability-registry tests — the EXPECTED_REGISTERED_FEATURES fixture is out of sync, missing 3 features (workspace_agent_generate, session_stats, workspace_mcp_manage) that this PR does not touch. (The PR did correctly add rate_limit to both the registry and the fixture.)
8× runQwenServe boot tests + workspaceAgents / workspaceMemory / acpHttp/transport / envSnapshot — server-boot / env integration tests failing in this sandbox with AcpSessionBridge initialize timed out after 10000ms.

→ Base-branch concerns, not introduced by #4861. Worth flagging separately: the capability-fixture drift is deterministic (3 features need adding to the fixture) regardless of environment.

Source & security assessment

A clean token-bucket limiter with solid defensive properties:

Placement — after bearerAuth (only authenticated requests count) and before express.json (reject before body parse). Correct.
Fail-open everywhere — draining → pass; bucket map at 10k cap → pass; any internal error → onError + next(). A limiter fault can't take the daemon down.
Lifecycle — GC timer is unref()'d (won't block process exit) and cleared on dispose(); shutdown path calls setDraining(true) + dispose().
Keying — loopback → client-id (else anonymous); non-loopback → IP (+client-id) with IPv6 normalization and a ^[A-Za-z0-9._:-]{1,128}$ client-id guard.
429 carries Retry-After + X-RateLimit-*; off by default (backward compatible). The known limitations (reverse-proxy shared IP, 0.0.0.0 collapse, client-id rotation, boot-time-only) are accurately documented in the PR.

Build / typecheck

npm install + npm run build → exit 0. The PR source also compiles and runs under tsx (the live test) and under vitest/esbuild (25/25), so its TypeScript is sound.

Scope: build + unit + live HTTP runtime on Linux against the real merge result. Not exercised: reverse-proxy / non-loopback multi-client keying (documented limitations), and the pre-existing base-branch failures noted above.

chiga0

Code Review Overview (AI Generated)

PR: #4861 — feat(serve): add per-tier HTTP rate limiting for daemon
Type: New Feature (production code — security-relevant middleware)
Change size: +1051/-1 across 8 files
HEAD: ef3dac490e

Findings Summary

Critical/Major: 0 items
Minor: 0 items
Nit: 0 items

Assessment

Rate limiter design: Token bucket with continuous drip refill is the correct choice for HTTP rate limiting — avoids bursty behavior of fixed-window counters while keeping O(1) per-request cost. Three-tier system (prompt 10/min, mutation 30/min, read 120/min) maps well to the relative cost of each operation class. Fail-open on bucket overflow (10k cap) and internal errors is the right availability-first decision for a daemon.

Middleware wiring: Placed after bearerAuth (only count authenticated requests) and before express.json() (reject early without burning JSON parse CPU) — optimal ordering. Exempt routes (health, heartbeat, SSE, /acp, OPTIONS, demo) are correctly identified.

Graceful shutdown: setDraining(true) + dispose() in the shutdown path via typed getRateLimiter accessor. GC timer is properly unref()'d to avoid blocking process exit. sampledLog.clear() moved to dispose() (not sweep) — commit 2 fix is correct.

wenshao CR audit (6 items, all addressed at HEAD):

TS2367 typecheck: rate_limit added to EXPECTED_REGISTERED_FEATURES + dedicated test branch in fixture ✓
Silent catch-all: onError callback added, wired through daemonLog ✓
Bucket overflow observability: onError called with overflow error message ✓
Sweep clearing sampledLog: sampledLog?.clear() removed from sweep(), added to dispose() + reset() ✓
CLI flag vs env var priority: removed default: false from yargs, ?? correctly handles --no-rate-limit → false overriding env var ✓
Loopback anonymous key collision: documented as known limitation ✓

CLI validation: --rate-limit-window-ms minimum 1000ms prevents degenerate configurations. All numeric flags validated as positive integers before passing to serve options. Env var fallbacks follow existing pattern (?? chain).

Test coverage: 25 unit tests covering bucket mechanics, tier resolution, key extraction (including IPv6 normalization), fail-open, draining, reset, and callbacks. Server test fixture properly updated for conditional rate_limit capability tag. No integration test for middleware wiring in server.test.ts — acceptable since the limiter is opt-in and the unit tests are thorough.

Key Observations

Clean, well-structured feature addition. The defense-in-depth approach (CLI validation + createRateLimiter validation) is appropriate. The documented limitations (reverse proxy IP, 0.0.0.0 bind, client-id rotation on loopback) are realistic for a boot-time-only local daemon and don't require code changes. Approved.

This review was generated by QoderWork AI

Already have 2 approved, 3ks.

qwen-code-ci-bot · 2026-06-08T15:57:07Z

+      // Fail-open if bucket map is at capacity
+      let tierMap = buckets.get(key);
+      if (!tierMap) {
+        if (buckets.size >= MAX_BUCKETS) {


[Critical] Client-id rotation + global fail-open defeats rate limiting entirely

The key extractor (createKeyExtractor) uses the self-declared x-qwen-client-id header as part of the bucket key. An attacker can rotate through >10,000 unique client-id values (the regex [A-Za-z0-9._:-]+ with 128-char max provides a near-infinite keyspace) to fill the bucket map past MAX_BUCKETS. Once buckets.size >= 10000, this fail-open branch calls next() unconditionally — disabling rate limiting for all clients globally, not just the attacker's.

The GC sweep (staleness threshold = windowMs * 2 = 120s at default) cannot keep pace; an attacker sending ~83 req/s sustains 10,001 fresh buckets. On non-loopback binds behind a reverse proxy (where all traffic shares one remoteAddress), a single remote client can trigger this.

The PR documents client-id bypass as a known limitation, but the global impact (disabling protection for every client, not just the attacker) goes beyond per-client bypass.

Suggested fixes (pick one):

Add a per-IP aggregate bucket that caps total requests from one IP regardless of client-id

Key by IP only on non-loopback (ignore client-id for bucketing)

Fail-closed for unknown keys on overflow (reject new keys while preserving existing buckets)

— qwen3.7-max via Qwen Code /review

doudouOUC requested review from Copilot and wenshao June 8, 2026 13:04

Copilot started reviewing on behalf of doudouOUC June 8, 2026 13:05 View session

qwen-code-ci-bot previously requested changes Jun 8, 2026

View reviewed changes

Copilot AI reviewed Jun 8, 2026

View reviewed changes

Comment thread packages/cli/src/commands/serve.ts Outdated

Comment thread packages/cli/src/serve/rateLimit.ts

Comment thread packages/cli/src/serve/server.ts

wenshao requested changes Jun 8, 2026

View reviewed changes

Comment thread packages/cli/src/serve/server.test.ts

Comment thread packages/cli/src/serve/rateLimit.ts Outdated

Comment thread packages/cli/src/serve/rateLimit.ts

Comment thread packages/cli/src/serve/rateLimit.ts

Comment thread packages/cli/src/serve/rateLimit.ts

wenshao previously approved these changes Jun 8, 2026

View reviewed changes

Comment thread packages/cli/src/commands/serve.ts Outdated

doudouOUC dismissed wenshao’s stale review via ef3dac4 June 8, 2026 13:48

wenshao approved these changes Jun 8, 2026

View reviewed changes

doudouOUC requested review from chiga0, liudayiheng, qwen-code-ci-bot and yiliang114 and removed request for liudayiheng June 8, 2026 14:26

chiga0 approved these changes Jun 8, 2026

View reviewed changes

doudouOUC merged commit 613e818 into daemon_mode_b_main Jun 8, 2026
4 checks passed

doudouOUC deleted the feat/serve-rate-limiting branch June 8, 2026 15:22

qwen-code-ci-bot requested changes Jun 8, 2026

View reviewed changes

doudouOUC mentioned this pull request Jun 8, 2026

tracking(serve): daemon capability gaps & prioritized backlog (post v0.16-alpha) #4514

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(serve): add per-tier HTTP rate limiting for daemon (issue #4514 T3.4)#4861

feat(serve): add per-tier HTTP rate limiting for daemon (issue #4514 T3.4)#4861
doudouOUC merged 3 commits into
daemon_mode_b_mainfrom
feat/serve-rate-limiting

doudouOUC commented Jun 8, 2026

Uh oh!

qwen-code-ci-bot left a comment

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

wenshao left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

doudouOUC commented Jun 8, 2026

Uh oh!

Uh oh!

wenshao commented Jun 8, 2026

Uh oh!

chiga0 left a comment

Uh oh!

Uh oh!

qwen-code-ci-bot Jun 8, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

Conversation

doudouOUC commented Jun 8, 2026

Summary

Details

CLI Flags

Known Limitations (documented)

Test plan

Uh oh!

qwen-code-ci-bot left a comment

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

wenshao left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

doudouOUC commented Jun 8, 2026

Review round 1 — fix summary (8cc8d8e)

Uh oh!

Uh oh!

wenshao commented Jun 8, 2026

🔍 Local verification report — real build, unit tests & live runtime

Environment

✅ The PR's own tests — green

✅ Live runtime test — real qwen serve process (the highlight)

✅ No regressions — the 16 src/serve failures are pre-existing on base

Source & security assessment

Build / typecheck

Uh oh!

chiga0 left a comment

Choose a reason for hiding this comment

Code Review Overview (AI Generated)

Findings Summary

Assessment

Key Observations

Uh oh!

Uh oh!

qwen-code-ci-bot Jun 8, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

Review round 1 — fix summary (`8cc8d8e`)

✅ Live runtime test — real `qwen serve` process (the highlight)

✅ No regressions — the 16 `src/serve` failures are pre-existing on base