Skip to content

feat(serve): add per-tier HTTP rate limiting for daemon (issue #4514 T3.4)#4861

Merged
doudouOUC merged 3 commits into
daemon_mode_b_mainfrom
feat/serve-rate-limiting
Jun 8, 2026
Merged

feat(serve): add per-tier HTTP rate limiting for daemon (issue #4514 T3.4)#4861
doudouOUC merged 3 commits into
daemon_mode_b_mainfrom
feat/serve-rate-limiting

Conversation

@doudouOUC

Copy link
Copy Markdown
Collaborator

Summary

  • Adds opt-in per-tier HTTP rate limiting to qwen serve via --rate-limit flag
  • Token bucket with continuous drip refill, default off to preserve backward compatibility
  • Three tiers: prompt (10/min), mutation (30/min), read (120/min) per client key
  • Health, heartbeat, SSE, and /acp endpoints exempt
  • Fail-open on bucket map capacity (10k keys) and internal errors
  • Graceful shutdown: setDraining(true) + dispose() in shutdown path

Details

File Change
rateLimit.ts NEW — core middleware: token bucket, key extractor (loopback=client-id, non-loopback=IP+client-id), tier resolver, sampled logging, GC (timer + request-count), config validation
rateLimit.test.ts NEW — 25 unit tests: bucket mechanics, continuous drip refill, tier assignment, key extraction (IPv6 normalization), exempt routes, fail-open, draining, reset, callbacks
types.ts 5 new ServeOptions fields
capabilities.ts rate_limit conditional feature tag (3 coordinated changes)
server.ts Middleware wiring between bearerAuth and express.json, capabilities toggle, deep health hit counts, app.locals lifecycle exposure
runQwenServe.ts Shutdown setDraining + dispose
serve.ts CLI flags (--rate-limit, --rate-limit-prompt/mutation/read/window-ms), env var fallbacks, boot validation
server.test.ts Capability fixture update for rate_limit conditional tag

CLI Flags

Flag Default Env Var
--rate-limit false QWEN_SERVE_RATE_LIMIT
--rate-limit-prompt 10 QWEN_SERVE_RATE_LIMIT_PROMPT
--rate-limit-mutation 30 QWEN_SERVE_RATE_LIMIT_MUTATION
--rate-limit-read 120 QWEN_SERVE_RATE_LIMIT_READ
--rate-limit-window-ms 60000 QWEN_SERVE_RATE_LIMIT_WINDOW_MS

Known Limitations (documented)

  • Reverse proxy: req.socket.remoteAddress is the proxy IP; all clients share one bucket
  • 0.0.0.0 bind: local clients all appear as 127.0.0.1
  • Client-ID bypass: loopback clients can rotate client-id for fresh buckets (bounded by 10k cap → fail-open)
  • Boot-time only: limits require daemon restart to change

Test plan

  • 25 unit tests pass (bucket, tier, key, exempt, fail-open, draining, reset, callbacks)
  • TypeScript typecheck passes (no rate-limit related errors)
  • ESLint + prettier pass (pre-commit hook)
  • Capability test fixture updated for rate_limit conditional tag
  • Manual: qwen serve --rate-limit --rate-limit-prompt 2 → 3rd prompt returns 429

🤖 Generated with Qwen Code

…T3.4)

Token-bucket rate limiter with continuous drip refill, opt-in via
--rate-limit flag. Three tiers: prompt (10/min), mutation (30/min),
read (120/min). Health, heartbeat, SSE, and /acp endpoints are exempt.

- rateLimit.ts: core middleware with fail-open, bucket cap (10k),
  GC sweep (timer + request-count), sampled logging, graceful shutdown
- types.ts: 5 new ServeOptions fields
- capabilities.ts: rate_limit conditional feature tag
- server.ts: middleware wiring between bearerAuth and express.json,
  deep health hit counts, app.locals lifecycle exposure
- runQwenServe.ts: shutdown dispose + setDraining
- serve.ts: CLI flags, env var fallbacks, boot validation
- server.test.ts: capability fixture update for rate_limit
- rateLimit.test.ts: 25 unit tests covering bucket mechanics, tier
  resolution, key extraction, fail-open, draining, reset, callbacks

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)

@qwen-code-ci-bot qwen-code-ci-bot left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @doudouOUC — thanks for the PR! The rate-limiting feature itself looks interesting, but the PR body doesn't follow our PR template. A few required sections are missing or renamed:

  • "What this PR does" → currently "Summary"
  • "Why it's needed" → absent. What's the motivation? Who asked for this? The title references issue #4514 T3.4 — please link it and explain the user-facing problem.
  • "Reviewer Test Plan" → absent. The checkbox list under "Test plan" isn't a reviewer test plan — we need concrete steps a reviewer can follow to confirm the change works (e.g., qwen serve --rate-limit --rate-limit-prompt 2, then send 3 requests and confirm the 3rd gets 429).
  • "Risk & Scope" → absent. The "Known Limitations" section touches on this but doesn't address breaking changes or out-of-scope items.
  • "Linked Issues" → absent. The title says "issue #4514 T3.4" but there's no Closes #4514 or similar link.

Could you update the PR body to match the template? Happy to re-review once that's done.

中文说明

感谢提交 PR!限流功能本身看起来不错,但 PR 描述没有按照我们的 PR 模板 填写。部分必填章节缺失或改名了:

  • "What this PR does" → 当前写的是 "Summary"
  • "Why it's needed" → 缺失。动机是什么?标题引用了 issue #4514 T3.4,请链接并说明用户侧的问题。
  • "Reviewer Test Plan" → 缺失。"Test plan" 下的复选框不是 reviewer 测试计划——需要给出 reviewer 可以跟随操作的具体步骤(例如 qwen serve --rate-limit --rate-limit-prompt 2,发送 3 个请求确认第 3 个返回 429)。
  • "Risk & Scope" → 缺失。"Known Limitations" 部分涉及了相关内容,但没有涵盖 breaking changes 或超出范围的事项。
  • "Linked Issues" → 缺失。标题提到 "issue #4514 T3.4" 但没有 Closes #4514 等链接。

请按模板更新 PR 描述,更新后我会重新审查。

Qwen Code · qwen3.7-max

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds an opt-in, per-tier HTTP rate limiter to the qwen serve daemon, intended to protect high-cost endpoints (especially prompts) while keeping compatibility by defaulting to “off”.

Changes:

  • Introduces a new rateLimit middleware (token bucket w/ continuous refill), plus unit tests.
  • Wires the limiter into the HTTP server stack (after bearerAuth, before express.json) and exposes hit counters on deep health checks.
  • Adds CLI flags/env fallbacks and advertises a conditional rate_limit capability tag when enabled.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
packages/cli/src/serve/types.ts Extends ServeOptions with rate limit configuration fields.
packages/cli/src/serve/server.ts Installs rate limiting middleware, surfaces hit counts on deep /health, and advertises the feature when enabled.
packages/cli/src/serve/server.test.ts Updates expected capability feature lists and adds predicate coverage for the new conditional tag.
packages/cli/src/serve/runQwenServe.ts Disposes/drains the rate limiter during shutdown.
packages/cli/src/serve/rateLimit.ts New rate limiter implementation (tiers, keying, GC, sampled logging, fail-open behavior).
packages/cli/src/serve/rateLimit.test.ts New unit test suite for limiter mechanics and edge cases.
packages/cli/src/serve/capabilities.ts Registers rate_limit and adds a conditional predicate toggle.
packages/cli/src/commands/serve.ts Adds CLI flags + env var fallbacks and validates rate-limit inputs at boot.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread packages/cli/src/commands/serve.ts Outdated
Comment thread packages/cli/src/serve/rateLimit.ts
Comment thread packages/cli/src/serve/server.ts

@wenshao wenshao left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additional findings (Suggestion):

  • server.ts:3019 + runQwenServe.ts:1136 — Rate limiter stored via raw string key app.locals['_rateLimiter'] with manual type cast. Codebase has typed-accessor pattern (setDeviceFlowRegistry/getDeviceFlowRegistry). A typo in either string silently breaks shutdown disposal, leaking the GC setInterval. Consider setRateLimiter(app, limiter) / getRateLimiter(app) accessors. (Cannot post as inline — line not in diff.)

Needs Human Review (low confidence):

  • No integration test creates createServeApp with rateLimit: true to verify middleware wiring through the real Express stack
  • Test gaps for config validation throws (lines 170-181), fail-open catch behavior (line 308), and sampled logger suppression counting
  • MAX_CLIENT_ID_LENGTH, CLIENT_ID_RE, and header name 'x-qwen-client-id' duplicated from server.ts with only a "keep in sync" comment — no compile-time safety
  • No startup log when rate limiter is enabled — operators can't confirm active config without tracing from CLI args to env vars
  • getHitCounts() returns cumulative counters that never reset — health data becomes less useful over time
  • 429 response headers don't include X-RateLimit-Tier — clients and monitoring can't identify which tier was exceeded from headers alone

— qwen3.7-max via Qwen Code /review

Comment thread packages/cli/src/serve/server.test.ts
Comment thread packages/cli/src/serve/rateLimit.ts Outdated
Comment thread packages/cli/src/serve/rateLimit.ts
Comment thread packages/cli/src/serve/rateLimit.ts
Comment thread packages/cli/src/serve/rateLimit.ts
- Add onError callback for fail-open observability (catch block + bucket overflow)
- Fix env var priority: CLI --no-rate-limit now overrides QWEN_SERVE_RATE_LIMIT
- Remove sampledLog.clear() from sweep to preserve suppressed counts
- Add sampledLog.clear() to dispose() for shutdown cleanup
- Add typed accessors setRateLimiter/getRateLimiter (replace raw string key)
- Wire onError callback through server.ts daemonLog

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
@doudouOUC

Copy link
Copy Markdown
Collaborator Author

Review round 1 — fix summary (8cc8d8e)

Thread Author Verdict Action
serve.ts:309 env var override CLI Copilot Agree Fixed — CLI flag takes precedence
rateLimit.ts:335 dispose missing logger clear Copilot Agree Fixed — dispose() now calls sampledLog?.clear()
server.ts:921 integration test gap Copilot Defer server.test.ts blocked by pre-existing undici import failure
server.test.ts:215 TS2367 type error wenshao Push back Verified no error with tsc --noEmit; likely bot false positive
rateLimit.ts:308 silent catch {} wenshao Agree Fixed — added onError callback, wired through daemonLog
rateLimit.ts:262 fail-open no observability wenshao Agree Fixed — bucket overflow now calls onError()
rateLimit.ts:115 anonymous bucket DoS wenshao Push back remotePort changes per connection; documented known limitation
rateLimit.ts:187 sweep clears suppressed counts wenshao Agree Fixed — removed sampledLog.clear() from sweep

Also addressed wenshao's top-level suggestion: replaced raw app.locals['_rateLimiter'] string key with typed setRateLimiter()/getRateLimiter() accessors.

wenshao
wenshao previously approved these changes Jun 8, 2026
Comment thread packages/cli/src/commands/serve.ts Outdated
Remove default:false from yargs so argv['rate-limit'] is undefined
when neither flag is passed. Use ?? for env var fallback so
--no-rate-limit (explicit false) wins over QWEN_SERVE_RATE_LIMIT=1.

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
@wenshao

wenshao commented Jun 8, 2026

Copy link
Copy Markdown
Collaborator

🔍 Local verification report — real build, unit tests & live runtime

中文 TL;DR: 在本地用 tmux,基于 PR 的真实合并结果(PR 已是 base 分支 daemon_mode_b_main 的最新态,0 落后)做了「构建 + 单测 + 真实 HTTP 运行时」三层验证。结论:PR 自身测试全绿、限流在真实 qwen serve 进程里行为完全正确、未引入任何回归,建议合并。 作者 Test plan 里那条没勾的手动测试--rate-limit 超限返回 429)我已实测通过。补充:整个 src/serve 目录有 16 个失败,但在不含本 PR 的 base 分支上逐条一模一样地复现——是 base 分支既有问题(capability fixture 漂移 + 启动类集成测试在沙箱里 ACP 超时),与本 PR 无关。

Verdict: ✅ The PR is clean. Its own tests pass (rateLimit 25/25 + the rate_limit capability test), the limiter behaves correctly in a live process, and it adds zero regressions. The unchecked manual test in the PR's plan (429 on over-limit) is now verified end-to-end.

Environment

  • Base branch is daemon_mode_b_main (not main). PR 8cc8d8e65 is 0 commits behind base 82879b300 → the PR head is the merge result.
  • Platform Linux (kernel 6.12), Node v22. Built CI-style: npm install + npm run build (exit 0). Driven through tmux.

✅ The PR's own tests — green

  • rateLimit.test.ts: 25 / 25 passed — bucket mechanics, continuous drip refill, tier assignment, key extraction (incl. IPv6 normalization), exempt routes, fail-open, draining, reset, callbacks.
  • The new rate_limit conditional-capability test in server.test.ts passes ({rateLimit:true}→advertised, {}→not).

✅ Live runtime test — real qwen serve process (the highlight)

Started qwen serve --rate-limit --rate-limit-read 3 --rate-limit-window-ms 600000 on loopback (auth-free dev default), then drove it with curl:

READ tier (limit 3):    GET  /capabilities ×5  →  200 200 200 429 429        ✅ exact
Per-tier isolation:     POST /capabilities (mutation, fresh bucket) → 404, NOT 429   ✅
Exempt route:           GET  /health ×6        →  200 200 200 200 200 200    ✅ never limited
Capability advertised:  GET  /capabilities body contains "rate_limit"        ✅

The 429 response is well-formed:

HTTP/1.1 429 Too Many Requests
Retry-After: 200
X-RateLimit-Limit: 3
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1780927000

{"error":"Rate limit exceeded","code":"rate_limit_exceeded","tier":"read","retryAfterMs":199882}

This confirms the unchecked item in the Test plan (--rate-limit … → 429) end-to-end, and additionally demonstrates per-tier bucket isolation (exhausting read does not affect mutation) and route exemption (/health).

✅ No regressions — the 16 src/serve failures are pre-existing on base

The full src/serve directory shows 16 failures, but they reproduce identically on the clean base branch without this PR:

Run Result
src/serve on PR (merge result) 16 failed / 931 passed
The same 5 failing files on base (daemon_mode_b_main, no PR) 16 failed / 471 passed — same files, same assertions

None are rate-limit related:

  • 3× capability-registry tests — the EXPECTED_REGISTERED_FEATURES fixture is out of sync, missing 3 features (workspace_agent_generate, session_stats, workspace_mcp_manage) that this PR does not touch. (The PR did correctly add rate_limit to both the registry and the fixture.)
  • runQwenServe boot tests + workspaceAgents / workspaceMemory / acpHttp/transport / envSnapshot — server-boot / env integration tests failing in this sandbox with AcpSessionBridge initialize timed out after 10000ms.

→ Base-branch concerns, not introduced by #4861. Worth flagging separately: the capability-fixture drift is deterministic (3 features need adding to the fixture) regardless of environment.

Source & security assessment

A clean token-bucket limiter with solid defensive properties:

  • Placement — after bearerAuth (only authenticated requests count) and before express.json (reject before body parse). Correct.
  • Fail-open everywhere — draining → pass; bucket map at 10k cap → pass; any internal error → onError + next(). A limiter fault can't take the daemon down.
  • Lifecycle — GC timer is unref()'d (won't block process exit) and cleared on dispose(); shutdown path calls setDraining(true) + dispose().
  • Keying — loopback → client-id (else anonymous); non-loopback → IP (+client-id) with IPv6 normalization and a ^[A-Za-z0-9._:-]{1,128}$ client-id guard.
  • 429 carries Retry-After + X-RateLimit-*; off by default (backward compatible). The known limitations (reverse-proxy shared IP, 0.0.0.0 collapse, client-id rotation, boot-time-only) are accurately documented in the PR.

Build / typecheck

npm install + npm run build → exit 0. The PR source also compiles and runs under tsx (the live test) and under vitest/esbuild (25/25), so its TypeScript is sound.


Scope: build + unit + live HTTP runtime on Linux against the real merge result. Not exercised: reverse-proxy / non-loopback multi-client keying (documented limitations), and the pre-existing base-branch failures noted above.

@doudouOUC doudouOUC requested review from chiga0, liudayiheng, qwen-code-ci-bot and yiliang114 and removed request for liudayiheng June 8, 2026 14:26

@chiga0 chiga0 left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review Overview (AI Generated)

PR: #4861 — feat(serve): add per-tier HTTP rate limiting for daemon
Type: New Feature (production code — security-relevant middleware)
Change size: +1051/-1 across 8 files
HEAD: ef3dac490e

Findings Summary

  • Critical/Major: 0 items
  • Minor: 0 items
  • Nit: 0 items

Assessment

Rate limiter design: Token bucket with continuous drip refill is the correct choice for HTTP rate limiting — avoids bursty behavior of fixed-window counters while keeping O(1) per-request cost. Three-tier system (prompt 10/min, mutation 30/min, read 120/min) maps well to the relative cost of each operation class. Fail-open on bucket overflow (10k cap) and internal errors is the right availability-first decision for a daemon.

Middleware wiring: Placed after bearerAuth (only count authenticated requests) and before express.json() (reject early without burning JSON parse CPU) — optimal ordering. Exempt routes (health, heartbeat, SSE, /acp, OPTIONS, demo) are correctly identified.

Graceful shutdown: setDraining(true) + dispose() in the shutdown path via typed getRateLimiter accessor. GC timer is properly unref()'d to avoid blocking process exit. sampledLog.clear() moved to dispose() (not sweep) — commit 2 fix is correct.

wenshao CR audit (6 items, all addressed at HEAD):

  • TS2367 typecheck: rate_limit added to EXPECTED_REGISTERED_FEATURES + dedicated test branch in fixture ✓
  • Silent catch-all: onError callback added, wired through daemonLog
  • Bucket overflow observability: onError called with overflow error message ✓
  • Sweep clearing sampledLog: sampledLog?.clear() removed from sweep(), added to dispose() + reset()
  • CLI flag vs env var priority: removed default: false from yargs, ?? correctly handles --no-rate-limitfalse overriding env var ✓
  • Loopback anonymous key collision: documented as known limitation ✓

CLI validation: --rate-limit-window-ms minimum 1000ms prevents degenerate configurations. All numeric flags validated as positive integers before passing to serve options. Env var fallbacks follow existing pattern (?? chain).

Test coverage: 25 unit tests covering bucket mechanics, tier resolution, key extraction (including IPv6 normalization), fail-open, draining, reset, and callbacks. Server test fixture properly updated for conditional rate_limit capability tag. No integration test for middleware wiring in server.test.ts — acceptable since the limiter is opt-in and the unit tests are thorough.

Key Observations

Clean, well-structured feature addition. The defense-in-depth approach (CLI validation + createRateLimiter validation) is appropriate. The documented limitations (reverse proxy IP, 0.0.0.0 bind, client-id rotation on loopback) are realistic for a boot-time-only local daemon and don't require code changes. Approved.


This review was generated by QoderWork AI

@doudouOUC doudouOUC dismissed qwen-code-ci-bot’s stale review June 8, 2026 15:22

Already have 2 approved, 3ks.

@doudouOUC doudouOUC merged commit 613e818 into daemon_mode_b_main Jun 8, 2026
4 checks passed
@doudouOUC doudouOUC deleted the feat/serve-rate-limiting branch June 8, 2026 15:22
// Fail-open if bucket map is at capacity
let tierMap = buckets.get(key);
if (!tierMap) {
if (buckets.size >= MAX_BUCKETS) {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Critical] Client-id rotation + global fail-open defeats rate limiting entirely

The key extractor (createKeyExtractor) uses the self-declared x-qwen-client-id header as part of the bucket key. An attacker can rotate through >10,000 unique client-id values (the regex [A-Za-z0-9._:-]+ with 128-char max provides a near-infinite keyspace) to fill the bucket map past MAX_BUCKETS. Once buckets.size >= 10000, this fail-open branch calls next() unconditionally — disabling rate limiting for all clients globally, not just the attacker's.

The GC sweep (staleness threshold = windowMs * 2 = 120s at default) cannot keep pace; an attacker sending ~83 req/s sustains 10,001 fresh buckets. On non-loopback binds behind a reverse proxy (where all traffic shares one remoteAddress), a single remote client can trigger this.

The PR documents client-id bypass as a known limitation, but the global impact (disabling protection for every client, not just the attacker) goes beyond per-client bypass.

Suggested fixes (pick one):

  • Add a per-IP aggregate bucket that caps total requests from one IP regardless of client-id
  • Key by IP only on non-loopback (ignore client-id for bucketing)
  • Fail-closed for unknown keys on overflow (reject new keys while preserving existing buckets)

— qwen3.7-max via Qwen Code /review

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants