feat(test): add daemon connection stress test + refactor perf harness by doudouOUC · Pull Request #4862 · QwenLM/qwen-code

doudouOUC · 2026-06-08T13:10:38Z

What this PR does

Extract shared helpers from baseline/benchmark daemon tests into dedicated modules and add a new mock-ACP connection stress test suite gated by QWEN_LOADTEST_ENABLED=1.

Refactoring:

_daemon-benchmark-helpers.ts: extract /usr/bin/time wrappers from benchmark test
_daemon-perf-report.ts: shared formatPercentiles, collectPlatformInfo, writeSnapshotArtifacts, resolveOutputDir
_daemon-harness.ts: export gitHead(), makeTempWorkspace(), sleep(), ScenarioResult, add lastSeenId to ConsumeSseResult
Slim baseline + benchmark tests by ~500 lines via imports

New features:

fixtures/mock-acp-child/agent.mjs: mock ACP agent using real AgentSideConnection SDK with env-controlled error injection modes (echo/reject/crash-on-prompt/hang)
mock-acp-typecheck.test.ts: compile-time Agent interface compliance check
qwen-daemon-loadtest.test.ts: 5 scenarios — rapid lifecycle, SSE slow-consumer eviction, Last-Event-ID reconnect, ACP crash recovery, burst concurrent sessions
vitest.loadtest.config.ts: isolated config with root: __dirname anchoring

Why it's needed

Issue #4514 T3.4 — the daemon's HTTP/SSE surface lacks automated stress testing. Without a mock-ACP-based harness, connection lifecycle bugs, SSE backpressure regressions, and crash recovery issues can only be caught by manual testing or real-model integration tests (which are slow and flaky). The refactoring reduces ~500 lines of duplication across baseline/benchmark tests, making the shared infrastructure maintainable as the test suite grows.

Reviewer Test Plan

How to verify

# Baseline still passes after refactoring
npx vitest run integration-tests/cli/qwen-serve-baseline.test.ts

# Benchmark still passes after helper extraction
QWEN_BENCHMARK_ENABLED=1 BENCHMARK_ITERATIONS=1 npx vitest run integration-tests/cli/qwen-daemon-vs-cli-benchmark.test.ts

# Type compliance test
npx vitest run integration-tests/cli/mock-acp-typecheck.test.ts

# Loadtest scenarios (requires daemon binary built)
QWEN_LOADTEST_ENABLED=1 npx vitest run --config integration-tests/vitest.loadtest.config.ts

# Default CI excludes loadtest
npx vitest run --config integration-tests/vitest.config.ts

Evidence (Before & After)

N/A — test infrastructure only, no user-facing behavior change.

Tested on

Platform	Node	Status
macOS (darwin arm64)	v24.12.0	TypeScript + ESLint pass

Risk & Scope

Risk: Low. All new tests are gated behind QWEN_LOADTEST_ENABLED=1 and excluded from default CI. Refactoring is pure import-replacement with no behavioral changes.
Scope: Test infrastructure only. No production code changed.
Rollback: Revert this single commit.

🤖 Generated with Qwen Code

… (issue #4514 T3.4) Extract shared helpers from baseline/benchmark tests into dedicated modules and add a new mock-ACP connection stress test suite. Refactoring (PR1 scope): - _daemon-harness.ts: export gitHead(), makeTempWorkspace(), sleep(), ScenarioResult, lastSeenId tracking in consumeSseEvents - _daemon-benchmark-helpers.ts: extract /usr/bin/time wrappers (spawnDaemonWithTime, parseTimeOutput, measureProcessTreeRss, measureCliStartupWithProfiler) from benchmark test - _daemon-perf-report.ts: shared formatPercentiles, collectPlatformInfo, writeSnapshotArtifacts, resolveOutputDir - Slim baseline + benchmark tests to import from new modules New features (PR2 scope): - fixtures/mock-acp-child/agent.mjs: mock ACP agent using real AgentSideConnection SDK, env-controlled modes (echo/reject/ crash-on-prompt/hang) - mock-acp-typecheck.test.ts: compile-time Agent interface check - qwen-daemon-loadtest.test.ts: 5 scenarios gated by QWEN_LOADTEST_ENABLED=1 — rapid lifecycle, SSE slow-consumer eviction, Last-Event-ID reconnect, ACP crash recovery, burst concurrent sessions - vitest.loadtest.config.ts: isolated config with root anchoring 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)

qwen-code-ci-bot

Hi @doudouOUC — thanks for the PR! The daemon loadtest work looks genuinely useful, but the PR body doesn't follow our pull request template. Please update it to include the required sections so reviewers can properly evaluate the change:

## What this PR does — currently ## Summary, which is close but the template heading is required
## Why it's needed — missing. What problem does the daemon stress test solve? Why now?
## Reviewer Test Plan — currently ## Test plan. Please rename and fill in ### How to verify, ### Evidence (Before & After) (N/A for test infra is fine), and the ### Tested on table
## Risk & Scope — missing. Main risk, what's out of scope, breaking changes
## Linked Issues — Issue: #4514 T3.4 is referenced but doesn't use closing keywords (Closes #4514 / Fixes #4514)

Please update the PR body to match the template and re-request review. Happy to take another look once it's updated.

中文说明

你好 @doudouOUC — 感谢提交 PR！daemon 压力测试的工作看起来很有价值，但 PR 正文没有遵循我们的 PR 模板。请更新为模板要求的各个部分，以便审查者能正确评估：

## What this PR does — 当前用的是 ## Summary，接近但需要改成模板标题
## Why it's needed — 缺失。daemon 压力测试解决什么问题？为什么现在做？
## Reviewer Test Plan — 当前用的是 ## Test plan。请改名并填写 ### How to verify、### Evidence (Before & After)（测试基础设施可以填 N/A）、### Tested on 表格
## Risk & Scope — 缺失。主要风险、不在范围内的内容、破坏性变更
## Linked Issues — 引用了 Issue: #4514 T3.4 但没有使用关闭关键词（Closes #4514 / Fixes #4514）

请按照模板更新 PR 正文后重新请求审查。更新后我很乐意再看看。

— Qwen Code · qwen3.7-max

Copilot

Pull request overview

This PR adds a gated daemon connection stress/load test suite (using a mock ACP child) and refactors the existing baseline/benchmark integration tests by extracting shared performance harness/reporting helpers into reusable modules.

Changes:

Extract shared benchmark/perf reporting utilities into _daemon-benchmark-helpers.ts and _daemon-perf-report.ts, and extend _daemon-harness.ts with reusable helpers.
Add a mock ACP child fixture (agent.mjs) plus a type-shape compliance test for the mock agent interface.
Add a gated load test (qwen-daemon-loadtest.test.ts) with an isolated Vitest config, and exclude it from the default integration test run.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 3 comments.

Show a summary per file

File	Description
integration-tests/vitest.loadtest.config.ts	Adds an isolated Vitest config for running only the loadtest suite.
integration-tests/vitest.config.ts	Excludes the loadtest suite from the default integration test run.
integration-tests/fixtures/mock-acp-child/agent.mjs	Adds a mock ACP agent process used by stress/load tests.
integration-tests/cli/qwen-serve-baseline.test.ts	Refactors baseline test to use extracted helpers/reporting utilities.
integration-tests/cli/qwen-daemon-vs-cli-benchmark.test.ts	Refactors benchmark test to use extracted harness/reporting utilities.
integration-tests/cli/qwen-daemon-loadtest.test.ts	Adds gated daemon connection stress/load scenarios.
integration-tests/cli/mock-acp-typecheck.test.ts	Adds a compile-time shape check for the mock agent interface.
integration-tests/cli/_daemon-perf-report.ts	New shared report primitives (platform info, output dir, formatting, artifact writes).
integration-tests/cli/_daemon-harness.ts	Extends harness exports for reuse across baseline/benchmark/loadtest.
integration-tests/cli/_daemon-benchmark-helpers.ts	New benchmark-only helpers (notably `/usr/bin/time` wrappers) extracted from the benchmark test.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot correctly noted that console.debug and console.dir also write to stdout in Node, which would corrupt the NDJSON pipe. 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)

doudouOUC · 2026-06-08T13:23:50Z

Review feedback — round 1

Thread	Reviewer	Action	Detail
`agent.mjs:34` console.debug/dir	Copilot	Fixed in `586860c`	Added `console.debug = console.error` and `console.dir = console.error`
`_daemon-benchmark-helpers.ts:182` spawn error handler	Copilot	Deferred	Pre-existing code extracted as-is; hardening is a separate PR
`_daemon-benchmark-helpers.ts:380` spawn error handler	Copilot	Deferred	Same rationale
PR description format	qwen-code-ci-bot	Fixed	Updated to match PR template (What/Why/Test Plan/Risk sections)

Resolved 3/3 inline threads.

… recovery - All 5 scenarios now use try/catch/finally so snapshot.status reflects actual test outcome - SSE eviction scenario asserts evicted === true (near-deterministic with maxQueued=16 + 80+ events) - Crash recovery verifies end-to-end by creating a fresh session post-crash, not just health check 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)

doudouOUC · 2026-06-08T14:03:12Z

Review feedback — round 2 (wenshao)

Thread	Action	Detail
`loadtest.test.ts:211` snapshot status before expect	Fixed in `e1b0df4`	All 5 scenarios use try/catch/finally; status reflects actual outcome
`loadtest.test.ts:272` SSE eviction not asserted	Fixed in `e1b0df4`	Added `expect(evicted).toBe(true)` with maxQueued=16
`loadtest.test.ts:354` crash recovery only checks health	Fixed in `e1b0df4`	Now creates fresh session post-crash via `createOrAttachSession`

Resolved 3/3 threads.

wenshao · 2026-06-08T14:09:47Z

✅ Local verification — daemon connection stress test + perf-harness refactor (Linux)

Verified on the true post-merge state (the PR is rebased on current daemon_mode_b_main — git rev-list --left-right --count origin/daemon_mode_b_main...HEAD = 0 3, so the head already is the merge result). The PR touches only integration-tests/ (10 files); packages/{cli,core,sdk-typescript} source is unchanged, so the daemon under test is the existing one. The author's matrix lists macOS only — this run adds Linux (x64) coverage.

Built the full daemon chain for the harness (core → cli → sdk-typescript → web-templates → esbuild bundle): 0 TypeScript errors across the whole graph; the bundled daemon (dist/cli.js, which globalSetup.ts wires in via TEST_CLI_PATH) boots and listens.

1. New loadtest suite — 5/5 scenarios PASS ✅

QWEN_LOADTEST_ENABLED=1 vitest run --config integration-tests/vitest.loadtest.config.ts (real daemon + mock-ACP child, no model):

Scenario	Result	Time
rapid lifecycle (20× create→prompt→close)	✅	3.06s
SSE slow-consumer eviction through HTTP	✅	60.96s
Last-Event-ID reconnect under concurrent load	✅	1.16s
ACP child crash → session error recovery	✅	0.96s
burst: 10 concurrent sessions	✅	1.10s

Test Files 1 passed · Tests 5 passed, 67.7s total. The daemon's HTTP/SSE surface survives the lifecycle churn, backpressure flood, reconnect, and a crash-on-prompt child without wedging.

2. Typecheck (`tsc --noEmit` over all integration-tests) — clean for the PR ✅

0 errors in all 10 changed files (incl. mock-acp-typecheck.test.ts, whose whole point is the compile-time Agent-interface assertion). The only diagnostic emitted is tsconfig.json(8,13) TS5063 for the "//" paths-comment — and that file is byte-identical to base (not in this PR's changeset), so it's pre-existing.

3. Refactor safety net — baseline test: 6 passed, 1 skipped, 1 pre-existing env failure ✅

vitest run --root ./integration-tests cli/qwen-serve-baseline.test.ts (the test slimmed by the helper extraction):

✅ 6 passed — RSS scaling, attach latency, MCP grandchildren count as sessions grow, 2× SSE backpressure unit, etc. These exercise the extracted _daemon-perf-report.ts + _daemon-harness.ts helpers, confirming the extraction is behavior-preserving.
↓ 1 skipped — prompt-latency (no model-credential env), expected.
× 1 failed — MCP child amplification > clientCount matches external pgrep observation (expected 4 MCP grandchildren, the sandbox's external pgrep saw 2). This is not a PR regression: the failing test is at line 480, which lies outside every diff hunk (hunks touch only lines 33–228 and 704), i.e. its code is byte-unchanged by this PR. It is an environment-sensitive process-tree observation test (pgrep visibility of spawned MCP children under a root/container sandbox); its sibling subtest in the same block passed.

Methodology note

act/real CI is not reproducible locally; this exercises the real built daemon + mock-ACP child end-to-end. The _daemon-benchmark-helpers.ts extraction (the /usr/bin/time wrappers used only by the benchmark suite) typechecks clean and follows the same extraction pattern as the two runtime-validated modules; the full benchmark run is cold-start-bound and did not complete within the time budget in this sandbox (a property of the benchmark, not the PR).

Verdict

✅ Verified. The new mock-ACP loadtest harness works end-to-end (5/5 on Linux), the perf-harness refactor is behavior-preserving (6/6 runnable baseline assertions pass; the lone failure is a pre-existing, env-sensitive pgrep test byte-unchanged by this PR), and all 10 changed files typecheck clean. No blockers from a runtime-verification standpoint.

wenshao · 2026-06-08T14:31:35Z

🔍 Local verification report — real build + daemon integration/load tests

中文 TL;DR: 在本地用 tmux，基于 PR 的真实合并结果（PR 已是 base 分支 daemon_mode_b_main 最新态，0 落后）做了「构建 + 重构回归 + 真实跑 loadtest」验证。结论：重构是忠实的（benchmark 装上 /usr/bin/time 后 10/10 全过、typecheck 过、baseline 除一个既有环境问题外全过），新的 loadtest 框架 5 个场景过了 4 个。唯一一个真问题：SSE 慢消费者驱逐场景在本 Linux 沙箱里确定性失败（连跑两次都 evicted=false）。由于整套 loadtest 默认被 CI 排除、且本 PR 不碰任何生产代码，可以合并；但建议把那个 SSE 场景的“灌流”改成并发再依赖它。其余两个失败（benchmark 的 /usr/bin/time ENOENT、baseline 的 pgrep MCP 计数）都已证明是环境问题、与本 PR 无关。

Verdict: ✅ Mergeable as test-infra. The refactoring is faithful and the harness mostly works; zero production code is touched and the whole load-test suite is excluded from default CI. One new scenario (SSE eviction) needs hardening to be reliable cross-platform — details + a suggested fix below.

Environment

Base is daemon_mode_b_main. PR e1b0df4e0 is 0 commits behind base 82879b300 → PR head is the merge result.
Linux (kernel 6.12), Node v22.22.2 (author validated on macOS / Node v24). Built CI-style (npm install + npm run build, exit 0); daemon binary dist/cli.js produced. All via tmux.

Results

Check	Result
Default CI config excludes the loadtest	✅ `vitest list` collects 0 `qwen-daemon-loadtest` files
`mock-acp-typecheck.test.ts` (Agent interface compliance)	✅ passed
`qwen-daemon-vs-cli-benchmark.test.ts` (refactored, `QWEN_BENCHMARK_ENABLED=1`)	✅ 10/10 (after installing `/usr/bin/time` — see below)
`qwen-serve-baseline.test.ts` (refactored)	✅ all pass except one pre-existing env test (below)
`qwen-daemon-loadtest.test.ts` — 5 mock-ACP scenarios	4 / 5 passed

Loadtest scenario breakdown:

✓ rapid lifecycle: create-prompt-close cycles        3171ms
× SSE slow consumer triggers eviction through HTTP   61002ms   ← deterministic fail here
✓ Last-Event-ID reconnect under concurrent load      1389ms
✓ ACP child crash → session error recovery           1081ms
✓ burst: concurrent sessions with mock prompts       1080ms

Two failures that are environmental, not this PR (both proven)

Benchmark — spawn /usr/bin/time ENOENT. GNU time isn't installed in this sandbox; the benchmark (and its extracted _daemon-benchmark-helpers.ts) wraps /usr/bin/time -v. After apt-get install time, the benchmark passes 10/10 → the helper extraction is correct.
Baseline — clientCount matches external pgrep observation. Timed out waiting for 4 MCP grandchildren … mcpGrandchildren=[2 pids]. The PR does not touch this test, and it fails identically on the clean base branch (no PR) — the sandbox only spawns 2 MCP grandchildren, not 4. Pre-existing/environmental.

The one real finding — SSE eviction scenario is environment-sensitive

SSE slow consumer triggers eviction through HTTP fails at expect(evicted).toBe(true) — the slow consumer is never evicted within 60s in this environment. I re-ran it twice in isolation; it failed both times → deterministic here, not flaky.

Likely cause: the flood fires prompts sequentially awaited:

for (let i = 0; i < promptCount; i++) {
  await d.client.prompt(session.sessionId, { prompt: [{ type: 'text', text: `flood-${i}` }] });
}

Awaiting each prompt throttles event production to roughly the slow consumer's drain rate (consumerDelayMs: 200), so the 16-deep queue doesn't reliably overflow → the daemon's queue-overflow eviction doesn't trip on this Linux/Node-22 host (it presumably did on the author's macOS/Node-24). The daemon eviction logic itself is base code and isn't exercised to failure here — the scenario just doesn't generate enough backpressure.

Suggested fix: make the flood genuinely concurrent so it outpaces the consumer regardless of host timing, e.g. fire without awaiting and then settle:

await Promise.allSettled(
  Array.from({ length: promptCount }, (_, i) =>
    d.client.prompt(session.sessionId, { prompt: [{ type: 'text', text: `flood-${i}` }] })),
);

This isn't a CI blocker (the suite is gated behind QWEN_LOADTEST_ENABLED=1 and excluded from default config), but worth hardening before this scenario is relied on in cron/CI.

Refactoring soundness

The ~500-line extraction into _daemon-benchmark-helpers.ts / _daemon-perf-report.ts / harness exports is faithful: benchmark passes 10/10 (with /usr/bin/time), mock-acp-typecheck passes, and the baseline test's only failure reproduces identically on base. The new vitest.loadtest.config.ts is correctly isolated (10-min timeout, fileParallelism:false, SDK alias) and the default config correctly excludes the loadtest.

Scope: build + refactor regression + live load-test on Linux/Node-22 against the real merge result. No production code is modified by this PR. Not covered: macOS/Node-24 (author's platform) where the SSE-eviction scenario reportedly passes.

chiga0

Code Review Overview (AI Generated)

PR: #4862 — feat(test): add daemon connection stress test + refactor perf harness
Type: New Feature + Refactor (test infrastructure)
Change size: +1268/-505 across 10 files
HEAD: e1b0df4e03

Findings Summary

Critical/Major: 0 items
Minor: 0 items
Nit: 0 items

Assessment

Refactoring quality: All 3 extracted modules (_daemon-harness.ts additions, _daemon-benchmark-helpers.ts, _daemon-perf-report.ts) preserve exact behavior — function signatures, regex patterns, spawn args, and logic flow are identical to the originals. makeTempWorkspace cleanly generalizes the prefix parameter. ~500 lines of duplication eliminated without any behavioral drift.

New stress tests: The 5 scenarios are well-structured with a clean withDaemon helper pattern (try/catch/finally + proper disposal). All scenarios correctly track snapshot status through assertion lifecycle (commit 3 addressed this well). SSE eviction assertion (evicted === true) is near-deterministic with the configured maxQueued=16 + 80 events. Crash recovery validates end-to-end recovery (new session creation, not just health check). The mock ACP agent using real AgentSideConnection SDK provides high-fidelity testing without real model latency.

Test gating: QWEN_LOADTEST_ENABLED=1 + platform check + sandbox check properly excludes the load test from default CI. Dedicated vitest.loadtest.config.ts with 10-min timeout and no retries is appropriate for stress testing.

wenshao follow-up Suggestions: Noted 7 additional Suggestions in review 4450487353 (post-APPROVE). All valid improvement opportunities (eviction detection decoupling, ScenarioResult.error population, reject/hang mode coverage, typecheck Pick → Required<Pick>, loadtest config typecheck include). None blocking.

Key Observations

Clean test infrastructure PR. The refactoring is a textbook extraction — no hidden behavior changes. The new stress tests follow established patterns from the existing baseline/benchmark suites and add genuine value for catching daemon HTTP/SSE lifecycle bugs. Approved.

This review was generated by QoderWork AI

Already have 2 approved, 3ks.

doudouOUC · 2026-06-08T15:28:44Z

Review feedback — round 3 (wenshao R2)

Thread	Action	Detail
`loadtest:260` evictionReason fragility	Fixed in `44dcc98`	Added `evicted: boolean` to ConsumeSseResult
`loadtest:371` crash recovery prompt	Not taking	Session creation already proves ACP child recovery; prompting in crash mode would re-crash
`loadtest:215` ScenarioResult.error empty	Fixed in `44dcc98`	Error captured in catch blocks via pushScenario helper
`agent.mjs:79` reject/hang unused modes	Not taking	Modes for future expansion; adding tests expands PR scope
`typecheck:13` Pick vs full Agent	Not taking	AgentSideConnection handles missing methods via methodNotFound
`perf-report:42` label discarded with env	Not taking	Inherited behavior, baseName prevents overwrites
`loadtest.config:19` include typecheck	Not taking	Typecheck runs in default CI, not daemon-dependent

Fixed: 2 | Pushed back: 5 | Commit: 44dcc98

doudouOUC requested review from Copilot and wenshao June 8, 2026 13:10

Copilot started reviewing on behalf of doudouOUC June 8, 2026 13:10 View session

qwen-code-ci-bot previously requested changes Jun 8, 2026

View reviewed changes

Copilot AI reviewed Jun 8, 2026

View reviewed changes

Comment thread integration-tests/fixtures/mock-acp-child/agent.mjs

Comment thread integration-tests/cli/_daemon-benchmark-helpers.ts

Comment thread integration-tests/cli/_daemon-benchmark-helpers.ts

fix: redirect console.debug/dir to stderr in mock ACP agent

586860c

Copilot correctly noted that console.debug and console.dir also write to stdout in Node, which would corrupt the NDJSON pipe. 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)

wenshao reviewed Jun 8, 2026

View reviewed changes

Comment thread integration-tests/cli/qwen-daemon-loadtest.test.ts Outdated

Comment thread integration-tests/cli/qwen-daemon-loadtest.test.ts Outdated

Comment thread integration-tests/cli/qwen-daemon-loadtest.test.ts Outdated

wenshao approved these changes Jun 8, 2026

View reviewed changes

doudouOUC requested a review from qwen-code-ci-bot June 8, 2026 14:26

doudouOUC requested review from chiga0 and yiliang114 June 8, 2026 14:32

wenshao reviewed Jun 8, 2026

View reviewed changes

chiga0 approved these changes Jun 8, 2026

View reviewed changes

doudouOUC merged commit a680b4d into daemon_mode_b_main Jun 8, 2026
16 checks passed

doudouOUC deleted the feat/daemon-loadtest-harness branch June 8, 2026 15:23

doudouOUC mentioned this pull request Jun 8, 2026

tracking(serve): daemon capability gaps & prioritized backlog (post v0.16-alpha) #4514

Open

Conversation

doudouOUC commented Jun 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What this PR does

Why it's needed

Reviewer Test Plan

How to verify

Evidence (Before & After)

Tested on

Risk & Scope

Uh oh!

qwen-code-ci-bot left a comment

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

doudouOUC commented Jun 8, 2026

Review feedback — round 1

Uh oh!

Uh oh!

Uh oh!

Uh oh!

doudouOUC commented Jun 8, 2026

Review feedback — round 2 (wenshao)

Uh oh!

wenshao commented Jun 8, 2026

✅ Local verification — daemon connection stress test + perf-harness refactor (Linux)

1. New loadtest suite — 5/5 scenarios PASS ✅

2. Typecheck (tsc --noEmit over all integration-tests) — clean for the PR ✅

3. Refactor safety net — baseline test: 6 passed, 1 skipped, 1 pre-existing env failure ✅

Methodology note

Verdict

Uh oh!

wenshao commented Jun 8, 2026

🔍 Local verification report — real build + daemon integration/load tests

Environment

Results

Two failures that are environmental, not this PR (both proven)

The one real finding — SSE eviction scenario is environment-sensitive

Refactoring soundness

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

chiga0 left a comment

Choose a reason for hiding this comment

Code Review Overview (AI Generated)

Findings Summary

Assessment

Key Observations

Uh oh!

Uh oh!

doudouOUC commented Jun 8, 2026

Review feedback — round 3 (wenshao R2)

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

doudouOUC commented Jun 8, 2026 •

edited

Loading

2. Typecheck (`tsc --noEmit` over all integration-tests) — clean for the PR ✅