feat(review): expand review pipeline + qwen review CLI subcommands#3754
Conversation
Review skill (SKILL.md) changes:
- Step 4: 5 → 9 parallel agents (split Correctness/Security, add Test
Coverage, 3 undirected personas: attacker / 3am-oncall / maintainer)
- Step 5: verification "uncertain → reject" → "uncertain → low-confidence"
(terminal-only "Needs Human Review" bucket; never posted as PR comments)
- Step 6: single reverse audit → iterative (terminate on no-new-findings,
hard cap 3 rounds)
- Step 9: self-PR detection (downgrade APPROVE/REQUEST_CHANGES → COMMENT
when GitHub forbids self-review with HTTP 422); CI status check
(downgrade APPROVE → COMMENT on red/pending CI); existing-Qwen-comment
classification with priority order Stale > Resolved > Overlap > NoConflict
(only Overlap blocks for confirmation)
`qwen review` CLI subcommands (packages/cli/src/commands/review/):
- fetch-pr — clean stale + fetch PR ref + create worktree + metadata
- pr-context — emit Markdown context file with security preamble +
already-discussed dedup section
- load-rules — read review rules from base branch (4 source files)
- deterministic— run tsc, eslint, ruff, cargo-clippy, go-vet, golangci-lint
on changed files; filtered + structured findings JSON
(TypeScript/JavaScript, Python, Rust, Go)
- presubmit — self-PR + CI status + existing-comment classification in
a single JSON report
- cleanup — worktree + branch ref + per-target temp files (idempotent)
Cross-platform: execFileSync (no shell), path.join, CRLF normalization,
which/where for tool detection. Replaces bash-style inline commands in
SKILL.md; works identically on macOS/Linux/Windows.
Path consistency: SKILL.md temp files moved from /tmp/qwen-review-* to
.qwen/tmp/qwen-review-* — matches what os.tmpdir() resolves to across
platforms (macOS returns /var/folders/... not /tmp).
DESIGN.md gains five "Why ..." sections explaining each design decision;
docs/users/features/code-review.md synced for user-visible changes.
Code Coverage Summary
CLI Package - Full Text ReportCore Package - Full Text ReportFor detailed HTML reports, please see the 'coverage-reports-22.x-ubuntu-latest' artifact from the main CI run. |
| const isSelfPr = author.toLowerCase() === me.toLowerCase(); | ||
|
|
||
| // --- CI status --------------------------------------------------------- | ||
| const checkRunsResp = ghApi( |
There was a problem hiding this comment.
[Critical] presubmit fetches check-runs and PR review comments via gh api without pagination. GitHub REST list endpoints return only the first page by default, so later-page failing checks can be missed and /review may approve red CI. Later-page existing Qwen comments can also be missed, which can cause duplicate inline comments.
| const checkRunsResp = ghApi( | |
| const checkRunsResp = ghApiPaginated( | |
| `repos/${owner}/${repo}/commits/${commitSha}/check-runs`, | |
| ) as { check_runs?: CheckRun[] }[]; |
Please add paginated API support (for example gh api --paginate --slurp with page flattening) and use it for the check-runs and PR review comments endpoints.
— gpt-5.5 via Qwen Code /review
| context: string; | ||
| state: string; | ||
| } | ||
|
|
There was a problem hiding this comment.
[Critical] The CI classifier only treats queued, in_progress, and pending as pending. GitHub check runs can also report statuses such as waiting and requested; those currently fall through as neither failed nor pending, so a PR whose checks have not actually started can be classified as all_pass.
| const PENDING_STATES = new Set([ | |
| 'queued', | |
| 'in_progress', | |
| 'pending', | |
| 'waiting', | |
| 'requested', | |
| ]); |
A more robust alternative is to treat any run.status !== 'completed' as pending unless it is explicitly handled otherwise.
— gpt-5.5 via Qwen Code /review
`qwen review pr-context` now renders each replied-to inline-comment thread as the original reviewer comment + chronological reply chain, instead of only listing the root-comment snippet. This lets review agents see at a glance whether a topic has been addressed (e.g. a "Fixed in <commit>" reply closes the thread) and avoids re-reporting already-resolved concerns without forcing the LLM driver to manually summarise each reply chain in agent prompts. - Walk `in_reply_to_id` chain to group replies under their root comment - Sort replies chronologically (by id, monotonic on GitHub) - Render thread block: root snippet as a quote + bulleted reply list - Sort threads by `(path, line)` for deterministic output - SKILL.md note updated to point agents at the new chain format
`qwen review pr-context` now also fetches `gh api repos/{owner}/{repo}/pulls/{n}/reviews`
and renders a "Review summaries" section listing each reviewer's
overall body (the comment they typed alongside an APPROVED /
CHANGES_REQUESTED / COMMENTED submission). Closes a real gap found
during the PR #3684 review:
> "@wenshao [CHANGES_REQUESTED]: The previously identified exported
> type rename issue no longer maps to the current PR diff, so this
> review only includes the remaining high-confidence blocker."
Without this section, the LLM driver's review agents would have missed
that integration note from the prior reviewer.
- New `RawReview` type + extra `ghApi` call
- Filter: skip empty bodies + the canonical "No issues found. LGTM!"
template the qwen-review pipeline auto-emits — those carry no
agent-actionable content beyond the review state itself
- Sort meaningful reviews by `submitted_at` for chronological output
- Stdout summary now reports `M/N review summaries` (M = kept after
filter)
Smoke-tested on PR #3684: 30 inline, 3 issue, 1/30 review summaries
correctly surfaces the @wenshao CHANGES_REQUESTED body and filters the
29 LGTM templates.
`gh api <path>` defaults to per_page=30. Busy PRs cross that limit on inline comments, issue comments, and reviews — the latest entries (the ones most likely to contain new reviewer feedback or in-flight reply chains) end up on page 2+ and were silently truncated. Concrete bug found while re-reviewing PR #3684: Before: `30 inline, 3 issue comments, 1/30 review summaries` After: `97 inline, 3 issue comments, 6/67 review summaries` 5 additional reviewer-level summaries surfaced — including the @wenshao 2026-04-30 "Multi-agent re-review (Phase C)" body with the explicit verification notes that this PR's pipeline is supposed to chain forward into the next review. Changes: - `lib/gh.ts`: new `ghApiAll(path)` helper using `gh api --paginate`, which walks every `next` link and concatenates each page's array. - `pr-context.ts`: 3 fetches (inline / issue / reviews) → `ghApiAll`. - `presubmit.ts`: PR comments fetch → `ghApiAll` too (existing-comment classification was equally susceptible to dropping page 2+ overlap candidates). `check-runs` and `commits/<sha>/status` calls retain `ghApi` — those return objects (with embedded arrays) and rarely cross 30 entries.
|
I get the motivation behind moving parts of That said, I’m still a bit unsure whether this PR’s scope matches the problem it is trying to solve. If the main issue is that qwen3.6-plus/deepseek-v4-pro do not reliably launch SubAgents concurrently, most of this PR does not change the SubAgent runtime or scheduling path. It mainly makes the skill prompt more explicit about emitting all task calls in one response, while also adding a fairly large Could we clarify the intended scope here?
One more concern: Since this logic is now product code rather than prompt guidance, I think we should also add targeted tests around the helper boundaries: paginated Overall, I like the direction of making fragile review workflow pieces deterministic. I just think this PR would be easier to validate if it were narrowed or split, because right now it mixes a model-behavior prompt change, a review-quality expansion, and a new CLI helper layer in one PR. |
|
Thanks for the careful read. This branch has actually been driving the last two days of Will fix the two **[Critical]**s I left via 中文谢谢仔细看。 这个分支最近两天的 合并前会修掉我自己 |
… CLI subcommands
…3754) * feat(review): expand review pipeline + add `qwen review` CLI subcommands Review skill (SKILL.md) changes: - Step 4: 5 → 9 parallel agents (split Correctness/Security, add Test Coverage, 3 undirected personas: attacker / 3am-oncall / maintainer) - Step 5: verification "uncertain → reject" → "uncertain → low-confidence" (terminal-only "Needs Human Review" bucket; never posted as PR comments) - Step 6: single reverse audit → iterative (terminate on no-new-findings, hard cap 3 rounds) - Step 9: self-PR detection (downgrade APPROVE/REQUEST_CHANGES → COMMENT when GitHub forbids self-review with HTTP 422); CI status check (downgrade APPROVE → COMMENT on red/pending CI); existing-Qwen-comment classification with priority order Stale > Resolved > Overlap > NoConflict (only Overlap blocks for confirmation) `qwen review` CLI subcommands (packages/cli/src/commands/review/): - fetch-pr — clean stale + fetch PR ref + create worktree + metadata - pr-context — emit Markdown context file with security preamble + already-discussed dedup section - load-rules — read review rules from base branch (4 source files) - deterministic— run tsc, eslint, ruff, cargo-clippy, go-vet, golangci-lint on changed files; filtered + structured findings JSON (TypeScript/JavaScript, Python, Rust, Go) - presubmit — self-PR + CI status + existing-comment classification in a single JSON report - cleanup — worktree + branch ref + per-target temp files (idempotent) Cross-platform: execFileSync (no shell), path.join, CRLF normalization, which/where for tool detection. Replaces bash-style inline commands in SKILL.md; works identically on macOS/Linux/Windows. Path consistency: SKILL.md temp files moved from /tmp/qwen-review-* to .qwen/tmp/qwen-review-* — matches what os.tmpdir() resolves to across platforms (macOS returns /var/folders/... not /tmp). DESIGN.md gains five "Why ..." sections explaining each design decision; docs/users/features/code-review.md synced for user-visible changes. * feat(review): expose full reply chains in pr-context output `qwen review pr-context` now renders each replied-to inline-comment thread as the original reviewer comment + chronological reply chain, instead of only listing the root-comment snippet. This lets review agents see at a glance whether a topic has been addressed (e.g. a "Fixed in <commit>" reply closes the thread) and avoids re-reporting already-resolved concerns without forcing the LLM driver to manually summarise each reply chain in agent prompts. - Walk `in_reply_to_id` chain to group replies under their root comment - Sort replies chronologically (by id, monotonic on GitHub) - Render thread block: root snippet as a quote + bulleted reply list - Sort threads by `(path, line)` for deterministic output - SKILL.md note updated to point agents at the new chain format * feat(review): include review-level summaries in pr-context output `qwen review pr-context` now also fetches `gh api repos/{owner}/{repo}/pulls/{n}/reviews` and renders a "Review summaries" section listing each reviewer's overall body (the comment they typed alongside an APPROVED / CHANGES_REQUESTED / COMMENTED submission). Closes a real gap found during the PR #3684 review: > "@wenshao [CHANGES_REQUESTED]: The previously identified exported > type rename issue no longer maps to the current PR diff, so this > review only includes the remaining high-confidence blocker." Without this section, the LLM driver's review agents would have missed that integration note from the prior reviewer. - New `RawReview` type + extra `ghApi` call - Filter: skip empty bodies + the canonical "No issues found. LGTM!" template the qwen-review pipeline auto-emits — those carry no agent-actionable content beyond the review state itself - Sort meaningful reviews by `submitted_at` for chronological output - Stdout summary now reports `M/N review summaries` (M = kept after filter) Smoke-tested on PR #3684: 30 inline, 3 issue, 1/30 review summaries correctly surfaces the @wenshao CHANGES_REQUESTED body and filters the 29 LGTM templates. * fix(review): paginate gh API calls to capture comments past page 1 `gh api <path>` defaults to per_page=30. Busy PRs cross that limit on inline comments, issue comments, and reviews — the latest entries (the ones most likely to contain new reviewer feedback or in-flight reply chains) end up on page 2+ and were silently truncated. Concrete bug found while re-reviewing PR #3684: Before: `30 inline, 3 issue comments, 1/30 review summaries` After: `97 inline, 3 issue comments, 6/67 review summaries` 5 additional reviewer-level summaries surfaced — including the @wenshao 2026-04-30 "Multi-agent re-review (Phase C)" body with the explicit verification notes that this PR's pipeline is supposed to chain forward into the next review. Changes: - `lib/gh.ts`: new `ghApiAll(path)` helper using `gh api --paginate`, which walks every `next` link and concatenates each page's array. - `pr-context.ts`: 3 fetches (inline / issue / reviews) → `ghApiAll`. - `presubmit.ts`: PR comments fetch → `ghApiAll` too (existing-comment classification was equally susceptible to dropping page 2+ overlap candidates). `check-runs` and `commits/<sha>/status` calls retain `ghApi` — those return objects (with embedded arrays) and rarely cross 30 entries. --------- Co-authored-by: wenshao <wenshao@U-K7F6PQY3-2157.local>
…wenLM#3754) * feat(review): expand review pipeline + add `qwen review` CLI subcommands Review skill (SKILL.md) changes: - Step 4: 5 → 9 parallel agents (split Correctness/Security, add Test Coverage, 3 undirected personas: attacker / 3am-oncall / maintainer) - Step 5: verification "uncertain → reject" → "uncertain → low-confidence" (terminal-only "Needs Human Review" bucket; never posted as PR comments) - Step 6: single reverse audit → iterative (terminate on no-new-findings, hard cap 3 rounds) - Step 9: self-PR detection (downgrade APPROVE/REQUEST_CHANGES → COMMENT when GitHub forbids self-review with HTTP 422); CI status check (downgrade APPROVE → COMMENT on red/pending CI); existing-Qwen-comment classification with priority order Stale > Resolved > Overlap > NoConflict (only Overlap blocks for confirmation) `qwen review` CLI subcommands (packages/cli/src/commands/review/): - fetch-pr — clean stale + fetch PR ref + create worktree + metadata - pr-context — emit Markdown context file with security preamble + already-discussed dedup section - load-rules — read review rules from base branch (4 source files) - deterministic— run tsc, eslint, ruff, cargo-clippy, go-vet, golangci-lint on changed files; filtered + structured findings JSON (TypeScript/JavaScript, Python, Rust, Go) - presubmit — self-PR + CI status + existing-comment classification in a single JSON report - cleanup — worktree + branch ref + per-target temp files (idempotent) Cross-platform: execFileSync (no shell), path.join, CRLF normalization, which/where for tool detection. Replaces bash-style inline commands in SKILL.md; works identically on macOS/Linux/Windows. Path consistency: SKILL.md temp files moved from /tmp/qwen-review-* to .qwen/tmp/qwen-review-* — matches what os.tmpdir() resolves to across platforms (macOS returns /var/folders/... not /tmp). DESIGN.md gains five "Why ..." sections explaining each design decision; docs/users/features/code-review.md synced for user-visible changes. * feat(review): expose full reply chains in pr-context output `qwen review pr-context` now renders each replied-to inline-comment thread as the original reviewer comment + chronological reply chain, instead of only listing the root-comment snippet. This lets review agents see at a glance whether a topic has been addressed (e.g. a "Fixed in <commit>" reply closes the thread) and avoids re-reporting already-resolved concerns without forcing the LLM driver to manually summarise each reply chain in agent prompts. - Walk `in_reply_to_id` chain to group replies under their root comment - Sort replies chronologically (by id, monotonic on GitHub) - Render thread block: root snippet as a quote + bulleted reply list - Sort threads by `(path, line)` for deterministic output - SKILL.md note updated to point agents at the new chain format * feat(review): include review-level summaries in pr-context output `qwen review pr-context` now also fetches `gh api repos/{owner}/{repo}/pulls/{n}/reviews` and renders a "Review summaries" section listing each reviewer's overall body (the comment they typed alongside an APPROVED / CHANGES_REQUESTED / COMMENTED submission). Closes a real gap found during the PR QwenLM#3684 review: > "@wenshao [CHANGES_REQUESTED]: The previously identified exported > type rename issue no longer maps to the current PR diff, so this > review only includes the remaining high-confidence blocker." Without this section, the LLM driver's review agents would have missed that integration note from the prior reviewer. - New `RawReview` type + extra `ghApi` call - Filter: skip empty bodies + the canonical "No issues found. LGTM!" template the qwen-review pipeline auto-emits — those carry no agent-actionable content beyond the review state itself - Sort meaningful reviews by `submitted_at` for chronological output - Stdout summary now reports `M/N review summaries` (M = kept after filter) Smoke-tested on PR QwenLM#3684: 30 inline, 3 issue, 1/30 review summaries correctly surfaces the @wenshao CHANGES_REQUESTED body and filters the 29 LGTM templates. * fix(review): paginate gh API calls to capture comments past page 1 `gh api <path>` defaults to per_page=30. Busy PRs cross that limit on inline comments, issue comments, and reviews — the latest entries (the ones most likely to contain new reviewer feedback or in-flight reply chains) end up on page 2+ and were silently truncated. Concrete bug found while re-reviewing PR QwenLM#3684: Before: `30 inline, 3 issue comments, 1/30 review summaries` After: `97 inline, 3 issue comments, 6/67 review summaries` 5 additional reviewer-level summaries surfaced — including the @wenshao 2026-04-30 "Multi-agent re-review (Phase C)" body with the explicit verification notes that this PR's pipeline is supposed to chain forward into the next review. Changes: - `lib/gh.ts`: new `ghApiAll(path)` helper using `gh api --paginate`, which walks every `next` link and concatenates each page's array. - `pr-context.ts`: 3 fetches (inline / issue / reviews) → `ghApiAll`. - `presubmit.ts`: PR comments fetch → `ghApiAll` too (existing-comment classification was equally susceptible to dropping page 2+ overlap candidates). `check-runs` and `commits/<sha>/status` calls retain `ghApi` — those return objects (with embedded arrays) and rarely cross 30 entries. --------- Co-authored-by: wenshao <wenshao@U-K7F6PQY3-2157.local>
Summary
Two pieces in one PR:
SKILL.md) pipeline expansion — 9 parallel agents (3 personas), iterative reverse audit, smarter Step 9 (self-PR + CI + existing-comment classification).qwen reviewCLI subcommands — 6 cross-platform helpers (TS inpackages/cli) replacing the bash-style inline commands SKILL.md previously asked the LLM to run. The LLM now callsqwen review fetch-pr / pr-context / load-rules / deterministic / presubmit / cleanupand reads structured JSON.Skill (
SKILL.md) changesqwen review presubmitcall.Self-PR rule: GitHub rejects both
APPROVEandREQUEST_CHANGESon your own PR (HTTP 422); the new flow downgrades both toCOMMENTautomatically — substantive findings still go in the inlinecommentsarray, only the review-level event is neutralized.CI rule: when verdict is
APPROVEbut check-runs / commit statuses show any failure or all-pending, downgradeAPPROVE → COMMENTand prepend the failed check names to the review body. The LLM cannot see runtime test failures from a static read; approving on red CI is misleading.qwen reviewCLI subcommandsdeterministiccovers TypeScript/JavaScript (tsc,eslint), Python (ruff), Rust (cargo clippy), Go (go vet,golangci-lint) — auto-detects each language's config files, runs the applicable tool on changed files (or whole project filtered to changed files for whole-project tools), parses each tool's structured output (JSON or line-based), and emits a unified findings JSON. TheToolDefregistry pattern indeterministic.tslets future passes plug in mypy/clang-tidy/checkstyle without changing the subcommand contract.Cross-platform
All subcommands use
execFileSync(no shell) so quoting is identical across macOS/Linux/Windows. Tool detection useswhichon POSIX andwhereon Windows. Path handling usespath.joinand CRLF normalization. SKILL.md's previous/tmp/qwen-review-*references are moved to.qwen/tmp/qwen-review-*becauseos.tmpdir()returns/var/folders/...on macOS — using a project-local dir avoids that mismatch entirely.Files
packages/cli/src/commands/review.ts(new) — yargs parent commandpackages/cli/src/commands/review/{fetch-pr,pr-context,load-rules,deterministic,presubmit,cleanup}.ts(new)packages/cli/src/commands/review/lib/{gh,git,paths}.ts(new) — shared helperspackages/cli/src/config/config.ts— registerreviewCommand+ add to subcommand exit allowlistpackages/core/src/skills/bundled/review/SKILL.md— pipeline updates + subcommand callspackages/core/src/skills/bundled/review/DESIGN.md— five new "Why ..." sections explaining each design decisiondocs/users/features/code-review.md— user-visible changesTest plan
qwen review --helplists all 6 subcommands ✅npm run typecheck --workspace @qwen-code/qwen-code✅npx eslint packages/cli/src/commands/review.ts packages/cli/src/commands/review/✅npm run build --workspace @qwen-code/qwen-code-core+--workspace @qwen-code/qwen-code✅state=COMMENTEDstate=CHANGES_REQUESTED+ body informational note about CI failurescleanupidempotency — runs cleanly when worktree/ref/temp files don't existOut of scope (deliberate)
mvn/clang-tidyetc. for those)npm testetc., requiresnpm ciin worktree; left to the LLM driver per existing SKILL.md flowRisk / blast radius
qwen reviewnamespace — no existing user-facing CLI surface conflicts/reviewflow