feat(skills): add triage skill for issue/PR gatekeeping#4577
Conversation
📋 Review SummaryThis PR introduces a new 🔍 General Feedback
🎯 Specific Feedback🔴 CriticalNo critical issues identified that block merging. 🟡 High
🟢 Medium
🔵 Low
✅ Highlights
|
5b0eb5f to
25c7fed
Compare
Evidence of UsageRunning the triage skill via CLI: npm run dev -- -p '使用 triage skill 查看https://github.com/QwenLM/qwen-code/issues/3565' --output-format stream-jsonResults can be seen in:
Related PR
|
Code Coverage Summary
CLI Package - Full Text ReportCore Package - Full Text ReportFor detailed HTML reports, please see the 'coverage-reports-22.x-ubuntu-latest' artifact from the main CI run. |
Address review feedback on PR #4577: - Critical: sanitize untrusted issue text before the shell `gh ... --search` call (command injection via crafted issue titles in a token-bearing CI run) - Critical: add "Skip If Already Handled" guard so CI retries/replays do not post duplicate comments or submit conflicting reviews - Skip draft PRs (add isDraft to the fetch and early-exit) - Fix phantom "Stage 4" reference in the 3-stage issue workflow - Require the `## Reviewer Test Plan` template heading (matches the repo template) - Add gh command examples for label-add and direction request-changes - Document `$QWEN_MAINTAINER_HANDLE` expected format Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
|
Reworked the PR Product Direction Gate (Stage 2) in Direction is the one call the model lacks the context to make — it lives in unwritten maintainer decisions, roadmap intent, and past rejections that aren't in the repo. So instead of giving the model a procedure to follow, this trusts its reasoning and hard-codes only the guardrails it can't derive. These are orthogonal to model strength — a stronger model needs them more, not less, because its wrong direction-call is more convincing:
Note: this supersedes the Stage 2 中文说明重做了 PR 产品方向闸(Stage 2),见 产品方向是模型唯一缺上下文、做不了的判断——它活在维护者没写下来的决定、roadmap 意图、过往拒绝里,仓库里没有。所以这版不再给模型一套流程,而是信它的推理,只硬编码它推不出来的护栏。这些护栏与模型强弱正交——模型越强越需要,因为它错的方向判断更有说服力:
注:这取代了我早前为 review #193 在 Stage 2 加的 |
|
Made escalation explicitly stop the PR flow in To answer the natural question: yes — once Stage 2 escalates to Why stop rather than gather more first:
If a maintainer confirms the direction, code review and testing run afterward. If the human needs more context up front, the place to add it is the escalation brief (prior art, citations, the open question) — not by running the downstream stages. 中文说明把「escalate = 终止流水线」显式钉死,见 直接回答那个很自然的问题:是的——Stage 2 一旦 escalate 到 为什么是「停」而不是先多跑点:
维护者确认方向后,代码审查和测试再跑。若想让人接手时上下文更厚,该加厚的是 escalate 的那份 brief(prior art、引用、待决问题),而不是去跑下游阶段。 |
|
Made Claude Code parity the primary direction signal in The hardest part of triage is product direction, because it normally lives in maintainers' heads — not something the model can read. Claude Code's CHANGELOG fixes that: it's an external, verifiable, citable source of what a CLI coding agent is "supposed" to have. So Stage 2 now leads with a parity check instead of the internal roadmap. How it works:
This replaces the 中文说明把「Claude Code 是否具备该能力」设为方向判断的首要信号,见 triage 最难的就是产品方向——它通常活在维护者脑子里,模型读不到。Claude Code 的 CHANGELOG 正好补这个洞:它是外部、可验证、可引用的,代表「一个 CLI coding agent 应该具备什么」。所以 Stage 2 现在以 parity 检查打头,替掉内部 roadmap。 怎么运作:
这替换了 |
|
Reworked PR Stage 4 into real-scenario tmux testing in Stage 4 was leaning on "the smallest focused test" — which invites unit tests. That's not the bar here. It now:
中文说明把 PR Stage 4 改成 tmux 真实场景测试,见 原来的 Stage 4 写的是「跑最小的聚焦测试」——这会被理解成单元测试,不是这里要的标准。现在:
|
|
Fixed an over-aggressive skip in The cause was a tension between two earlier changes: the idempotency guard (added for the "don't post duplicate comments on CI replays" review) was too coarse — it stopped any already-triaged PR. So when a maintainer re-runs Fix — split the two intents:
中文说明修了一个过度激进的跳过,见 根因是我之前两处改动打架:幂等守卫(为"CI 重放别刷重复评论"那条 review 加的)做得太粗——只要 PR 被 triage 过就停。于是当维护者故意重跑 修法——把两种意图拆开:
|
|
Template gate now cites its source in Looking at the template-gate review on #4670: it correctly told the author which headings were missing, but never linked the template — so the author has no way to know what to copy or that the requirement is real rather than the skill's opinion. Fix (same "cite or don't claim" principle we applied to the direction gate):
You can re-run 中文说明模板闸现在会给出处,见 看了 #4670 上那条模板闸 review:它正确指出了缺哪些标题,但没有链接模板——作者既不知道该照着什么抄,也无法判断这要求是真的还是 skill 自己的意见。 修法(和方向闸用的「cite or don't claim」同一原则):
你可以手动重跑 |
|
Added before/after evidence to PR Stage 4 in Stage 4 drove the real product but only captured the "after" state — which doesn't actually prove a fix. For a bug fix, the maintainer needs to see the bug reproduce and then disappear. So Stage 4 now requires a before/after comparison, same scenario, only the build differs:
Both tmux logs are posted as the evidence. The 中文说明给 PR Stage 4 补了 before/after 凭据,见 Stage 4 之前虽然驱动了真产品,但只截了「after」状态——这其实证明不了「修复」。对 bug fix,维护者需要看到 bug 先复现、再消失。所以 Stage 4 现在要求 before/after 对照,同一场景,只换构建:
两份 tmux 日志一起贴作凭据。 |
|
Made tmux real-scenario testing non-skippable in Triaging #4668, the skill hit an unrelated CLI build failure (a missing Stage 4 now:
Stage 5 tightened to match: real-scenario testing must have passed, not skipped; only changes with no runnable behavior (docs-only) are exempt. 中文说明把 tmux 真实场景测试设为不可跳过,见 triage #4668 时,skill 撞上一个与本 PR 无关的 CLI 构建失败( Stage 4 现在:
Stage 5 同步收紧:真实测试必须通过、不是跳过;只有无可运行行为的改动(纯文档)才豁免。 |
|
Added a concrete tmux before/after example to Stage 4 in The previous commits said "do real testing, exhaust all means" but didn't show the mechanics — which is partly why the agent fumbled it on #4668. Now Stage 4 spells it out, anchored on the key equivalence:
S=triage-test-$(date +%H%M%S); mkdir -p "tmp/$S"
tmux new-session -d -s "$S" -x 200 -y 50 -c "$(pwd)"
tmux send-keys -t "$S" "qwen -p '<scenario>' 2>&1 | tee tmp/$S/before.log" Enter # before: installed, no PR
tmux send-keys -t "$S" "npm run dev -- -p '<scenario>' 2>&1 | tee tmp/$S/after.log" Enter # after: this PR
tmux capture-pane -t "$S" -p -S -5000 > "tmp/$S/session.txt"; tmux kill-session -t "$S"For interactive TUI changes (dialogs, selectors, keyboard nav), 中文说明给 Stage 4 加了一个具体的 tmux before/after 例子,见 前几个 commit 说了「做真实测试、用尽一切办法」,但没给具体怎么做——这也是 agent 在 #4668 上翻车的部分原因。现在 Stage 4 直接写清楚,锚在这个关键等价上:
(命令见上。) 涉及交互式 TUI(弹窗、选择器、键盘导航)的改动, |
|
Reframed the Stage 4 example around the general equivalence in The previous wording over-indexed on
So before/after is just one invocation run two ways — 中文说明把 Stage 4 的例子改成围绕通用等价,见 之前措辞太聚焦
所以 before/after 不过是同一条命令跑两遍—— |
|
Added three step-back judgment questions to Stage 5 in Stages 1-4 are mechanical checks. Stage 5 now forces a holistic re-examination before approving — the AI's own judgment, not a checkbox:
These are now gating criteria for the approval. One thing to confirm: the skill approves ( 中文说明给 Stage 5 加了三条退一步的判断题,见 Stage 1-4 是机械检查;Stage 5 现在强制在 approve 前做一次整体复审——靠 AI 自己的判断,不是打勾:
这三条现在是审批的硬性判据。 一处要跟你确认: skill 目前是 approve( |
Maintainer verification — real local build + runtime testVerified PR head Verdict: ✅ PASS — loads cleanly, registers as a 1. Runtime load through the real
|
|
Added a best-solution reflection to PR Stage 2 in Direction-aligned — even via Claude Code parity — only answers "should this exist?" It doesn't answer "is this the best way to build it?" So before continuing, Stage 2 now reflects deeply once more:
中文说明给 PR Stage 2 加了「最优解」深度反思,见 方向 aligned——哪怕靠 Claude Code parity——只回答了「该不该有」,没回答「这是不是实现它的最好方式」。所以继续前,Stage 2 现在再深想一次:
|
Triage is a QwenLM/qwen-code maintainer workflow (repo-specific labels, bilingual comments, followup-bot coordination), so it belongs in .qwen/skills/ alongside bugfix/feat-dev rather than bundled/, which ships to every end user via npm. Pure file relocation; skill content unchanged. Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
Address review feedback on PR #4577: - Critical: sanitize untrusted issue text before the shell `gh ... --search` call (command injection via crafted issue titles in a token-bearing CI run) - Critical: add "Skip If Already Handled" guard so CI retries/replays do not post duplicate comments or submit conflicting reviews - Skip draft PRs (add isDraft to the fetch and early-exit) - Fix phantom "Stage 4" reference in the 3-stage issue workflow - Require the `## Reviewer Test Plan` template heading (matches the repo template) - Add gh command examples for label-add and direction request-changes - Document `$QWEN_MAINTAINER_HANDLE` expected format Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
…cedural Product direction is the one call the model lacks context to make (unwritten maintainer decisions, roadmap intent, past rejections not in this repo). Trust the model's reasoning and hard-code only the guardrails it cannot derive — these are orthogonal to model strength, so a stronger model needs them more, not less: - cite or it's a question (curb confabulation) - argue the opposite before "aligned" (curb sycophancy) - escalate by default to status/ready-for-human; never auto-reject on direction (wrongly discouraging a contributor is the high-regret error; direction is a maintainer's call) Supersedes the Stage 2 --request-changes added earlier for review item #193: the agent no longer auto-rejects on direction, it escalates to a human instead. Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
The direction gate rewrite left "escalate = stop" only implicit. Escalation is a control-flow decision, so state it: when Stage 2 escalates to a human, stop — do not run code review, testing, or approval. Those run only after a maintainer confirms the direction (gate economics; never execute an undecided PR's code; avoid anchoring the maintainer with a premature code-quality read). Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
The most efficient, citable direction check is whether Claude Code already ships
the capability — Qwen Code tracks it, and its CHANGELOG is an external,
verifiable source (unlike tacit maintainer knowledge). Stage 2 now leads with a
changelog parity check:
- present -> direction aligned / admit (cite version + line)
- absent -> NOT a rejection (Qwen Code has its own scope, e.g. Qwen OAuth);
falls through to the existing guardrails
Replaces the docs/developers/roadmap.md citation source with the Claude Code
CHANGELOG.
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
Stage 4 now drives the real product in a tmux TUI session (via the tmux-real-user-testing skill) instead of running unit / smallest-focused tests. The scenario is built from the PR's core behavior — the user's actual path — and the readable tmux log is posted to the PR as verifiable evidence. Keeps the untrusted-fork safety guardrail. Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
The idempotency guard was too coarse: it stopped any already-triaged PR, so a maintainer re-running /triage by hand (e.g. to apply the new tmux Stage 4) got skipped entirely. Scope the duplicate-run skip to unattended runs (CI / GITHUB_ACTIONS) — which still prevents duplicate comments on CI replays per the earlier review — while a hand-typed /triage always runs in full and updates its prior Stage N comments in place. Draft-skip now applies in any mode. Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
The template-gate review told authors which headings were missing but not where the requirement comes from, so they did not know which template to copy. Stage 1 now treats .github/pull_request_template.md as the source of truth and requires the blocking review to link it — making the request verifiable and actionable, not just the skill's assertion. Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
For a bug fix, real-scenario testing now captures a before/after comparison so the maintainer can confirm the fix is real: reproduce the bug on a build without the PR (installed `qwen` or `main`), then show it fixed on this PR's code via `npm run dev` — same scenario, only the build differs. Both tmux logs are posted as the evidence, matching the template's "Evidence (Before & After)" section. Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
Triaging #4668 the skill hit an unrelated CLI build failure (missing channels/feishu dep), skipped tmux TUI testing, fell back to unit tests, and still reported PASS. That is backwards: unit tests are covered by other CI; the tmux real test is the core deliverable. Stage 4 now: - makes tmux testing mandatory and not substitutable by unit tests - says to exhaust workarounds for unrelated build breakage (prefer `npm run dev` over the full bundle; install/disable the unrelated module; the installed `qwen` baseline needs no build) - sandboxes untrusted fork code (strip secrets/tokens) instead of skipping it - treats a skipped test as a blocker, never a PASS Stage 5 tightened to match: real-scenario testing must have passed, not skipped; only changes with no runnable behavior (docs-only) are exempt. Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
Give the agent the exact local-test mechanics it kept fumbling. `-p` runs one prompt headless, so `npm run dev -- -p '…'` is the dev-build equivalent of `qwen -p '…'` — a clean A/B where only the build differs. The example shows capturing before (installed qwen) and after (dev build) logs in tmux, and notes that interactive TUI changes still need the full tmux-real-user-testing drive. Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
…ge 4 The before/after example over-indexed on `-p`. The actual point is that `npm run dev -- <args>` runs the working tree exactly as `qwen <args>` runs the installed build — so before/after is one invocation run two ways, and `-p` is just one example of it (interactive TUI drops the -p and drives both the same). Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
Before approving, the skill now steps back and re-examines three things beyond the mechanical checklist: 1. Is the need real, or change for its own sake? 2. Is the code simple — no over-engineering or over-defense? 3. Is it confident to merge this itself, or does it need a maintainer? Real doubt on #3 routes to a maintainer. The action stays `--approve` (a merge-ready endorsement), not auto-merge. Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
Direction-aligned (even via Claude Code parity) is not enough on its own: before continuing, the skill now reflects deeply on whether the PR's solution is actually the best one, or whether a simpler / more composable / more native product design would serve the same need better. A materially better path is surfaced to the maintainer (and suggested to the author), never an autonomous rejection. Routed so the parity fast-path also passes through it. Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
…p judgment The "is this the best solution?" reflection is the most important check in the direction gate. Promoted it to a bold, weighty instruction — never skip, never rush, weight it above the mechanical checks, this is where most value is won or lost — while keeping the bound that only a materially better path is surfaced (to maintainer + author), never an autonomous rejection. Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
Both workflows loaded for every run, bloating context. SKILL.md now keeps only routing + shared rules (target resolution, untrusted input, skip-if-handled, comment format, CI output) and points to: - references/issue-workflow.md (issue Stages 1-3) - references/pr-workflow.md (PR Stages 1-5) The agent reads only the workflow matching the target type, so a PR run never loads the issue workflow and vice versa. SKILL.md drops from 408 to ~125 lines. Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
… for feature requests Issue workflow: collapse three stages into two (intake + handle by type), fold labeling into Stage 1, and replace manual product-fit/KISS checks with a `/goal` reflection for feature requests. PR workflow and SKILL.md: compress verbose instructions into concise directives without losing substance. Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
…rkflow phase Issue: one comment total (Stage 1 posts, Stage 2 updates in place via PATCH). PR: three comments (Gate → Review+Test → Final Decision), each concise key-point format. Add "best approach" reflection to PR Stage 3 final decision. Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
Replace checklist-style comments with conversational maintainer tone. Add solution review to Stage 1 gate, narrow Stage 2 code review to critical blockers + AGENTS.md violations, require inline tmux screenshots as evidence, and restructure Stage 3 into a genuine reflection step with separate approve/reject actions.
Split Stage 2a into two steps: first propose an independent solution from the PR description alone, then read the diff and compare. This forces the reviewer to form a baseline judgment before being anchored by the PR's approach. Also updated Stage 3 reflection to reference the independent proposal as a comparison anchor. Suggested by @yiliang114 in #4577.
All local code reads (grep, read_file, glob) now run inside an ephemeral git worktree so the main working tree is never touched. tmux real-scenario testing stays in the main tree since it needs the local build environment.
- Sanitize tmux <scenario> to prevent shell injection from PR text - Add polling wait between tmux send-keys to prevent stdin interleaving - Fix duplicate guard to use HTML comment markers matching actual output - Add comment ID capture mechanism (gh pr comment --json id) - Clarify 'solution review' wording to acknowledge diff skimming - Add --body-file exception for hardcoded gh pr review verdicts - Add --reason "not planned" to gh issue close - Add explicit stop rule for unclear issues - Add CJK-empty SAFE_KEYWORDS fallback to label-based search - Add <!-- qwen-triage stage=N --> markers to all comment templates
- Add ⛔ Mandatory Pre-flight Checks section to SKILL.md (worktree + tmux) - Add explicit worktree creation step at start of PR Stage 1 - Reinforce Stage 2b: tmux capture-pane output MUST be inlined in comment - Add pre-post checklist: verify comment contains actual terminal output
33d3e5b to
a6d2961
Compare
yiliang114
left a comment
There was a problem hiding this comment.
LGTM! Teacher ShanGuo!
…nd bot coordination Builds on #4577's triage skill with: - Tiered gate model (comment gate + label gate) to prevent noise on already-reviewed PRs and avoid duplicate labels - PR/Issue auto-detection for numeric inputs - Dual marker coordination with followup bot (qwen-issue-bot + qwen-maintain) - Issue triage rules: P0-P3 priority, completeness check, version staleness, auto-fix/welcome-PR eligibility, related vs duplicate search strategy - PR intake rules: author validation table by PR type, scope thresholds (800/1500), deep review handoff conditions, label taxonomy mapping - Comment anti-patterns and public comment distillation - Bot workflow skip list updated with 3 new markers
* feat(skill): supplement triage skill with gate model, intake rules, and bot coordination Builds on #4577's triage skill with: - Tiered gate model (comment gate + label gate) to prevent noise on already-reviewed PRs and avoid duplicate labels - PR/Issue auto-detection for numeric inputs - Dual marker coordination with followup bot (qwen-issue-bot + qwen-maintain) - Issue triage rules: P0-P3 priority, completeness check, version staleness, auto-fix/welcome-PR eligibility, related vs duplicate search strategy - PR intake rules: author validation table by PR type, scope thresholds (800/1500), deep review handoff conditions, label taxonomy mapping - Comment anti-patterns and public comment distillation - Bot workflow skip list updated with 3 new markers * feat(ci): add @qwen /triage workflow for automated issue and PR triage Triggers: - Issue opened → auto triage - PR opened → auto triage - Comment `@qwen /triage` → re-triage (maintainers only) - Manual workflow_dispatch Uses qwen-code-action to invoke the triage skill, following the same pattern as qwen-code-pr-review.yml. * ci(triage): separate issue follow-up ownership * ci(triage): tighten pr triage safeguards * docs(triage): simplify skill references * ci(triage): restore opened issue triage * fix(ci): stabilize qwen triage runs * fix(ci): coordinate qwen review triggers * docs(ci): clarify triage concurrency guard * docs(ci): clarify triage cancellation guard intent * fix(ci): tighten qwen cancellation guards * revert(ci): restore qwen-code-pr-review.yml to main The review workflow changes (removing auto-trigger, conditional cancel-in-progress) conflict with PR #4843 which redesigns the review flow with a delay mechanism. Revert this file so the two PRs don't conflict — review workflow improvements belong in #4843.
What this PR does
Adds a
/triageproject skill (under.qwen/skills/triage/) that automates GitHub issue classification and PR admission review with staged bilingual comments, designed for CI/GitHub Actions usage.Why it's needed
Maintainers need a consistent, automated workflow for triaging incoming issues (classifying, labeling, reproducing) and gatekeeping PRs (template check, direction review, KISS-focused code review, testing). This skill codifies that process.
It lives in
.qwen/skills/(repo-local maintainer workflow) rather thanpackages/core/src/skills/bundled/, because the skill is specific to the QwenLM/qwen-code repo (its label taxonomy, bilingual convention, followup-bot coordination) and should not ship to every end user via npm.Evidence (Before & After)
🤖 Generated with Qwen Code
Co-Authored-By: Qwen-Coder noreply@qwen.ai