feat(skills): add /stuck diagnostic skill for frozen sessions #4133
Port the /stuck diagnostic capability to qwen-code as a bundled skill. Scans for stuck processes, high CPU/memory, hung subprocesses, and debug logs, then presents a structured diagnostic report.
Adapted from claude-code's internal /stuck skill with:
- Process identification via command path (node-based CLI, not compiled binary)
- Debug log path updated to ~/.qwen/debug/
- Cross-platform stack dump support (macOS sample + Linux /proc/stack)
- Direct user-facing output (no Slack dependency)
🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
Pull request overview
Adds a new bundled /stuck diagnostic skill intended to help users investigate frozen or slow Qwen Code sessions by inspecting local processes, child processes, debug logs, and optional stack samples.
Changes:
- Introduces `stuck` bundled skill metadata and usage hint.
- Defines process-scanning heuristics for high CPU, abnormal states, memory usage, and hung children.
- Specifies structured diagnostic report output for unhealthy or healthy sessions.
Code Coverage Summary
CLI Package - Full Text Report
Core Package - Full Text Report
For detailed HTML reports, please see the 'coverage-reports-22.x-ubuntu-latest' artifact from the main CI run.
… /stuck 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
… /stuck
- Add allowedTools (run_shell_command, read_file) for convention consistency
- Rephrase recommended actions as user-facing options, not model-executable commands
🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
… sidecar
- Add explicit PID argument validation (reject shell metacharacters) to prevent the model from substituting injection payloads into shell commands
- Mention macOS/BSD `U` state alongside Linux `D` for uninterruptible sleep, so I/O-blocked macOS sessions are not silently missed
- Add `-ww` to `ps` to disable column truncation, so long qwen paths don't fall outside the grep window and cause sessions to be missed
- Use `~/.qwen/projects/*/chats/*.runtime.json` sidecars as the primary source of (pid, sessionId, workDir) mappings; `ps` is now a supplement for CPU/RSS/state enrichment
🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
- Filter ps to current UID via -u "$(id -u)" to avoid leaking other users' Qwen processes on shared hosts
- Note that ps `rss=` is in KB; divide by 1024 before MB comparison
- Replace `pgrep -lP` with `pgrep -P` + `ps -p` so child state shows up
- Mention `advanced.runtimeOutputDir` setting alongside QWEN_RUNTIME_DIR / QWEN_HOME in the runtime-base description
- Add half-line about PID reuse handling and not quoting secrets from debug logs (without inflating the prompt into a full workflow)
🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
wenshao left a comment
The bundled skill system lacks automated tests that read and parse actual SKILL.md files from disk. Existing tests (skill-manager.test.ts, BundledSkillLoader.test.ts) only use mocked skill data — a YAML frontmatter syntax error or missing required field could merge silently and only surface when a user manually invokes the skill. Consider adding an integration test that discovers all bundled SKILL.md files under packages/core/src/skills/bundled/, parses their frontmatter, and validates required fields.
— DeepSeek/deepseek-v4-pro via Qwen Code /review
- Resolve RUNTIME_DIR from QWEN_RUNTIME_DIR/QWEN_HOME and use it in the sidecar `ls`, debug log path, and `latest` symlink; the previous round only updated the prose and left the actual commands hardcoded
- Add explicit fallthrough: when sidecar enumeration finds nothing, fall through to step 2 instead of getting stuck trying to make sidecar work
- Replace metacharacter blacklist with digit-only PID whitelist: safer and shorter; "etc." in a blacklist outsourced completeness to the LLM
- Drop `strace -p <pid> -c -f` from the Linux stack-dump branch: `-c` blocks until the target exits, hanging the diagnostic on the very conditions it should diagnose; `ptrace_scope=1` would also misreport permission errors as process symptoms. Keep `cat /proc/<pid>/stack`
- Warn that `ps -ww` command lines may include CLI-arg credentials and that `sample` stack frames may include in-memory secrets; redact before quoting in the report
- Cover the "no sessions found at all" case so a fresh machine doesn't get reported as "all healthy" when zero data was collected
🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
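
The digit-only whitelist described in this commit can be illustrated with a minimal sketch. The function name `isDigitOnlyPid` is hypothetical and is not taken from the skill, which expresses the rule as prompt text rather than code; the point is only that a whitelist needs no enumeration of dangerous characters.

```typescript
// Hypothetical helper (not from the PR): accept an argument as a PID only if
// it is a non-empty run of ASCII digits. Anything else, including shell
// metacharacters, whitespace, or signs, is rejected without being listed.
function isDigitOnlyPid(arg: string): boolean {
  return /^[0-9]+$/.test(arg);
}

// Example inputs a blacklist would have to anticipate one by one.
const candidates = ["12345", "123; rm -rf ~", "$(id)", "-1", ""];
for (const candidate of candidates) {
  console.log(JSON.stringify(candidate), isDigitOnlyPid(candidate));
}
```

Because the check is anchored at both ends, an injected suffix such as `123; ...` fails as a whole instead of being partially accepted.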
…y state triage
- Update "What to look for" overview from `pgrep -lP <pid>` to `pgrep -P <pid>` to match step 3 (overview was left behind in the previous round when step 3 was upgraded to capture child state)
- Add a triage sentence to step 3: when the state alone explains the problem (`T` = stopped, `Z` = zombie), skip child/log/stack inspection and go straight to the report
🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
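
As a rough illustration of the state triage above, the sketch below maps the first character of a `ps` state column to a verdict. The type and function names are hypothetical; only the meanings of `T`, `Z`, `D`, and `U` come from the commit messages in this PR. `ps` often appends modifier characters (for example `Ss` or `T+`), so only the leading character is examined.

```typescript
// Hypothetical names; the state letters follow the PR text:
// T = stopped, Z = zombie, D (Linux) / U (macOS, BSD) = uninterruptible sleep.
type StateVerdict = "stopped" | "zombie" | "uninterruptible" | "inspect-further";

function triageByState(psState: string): StateVerdict {
  // Modifier characters after the first letter are ignored.
  const primary = psState.trim().charAt(0);
  switch (primary) {
    case "T":
      return "stopped"; // state alone explains the hang: go straight to the report
    case "Z":
      return "zombie"; // same shortcut as T
    case "D":
    case "U":
      return "uninterruptible"; // likely blocked on I/O
    default:
      return "inspect-further"; // children, logs, and stack still need checking
  }
}

console.log(triageByState("T+"), triageByState("Ss"));
```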
The actual priority in `Storage.getRuntimeBaseDir()` is `QWEN_RUNTIME_DIR` > `advanced.runtimeOutputDir` setting > `QWEN_HOME` > `~/.qwen`. The previous round merged the `advanced.runtimeOutputDir` mention but listed it after `QWEN_HOME`, and the shell snippet skipped the settings layer entirely, so on a machine where only the setting was configured, the skill would silently look in `~/.qwen` and miss all sessions.
- Reorder the prose priority list to match the source
- Add a `jq`-based read of `~/.qwen/settings.json` between the env-var and `QWEN_HOME`/default fallbacks. Gracefully degrades if `jq` is absent or the setting is unset.
🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
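
The priority order stated in this commit can be sketched as a pure function. This is not the code of `Storage.getRuntimeBaseDir()`; the interface and function names below are hypothetical, and the inputs are passed in explicitly so the ordering is easy to see.

```typescript
// Hypothetical input shape; only the priority order comes from the PR text:
// QWEN_RUNTIME_DIR > advanced.runtimeOutputDir > QWEN_HOME > ~/.qwen
interface RuntimeInputs {
  env: Record<string, string | undefined>;
  runtimeOutputDirSetting?: string; // value of advanced.runtimeOutputDir, if set
  homeDir: string;
}

function pickRuntimeBase(inputs: RuntimeInputs): string {
  const fromEnv = inputs.env["QWEN_RUNTIME_DIR"];
  if (fromEnv) return fromEnv;
  // The settings layer sits between the two environment variables.
  if (inputs.runtimeOutputDirSetting) return inputs.runtimeOutputDirSetting;
  const qwenHome = inputs.env["QWEN_HOME"];
  if (qwenHome) return qwenHome;
  return inputs.homeDir + "/.qwen";
}

// Only the setting is configured: the bug described above would have
// returned the ~/.qwen default here instead.
console.log(
  pickRuntimeBase({ env: {}, runtimeOutputDirSetting: "/data/qwen-rt", homeDir: "/home/u" }),
);
```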
Functional upgrades found in self-review (no reviewer raised these):
- Add network-hang detection bullet to step 3. Hung HTTPS requests to
the model API are the most common qwen-code "stuck" mode and showed
as healthy under all previous heuristics (low CPU + S state). macOS
uses `lsof -i -p`, Linux uses `ss -tnp`.
- Add a fast path at the top of "Investigation steps": when the user
passes a digit-only PID, skip enumeration and go straight to per-PID
ps + step 3. Avoids a full sidecar+ps scan in the targeted case.
- Replace per-file sidecar liveness check with a single bash loop that
emits only live (pid, sidecar-path) pairs. On machines with many
stale sidecars this drops 50+ separate reads.
- Promote `~/.qwen/debug/latest` to the primary debug-log entry point
(it usually points to the suspicious session). Sidecar-derived path
becomes the fallback.
- Bound the debug-log read with `tail -n 200` so the model doesn't
attempt to load multi-GB log files.
- Replace the placeholder `<child_pids>` for `ps -p` with a runnable
`pgrep -P <pid> | xargs -I{} ps -p {} -o ...` composition.
- Drop the redundant "substitute <pid> only after validation" note in
step 3 — the digit-only whitelist in Argument validation already
enforces this; PIDs from ps/sidecar are inherently digit-only.
End-to-end tmux smoke test confirms the flow runs to completion with a
correct structured report.
🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
Two issues caught by Codex review:
1. **PID fast path left $RUNTIME_DIR unset.** Step 3 references `"$RUNTIME_DIR"/debug/<session-id>.txt` but the fast path skipped step 1 where it was resolved, so debug-log lookup degraded to `/debug/latest` (broken absolute path). Fix: extract RUNTIME_DIR resolution into a preamble that runs before both paths. Also add a `grep -l "pid": <PID>` lookup in the fast path so it can match the given PID to its sidecar and recover the session ID for log lookup.
2. **Sidecar liveness loop required `jq`.** Default macOS / minimal Linux images don't ship `jq`, so the loop emitted nothing for every sidecar: the "preferred reliable" path silently failed and the skill fell back to the less accurate `ps | grep`. Replace with a single-spawn `node -e` script: node is guaranteed present (qwen-code itself runs on it). The settings.json `jq` lookup stays; that one gracefully degrades to QWEN_HOME/default if `jq` is missing.
Both verified by hand: liveness loop correctly emits live PID/sidecar pairs (56219, 33534), `grep -l` lookup correctly finds the sidecar for a given PID and emits empty for non-matches.
🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
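
The idea behind the liveness loop, emitting only live (pid, sidecar-path) pairs, can be sketched as follows. This is not the `node -e` script from the commit, which is not shown on this page. The names `SidecarFile`, `LiveSession`, and `listLiveSessions` are hypothetical, and the liveness probe is injected so the sketch stays independent of the operating system.

```typescript
// Hypothetical shapes. The only sidecar field relied on here is "pid",
// which the PR text confirms via its `grep -l "pid": <PID>` lookup.
interface SidecarFile {
  path: string;
  text: string; // raw JSON content of a *.runtime.json sidecar
}

interface LiveSession {
  pid: number;
  sidecarPath: string;
}

function listLiveSessions(
  files: SidecarFile[],
  isAlive: (pid: number) => boolean,
): LiveSession[] {
  const live: LiveSession[] = [];
  for (const file of files) {
    let pid: unknown;
    try {
      pid = (JSON.parse(file.text) as { pid?: unknown }).pid;
    } catch {
      continue; // truncated or corrupt sidecar: skip it rather than abort the scan
    }
    if (typeof pid === "number" && Math.floor(pid) === pid && pid > 0 && isAlive(pid)) {
      live.push({ pid, sidecarPath: file.path });
    }
  }
  return live;
}

// Sample data only; the probe here pretends PID 999 has exited.
const sample: SidecarFile[] = [
  { path: "a.runtime.json", text: '{"pid": 56219}' },
  { path: "b.runtime.json", text: '{"pid": 999}' },
  { path: "c.runtime.json", text: "{not json" },
];
console.log(listLiveSessions(sample, (pid) => pid !== 999));
```

In a real Node process the probe could be built on `process.kill(pid, 0)`, which sends no signal and throws when the target does not exist or is not signalable by the caller.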
Codex review caught that the targeted PID fast path accepted any digit-only PID and dumped its full command line, bypassing the Qwen-process filter that the general scan applies via `grep -E '(qwen|node.*qwen|bun.*qwen)'`. Cross-user PIDs are already filtered (`kill -0` returns EPERM), but **same-user non-Qwen processes** would have their argv (potentially including secret CLI flags) printed into the chat.
Fix: add a single-line validation pipeline before the stats dump: `kill -0 <pid> && ps -p <pid> -o command= -ww | grep -qE '(qwen|node.*qwen|bun.*qwen)'`. If it returns non-zero, refuse with "PID is not a current-user Qwen Code session" and stop the diagnostic. Otherwise proceed.
Verified by manual test against a real Qwen Code session PID (matches) and PID 1 / launchd (correctly rejected).
🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
Four issues from PR review:
1. **Settings path honors QWEN_HOME.** The `jq` lookup in the preamble
hardcoded `~/.qwen/settings.json`, but `getGlobalSettingsPath()`
resolves to `$QWEN_HOME/settings.json` when set. Now uses
`"${QWEN_HOME:-$HOME/.qwen}/settings.json"`.
2. **Sidecar grep uses `-El`.** Without `-E`, BSD `grep` on macOS may
not treat `\b` as a word boundary in BRE. Also added a note: when
PID reuse makes multiple sidecars match, prefer the most recently
modified file via `ls -t | head -n 1`.
3. **Process regex tightened to avoid false positives.** The old
`(qwen|node.*qwen|bun.*qwen)` matched any path containing "qwen"
anywhere — so `qwen-playground/server.js`, `qwen-polyfill.js`,
and even unrelated processes that pass a qwen-code path as `--cwd`
(e.g., Codex plugin brokers) all matched. Replaced with
`(qwen-code/[^ ]*\.(js|ts|mjs|cjs)( |$)|/qwen( |$))` — requires the
`qwen-code/` substring to be followed by a script-file path, OR
the bin invocation to end in `/qwen`. Verified on the local machine
that broker processes are no longer matched while real Qwen
sessions (worktree dev, dist/cli.js, qwen serve daemons) all are.
4. **lsof safety.** Added `-nP` to skip reverse-DNS and port lookups
which can themselves hang. Mentioned `timeout 10` / `gtimeout 10`
as an optional prefix when available — qwen-code's shell tool
already has a backstop timeout, so this is belt-and-suspenders.
Note: tested `\b` in BSD ERE on macOS — it does work correctly with
`-E`, so the `-El` switch alone fully addresses concern #2's
portability claim (BRE-without-E remains broken but is no longer used).
🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
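
The tightened process regex from item 3 can be tried out in isolation. The sketch below transcribes the pattern quoted in the commit message into a JavaScript regular expression (the skill itself uses it with `grep -E`); the helper name and the sample command lines are illustrative and are not taken from the PR's own verification run.

```typescript
// Pattern transcribed from the commit message:
//   (qwen-code/[^ ]*\.(js|ts|mjs|cjs)( |$)|/qwen( |$))
const QWEN_COMMAND_PATTERN = /(qwen-code\/[^ ]*\.(js|ts|mjs|cjs)( |$)|\/qwen( |$))/;

// Hypothetical helper name.
function looksLikeQwenSession(commandLine: string): boolean {
  return QWEN_COMMAND_PATTERN.test(commandLine);
}

// Illustrative command lines (paths are made up for the example).
const sampleCommands = [
  "node /home/u/qwen-code/dist/cli.js --debug", // script under qwen-code/
  "/usr/local/bin/qwen", // bin invocation ending in /qwen
  "node /srv/qwen-playground/server.js", // contains "qwen" but not a session
  "node broker.js --cwd /home/u/qwen-code", // qwen-code path passed as an argument
];
for (const commandLine of sampleCommands) {
  console.log(looksLikeQwenSession(commandLine), commandLine);
}
```

The last two samples are the kind of false positive the old `(qwen|node.*qwen|bun.*qwen)` pattern accepted, since it matched "qwen" anywhere in the line.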
wenshao left a comment
[Critical] No test for the new stuck/SKILL.md file. YAML frontmatter syntax errors (e.g., malformed allowedTools, missing --- delimiter) would silently skip the skill at runtime with no CI failure. Add a test that reads the actual file through parseSkillContent and asserts name, description, allowedTools, and non-empty body.
[Suggestion] BundledSkillLoader.test.ts still uses hardcoded mock data and doesn't verify the new stuck skill appears in loaded commands. Add an assertion that bundled skills count >= 5 (batch, loop, qc-helper, review, stuck) or explicitly check for the stuck command.
— DeepSeek/deepseek-v4-pro via Qwen Code /review
`Storage.resolvePath()` in qwen-code expands `~` and resolves relative
paths before using `advanced.runtimeOutputDir`. The shell preamble was
reading the raw JSON value via `jq`, so a user with
`"runtimeOutputDir": "~/.qwen-runtime"` would pass the literal string
to the glob — bash does not expand `~` inside double quotes — and the
sidecar scan would silently find nothing and fall back to ps-only mode.
Add two bash lines after the jq lookup:
- `${RUNTIME_DIR/#\~/$HOME}` to substitute leading tilde
- `case ... cd && pwd` to resolve relative paths to absolute (clears
RUNTIME_DIR if cd fails so the chain falls through to QWEN_HOME)
Smoke tested: tilde paths expand, absolute paths pass through, relative
paths resolve, nonexistent dirs clear cleanly, empty stays empty.
🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
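
The two normalisation steps in this commit, leading-tilde substitution and relative-path resolution, can be sketched as a pure function. This is not `Storage.resolvePath()` and not the bash from the skill; `normalizeRuntimeDir` is a hypothetical name, and the sketch deliberately omits the existence check that `cd && pwd` performs, so a nonexistent directory is not cleared here.

```typescript
// Hypothetical helper: homeDir and cwd are passed in so no filesystem or
// environment access is needed. Only "~" and "~/..." are treated as tilde
// paths, which is an assumption of this sketch.
function normalizeRuntimeDir(raw: string, homeDir: string, cwd: string): string {
  if (raw === "") return ""; // empty stays empty so the fallback chain continues
  let dir = raw;
  if (dir === "~" || dir.slice(0, 2) === "~/") {
    dir = homeDir + dir.slice(1); // "~/.qwen-runtime" -> "<home>/.qwen-runtime"
  }
  if (dir.charAt(0) !== "/") {
    dir = cwd + "/" + dir; // relative paths are anchored at the working directory
  }
  return dir;
}

console.log(normalizeRuntimeDir("~/.qwen-runtime", "/home/u", "/work"));
console.log(normalizeRuntimeDir("out/runtime", "/home/u", "/work"));
```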
Adopted 9 of the 16 review suggestions; declined 5; 1 already done.
- Anchor process regex to `(^|/)qwen-code[^ /]*/`. Now matches renamed
clones (`qwen-code-dev`, `qwen-code-x1`, worktrees) AND rejects
prefix false positives (`analyze-qwen-code/`, `my-qwen-code-tool/`).
Verified against 10 cases.
- Clarify RSS unit conversion: KB ÷ 1024 = MB, KB ÷ 1048576 = GB. The
4GB threshold is `4194304` KB raw, or 4 in GB. Prevents the model
from dividing once and comparing to 4, which would over-alert by
1024×.
- Add `State S with low CPU` to the Signs list so initial triage flags
the most common hang signature (hung HTTPS to model API) instead of
only catching it inside step 3.
- Split fast path validation into two guards with distinct messages:
dead/wrong-user vs. yours-but-not-Qwen. Plus add the same
credential-redaction note that step 2 already has.
- Replace `pgrep | xargs -I{} ps` with a single `ps -p $CHILDREN`
call (avoids forking N times) and add `-ww` so long child cmdlines
don't truncate.
- Wrap macOS `sample <pid> 3` with optional `timeout 15` (or
`gtimeout 15`). Same belt-and-suspenders pattern used for `lsof`.
- Note that `ss -tnp -p` requires root/CAP_NET_ADMIN; non-root sees
`-` in the PID column. Tell the model to fall back to `lsof` instead
of concluding "no connections".
Declined: self-PID via `$$` (wrong PID — `$$` is the spawned shell,
not qwen), pgrep fallback for distroless (over-engineering), `\b`
matches negative numbers (false alarm — `:[[:space:]]*` won't match
through `-`), regex DRY abstraction (no value in markdown prompts),
project-level settings.json read (already declined; same trade-off).
🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
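
The RSS unit arithmetic in this commit is easy to get wrong, so here is a small worked sketch. The constant and function names are hypothetical, and treating the threshold as a strict "greater than" comparison is an assumption; the conversions themselves follow the commit message (KB ÷ 1024 = MB, KB ÷ 1048576 = GB, 4 GB = 4194304 KB).

```typescript
// `ps -o rss=` reports kilobytes. Hypothetical names throughout.
const FOUR_GB_IN_KB = 4 * 1024 * 1024; // 4194304 KB

function rssKbToMb(rssKb: number): number {
  return rssKb / 1024;
}

function rssKbToGb(rssKb: number): number {
  return rssKb / (1024 * 1024);
}

// Compare in raw KB so no intermediate unit can be mistaken for GB.
function exceedsFourGb(rssKb: number): boolean {
  return rssKb > FOUR_GB_IN_KB;
}

// A 5 MB process: dividing once yields 5 (MB); comparing that to 4 as if it
// were GB would raise a false alarm, which is the mistake described above.
const smallRssKb = 5 * 1024;
console.log(rssKbToMb(smallRssKb), rssKbToGb(smallRssKb), exceedsFourGb(smallRssKb));
```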
wenshao left a comment
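Pulled 42c3ca3b locally and verified:
- File parsing: running the file through parseSkillContent yields name=stuck, a non-empty description, allowedTools=["run_shell_command","read_file"], argumentHint="'[PID or symptom]'", and an 11.7KB body ✅
- Real E2E (DeepSeek backend + -y): the /stuck slash command makes the model run the ps/runtime.json scan, correctly recognizes that the current process is /stuck itself, and outputs a structured diagnostic report ✅
- 11:03Z Suggestions: the project-level RUNTIME_DIR fallback, regex anchoring, tiered fast-path error messages, and the non-root ss fallback to lsof have all been adopted in d3f4bfb8/2d987d93/42c3ca3b ✅
However, the 10:22Z Critical (missing test) is still unaddressed: the diff across the 14 commits contains no *.test.ts changes. The author's commit message lists 9 adopted / 5 declined but does not say why the test was skipped.
Why this Critical must be fixed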
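The error handling at skill-manager.ts:990 silently catches YAML/field errors in bundled skills and only writes them to the debug log:
    } catch (error) {
      // Parse failures are logged and swallowed; nothing is rethrown.
      if (error instanceof SkillError) {
        debugLogger.error(`Failed to parse skill at ${skillDir}: ${error.message}`);
      } else {
        debugLogger.debug(`No valid SKILL.md found in ${skillDir}, skipping`);
      }
      return null;
    }
If a later commit edits SKILL.md and mistypes a frontmatter field (for example allowedTools written as a string, description omitted, or one --- delimiter dropped), CI will not fail. The problem is only discovered when a user types /stuck and the skill is missing from autocomplete, by which point it has already been released.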
Suggested fix (minimal patch)
Add a describe block at the end of packages/core/src/skills/skill-load.test.ts, parameterized to cover all 5 bundled skills (this solves the systemic problem instead of patching stuck alone):
    // Assumes fs, path, and parseSkillContent are already imported in this test file.
    describe('bundled SKILL.md files parse without errors', () => {
      const bundledDir = path.join(__dirname, 'bundled');
      // One test case per bundled skill directory.
      const dirs = fs.readdirSync(bundledDir, { withFileTypes: true })
        .filter(d => d.isDirectory())
        .map(d => d.name);
      it.each(dirs)('parses %s/SKILL.md and has required fields', (name) => {
        const file = path.join(bundledDir, name, 'SKILL.md');
        const content = fs.readFileSync(file, 'utf8');
        const cfg = parseSkillContent(content, file);
        expect(cfg.name).toBeTruthy();
        expect(cfg.description).toBeTruthy();
        expect(cfg.body.length).toBeGreaterThan(0);
        if (cfg.allowedTools) expect(Array.isArray(cfg.allowedTools)).toBe(true);
      });
    });
That is 8 lines (it.each + 4 assertions), in the same file as the existing parseSkillContent tests, covering stuck plus batch/loop/qc-helper/review, 5 skills in total. The next time someone mistypes frontmatter in a bundled skill, CI goes red immediately.
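Add this one test and I will approve. The remaining BundledSkillLoader.test.ts Suggestion can be skipped: the test above already covers parsing correctness at the source, and since the loader test uses mock data, the author may leave it as is.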
The bundled skill loader (`SkillManager.parseSkillFileInternal`) silently catches and debug-logs frontmatter parse errors, so a typo in any SKILL.md (missing `description`, broken `---` delimiter, `allowedTools` written as a scalar) merges with green CI and only surfaces when a user invokes the skill, at which point the skill is missing from autocomplete with no indication why.
Add a tiny integration test that walks `packages/core/src/skills/bundled/`, runs every `SKILL.md` through the real `parseSkillContent` (no mocks), and asserts: name matches the directory, description is non-empty, body is non-empty, and `allowedTools` (if present) is an array.
Lives in its own file because `skill-load.test.ts` mocks `fs/promises` and the YAML parser, which would defeat the purpose of an integration test. New file uses real fs and the real loader.
Negative-case verified: deliberately corrupting `stuck/SKILL.md`'s frontmatter delimiter makes only that file's test fail; restoring it returns the suite to all-green.
Addresses wenshao's standing [Critical] review (2026-05-15 12:29Z) about the bundled skill system lacking automated tests for SKILL.md parsing.
🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
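
To make the failure modes concrete, here is a standalone sketch of the kind of shape check the integration test enforces. It is not `parseSkillContent` and does no YAML parsing; `splitSkillFile` and `hasRequiredShape` are hypothetical helpers that only look for the two `---` delimiters, a non-empty body, and non-empty scalar values for the listed keys. The sample file content is made up.

```typescript
// Hypothetical helpers, not the project's parser.
interface SplitSkill {
  frontmatter: string;
  body: string;
}

function splitSkillFile(content: string): SplitSkill | null {
  const lines = content.split(/\r?\n/);
  if (lines[0] !== "---") return null; // opening delimiter missing
  const close = lines.indexOf("---", 1);
  if (close === -1) return null; // closing delimiter missing
  return {
    frontmatter: lines.slice(1, close).join("\n"),
    body: lines.slice(close + 1).join("\n").trim(),
  };
}

function hasRequiredShape(content: string, requiredKeys: string[]): boolean {
  const parts = splitSkillFile(content);
  if (parts === null || parts.body.length === 0) return false;
  for (const key of requiredKeys) {
    // The key must start a line and carry a value on that same line.
    const keyLine = new RegExp("^" + key + ":[ \\t]*\\S", "m");
    if (!keyLine.test(parts.frontmatter)) return false;
  }
  return true;
}

// Sample content only; the description text is invented for the example.
const sampleSkill = "---\nname: stuck\ndescription: Diagnose frozen sessions\n---\nBody text\n";
console.log(hasRequiredShape(sampleSkill, ["name", "description"]));
console.log(hasRequiredShape(sampleSkill.replace("\n---\nBody", "\nBody"), ["name", "description"]));
```

The second call drops the closing delimiter, mirroring the negative case described in the commit message.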
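@wenshao Adopted the [Critical] from the previous round (12:29Z): the commit adds […]
Each of the 5 bundled skills (batch / loop / qc-helper / review / stuck) now has its own it.each case. Negative check: deliberately […] in stuck/SKILL.md […]. The next time someone mistypes frontmatter while editing a bundled SKILL.md, CI goes red immediately instead of the problem surfacing only at release. The test lives in a separate file rather than being added to […]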
wenshao left a comment
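Pulled 161fa70f locally and verified:
- Critical (missing parse test) is fixed: the new bundled-skills.integration.test.ts (48 lines) uses it.each to cover every bundled skill (batch/loop/qc-helper/review/stuck) and asserts cfg.name === directory_name, a non-empty description, a non-empty body, and that allowedTools is an array (when present), plus an "at least one" sanity check against glob failure. The next time anyone breaks frontmatter, CI goes red immediately.
- Local results: bundled-skills.integration.test.ts passes 6/6; vitest run passes all 290 test files; Lint + CodeQL are green.
- Real E2E (verified in the previous round): the /stuck slash command runs against the DeepSeek backend, scans processes, reads sidecars, and outputs a structured diagnostic report; the 10-step flow completes without interruption.
LGTM ✅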
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Summary
- `/stuck` bundled skill that diagnoses frozen, stuck, or slow Qwen Code sessions
- `~/.qwen/debug/` logs and supports macOS `sample` + Linux `/proc/stack` for stack dumps
Ported from claude-code's internal `/stuck` skill with adaptations:
- `command` column path matching (Node.js CLI shows as `node` in `comm`, not `qwen`)
- `~/.qwen/debug/` with `latest` symlink support
Test plan
- `/stuck` appears in autocomplete with correct description and argument hint
- `/stuck` triggers process scan via `ps`, identifies sessions, checks child processes, and outputs structured diagnostic report
- `pgrep`, and concludes "all healthy"
Manual verification
🤖 Generated with Qwen Code