feat(skills): add tmux-real-user-testing skill for readable TUI test logs by BingqingLyu · Pull Request #112 · BingqingLyu/qwen-code

BingqingLyu · 2026-04-27T07:02:07Z

Summary

Add a generic tmux-real-user-testing skill that uses tmux as a "real user recording harness" — drive the TUI with keyboard actions and capture-pane at each step to produce a readable, reviewable interaction timeline log.

What it does

- Helper script (`tmux-real-user-log.sh`): `start`, `snapshot`, `send`, `type-submit`, `wait-for`, and `finish` commands, with session-collision detection and eval-friendly output
- `SKILL.md`: generic scene-design methodology plus a manual workflow guide, with no scenario-specific hardcoding
- Key design: `capture-pane` snapshots after each action → `tmux-readable-full.log` as the primary artifact (not raw ANSI pipe logs)
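The snapshot-per-action idea can be pictured roughly as follows. This is a minimal TypeScript sketch and not the helper script itself: the `Step` type, the `formatStep` and `formatTimeline` functions, and the entry layout are all hypothetical. Only the idea of pairing each action with a plain-text pane snapshot comes from the description above.

```typescript
// Hypothetical shape of one timeline entry: the action sent to the TUI and
// the plain-text pane contents captured right after it.
interface Step {
  index: number;
  action: string; // e.g. "type-submit: /help"
  paneText: string; // text as returned by a capture-pane style snapshot
}

// Render one step as a readable block. Trailing whitespace is trimmed per
// line and trailing blank lines are dropped, so padded terminal rows do not
// clutter the log.
function formatStep(step: Step): string {
  const body = step.paneText
    .split("\n")
    .map((line) => line.replace(/\s+$/, ""))
    .join("\n")
    .replace(/\n+$/, "");
  return `=== step ${step.index}: ${step.action} ===\n${body}\n`;
}

// Concatenate all steps into the text of a readable timeline log.
function formatTimeline(steps: Step[]): string {
  return steps.map(formatStep).join("\n");
}
```

Keeping the formatter pure (text in, text out) means the log layout can be reviewed and tested without a running tmux session.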

Why

The existing E2E framework focuses on structured assertions. This skill complements it with narrative-style TUI testing where the log itself is the report — useful for regression testing, UX review, and onboarding documentation.

Changes

- `.qwen/skills/tmux-real-user-testing/SKILL.md`: skill definition with workflow guide
- `.qwen/skills/tmux-real-user-testing/scripts/tmux-real-user-log.sh`: helper shell script

…#3559)

`params.pages !== undefined` let `""` fall through to `parsePDFPageRange('')`, which returns null and surfaced "Invalid pages parameter: ''" for every `read_file` call from models that default optional strings to `""`. Switch to a truthy check so `""` behaves the same as an omitted field, and add a regression test.

Fixes QwenLM#3558
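A minimal TypeScript sketch of the truthy check described in this commit message. The names `parsePageRangeStub`, `PagesResult`, and `resolvePages` are hypothetical stand-ins, and the stub parser only mimics the stated behavior that an empty string parses to null. The actual `read_file` code is not shown on this page.

```typescript
// Stand-in for the real parser, which the commit message says returns null
// for an empty string. The accepted syntax here is a hypothetical simplification.
function parsePageRangeStub(pages: string): { first: number; last: number } | null {
  const m = /^(\d+)(?:-(\d+))?$/.exec(pages);
  if (!m) return null;
  const first = Number(m[1]);
  const last = m[2] !== undefined ? Number(m[2]) : first;
  return { first, last };
}

type PagesResult =
  | { kind: "all" }
  | { kind: "range"; first: number; last: number }
  | { kind: "error"; message: string };

// Truthy check: both undefined and "" mean "no page filter requested".
// With `pages !== undefined`, "" would reach the parser and produce an error.
function resolvePages(pages?: string): PagesResult {
  if (!pages) return { kind: "all" };
  const range = parsePageRangeStub(pages);
  if (range === null) {
    return { kind: "error", message: `Invalid pages parameter: '${pages}'` };
  }
  return { kind: "range", first: range.first, last: range.last };
}
```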

…QwenLM#3540)

* feat(session): auto-title sessions via fast model, add /rename --auto

The /rename work in QwenLM#3093 generates kebab-case titles only when the user explicitly runs `/rename` with no args; until they do, the session picker shows the first user prompt (often truncated or misleading). This change adds a sentence-case auto-title that fires once per session after the first assistant turn, using the configured fast model.

New service: `packages/core/src/services/sessionTitle.ts`. `tryGenerateSessionTitle(config, signal)` returns a discriminated outcome (`{ok: true, title, modelUsed}` | `{ok: false, reason}`) so callers can either handle failures generically or map reasons to actionable messages. Prompt shape: 3-7 words, sentence case, good/bad examples including a CJK row, JSON schema enforced via `baseLlmClient.generateJson`. `maxAttempts: 1`: titles are cosmetic metadata and shouldn't fight rate limits.

Trigger point: `ChatRecordingService.maybeTriggerAutoTitle` runs after `recordAssistantTurn`. Fire-and-forget promise, guarded by:
- `currentCustomTitle`: don't overwrite any existing title.
- `autoTitleController` doubles as in-flight flag; a second turn while the first is still pending is a no-op.
- `autoTitleAttempts` cap of 3: the first assistant turn may be a pure tool-call with no user-visible text; retry for a handful of turns until a title lands. Cap bounds total waste.
- `!config.isInteractive()`: headless CLI (`qwen -p`, CI) never auto-titles; spending fast-model tokens on a one-shot session is waste.
- `autoTitleDisabledByEnv()`: `QWEN_DISABLE_AUTO_TITLE=1` opt-out.
- `config.getFastModel()` falsy: skip entirely rather than falling back to the main model; auto-titling on main-model tokens is too expensive to be silent.

Persistence: `CustomTitleRecordPayload` grows a `titleSource: 'auto' | 'manual'` field. Absent on pre-change records (treated as `undefined` → manual, safe default so a user's pre-upgrade `/rename` is never silently reclassified). `SessionPicker` renders `titleSource === 'auto'` titles in dim (secondary) color; manual stays full contrast. On resume, the persisted source is rehydrated into `currentTitleSource`; without this, finalize's re-append would rewrite an auto title as manual on every resume cycle.

Cross-process manual-rename guard: when two CLI tabs target the same JSONL, in-memory state can diverge. Before writing an auto record, the IIFE re-reads the file via `sessionService.getSessionTitleInfo`. If a `/rename` from another process landed as manual, bail and sync local state: never clobber a deliberately-chosen manual title with a model guess. Cost is one 64KB tail read per successful generation.

`finalize()` aborts the in-flight controller before re-appending the title record. Session switch / shutdown doesn't have to wait on a slow fast-model call.

New user-facing command: `/rename --auto` regenerates via the same generator. It is an explicit user trigger and overwrites whatever's there (manual or auto) because the user asked. Errors route through `autoFailureMessage(reason)` so `empty_history`, `model_error`, `aborted`, etc. each get actionable guidance rather than a generic "could not generate". `/rename -- --literal-name` is the sentinel for titles that start with `--`; unknown `--flag` tokens error with a hint pointing at the sentinel. Existing `/rename <name>` and bare `/rename` (kebab-case via existing path) are unchanged, except the kebab path now prefers fast model when available and runs its output through `stripTerminalControlSequences` (same ANSI/OSC-8 hardening as the sentence-case path).

New shared util: `packages/core/src/utils/terminalSafe.ts`. `stripTerminalControlSequences(s)` strips OSC (\x1b]...\x07|\x1b\\), CSI (\x1b[...[a-zA-Z]), SS2/SS3 leaders, and C0/C1/DEL as a backstop. A model-returned `\x1b[2J` or OSC-8 hyperlink escape would otherwise execute on every SessionPicker render; both sentence-case and kebab paths now route titles through the helper before they reach the JSONL or the UI.

Tail-read extractor: `extractLastJsonStringFields(text, primaryKey, otherKeys, lineContains)` reads multiple fields from the same matching line in a single pass. Two separate tail scans could return a mismatched pair (primary from a newer record, secondary from an older one with only the primary set); the new helper guarantees the pair is atomic. Validates a proper closing quote on the primary value so a crash-truncated trailing record can't win the latest-match race. `readLastJsonStringFieldsSync` is its file-reading wrapper: same tail-window fast path and full-file fallback as the single-field version, plus a `MAX_FULL_SCAN_BYTES = 64MB` cap so a corrupt multi-GB session file can't freeze the picker. Session reads now open with `O_NOFOLLOW` (falls back to plain RDONLY on Windows where the constant isn't exposed) as defense in depth against a symlink planted in `~/.qwen/projects/<proj>/chats/`.

Character handling: `flattenToTail` on the LLM prompt drops a dangling low surrogate after `slice(-1000)`; otherwise a CJK supplementary char or emoji cut mid-pair produces invalid UTF-16 that some providers 400. `sanitizeTitle` applies the same surrogate scrub after max-length trim, and strips paired CJK brackets (`「」『』【】〈〉《》`) as whole units so a `【Draft】 Fix login` doesn't leave a dangling `】` after leading-char strip. `lineContains` in the title reader is tightened from the loose substring `'custom_title'` to `'"subtype":"custom_title"'` so user text containing the literal `custom_title` can't shadow a real record.

Tests: 46 new unit tests across
- `sessionTitle.test.ts` (22): success/all-failure-reasons, tool-call filter, tail-slice, surrogate scrub, ANSI/OSC-8 strip, CJK brackets.
- `chatRecordingService.autoTitle.test.ts` (15): trigger/skip matrix, in-flight guard, abort propagation on finalize, manual/auto/legacy resume symmetry, cross-process race, env opt-out, retry-after-transient.
- `sessionStorageUtils.test.ts` (13): single-pass extractor, straddle boundary, truncated trailing record, lineContains, multi-field atom.
- `renameCommand.test.ts` (8): `--auto` success, all reasons, sentinel, unknown-flag hint, positional rejection, manual/SessionService fallbacks.

* docs(session): design doc for auto session titles

Matches the session-recap design doc shape (Overview / Triggers / Architecture / Prompt Design / History Filtering / Persistence / Concurrency / Configuration / Observability / Out of Scope) and adds a Security Hardening section unique to the title path: titles render directly in the picker and persist in user-readable JSONL, so LLM-returned control sequences are an attack surface the recap path doesn't have.

Captures decisions a code-only reader has to reverse-engineer:
- Why `maxAttempts: 1` (best-effort cosmetic metadata; no retry loop).
- Why `autoTitleAttempts` cap is 3 (first turn can be pure tool-call).
- Why the auto trigger does NOT fall back to the main model but session-recap does (auto-title fires on every turn; silently charging main-model tokens is a bill surprise).
- Why `titleSource: undefined` stays unwritten on legacy records (no rewrite risks silently reclassifying user intent).
- Why the cross-process re-read sits between the LLM await and the append (manual wins at both in-process and on-disk layers).
- Why `finalize()`'s abort tolerates a controller swap (in-flight identity check).
- Why JSON-schema function calling instead of tag extraction (avoid reasoning preamble bleed; cross-provider reliability).

Placed at docs/design/session-title/ alongside session-recap, compact-mode, fork-subagent, and other per-feature design docs. No sidebar index update required, since the design folder is unindexed.

* test(rename): pin model choice in bare /rename kebab path

Addresses reviewer feedback: the bare `/rename` model selection (`config.getFastModel() ?? config.getModel()`) had no test pinning it either way. Previous tests mocked `getHistory: []`, which exits the function before the model is ever chosen, so a silent regression to either direction (always-main or always-fast) would pass CI.

Two explicit cases now:
- fastModel set → `generateContent` called with `model: 'qwen-turbo'`.
- fastModel unset → `generateContent` called with `model: 'main-model'`.

The tests intentionally mock a non-empty history so the kebab path reaches the generateContent call site instead of bailing on empty input.
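To illustrate the kind of hardening the commit message attributes to `stripTerminalControlSequences`, here is a minimal TypeScript sketch. It is not the code in `terminalSafe.ts`: the function name `stripControlSketch` and the exact regular expressions are assumptions, written from the general shape of OSC and CSI sequences, with a final pass over C0, DEL, and C1 code points as a backstop.

```typescript
// Hypothetical sketch: remove terminal control sequences from a model-returned
// title before it is stored or rendered.
function stripControlSketch(s: string): string {
  return (
    s
      // OSC: ESC ] ... terminated by BEL (\x07) or ST (ESC \)
      .replace(/\x1b\][\s\S]*?(?:\x07|\x1b\\)/g, "")
      // CSI: ESC [ parameter bytes, intermediate bytes, one final byte
      .replace(/\x1b\[[0-?]*[ -/]*[@-~]/g, "")
      // Backstop: any remaining C0 control, DEL, or C1 control character
      // (this also removes a stray ESC left by an unterminated sequence)
      .replace(/[\x00-\x1f\x7f-\x9f]/g, "")
  );
}
```

The order matters in this sketch: whole sequences are removed first so their payload (for example an OSC-8 URL) disappears with them, and the character-level pass only catches what is left over.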

* fix(i18n): sync mismatched keys between en.js and zh.js (QwenLM#3503)

Add 4 keys missing from en.js that are actively used in source code, add 5 missing Chinese translations to zh.js, integrate check-i18n into CI to prevent future drift, and skip JSON file write in CI to avoid dirtying the working tree.

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
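The drift this commit guards against can be detected with a simple two-way key comparison. The TypeScript sketch below is an illustration only: `diffLocaleKeys` and its return shape are hypothetical, and the actual `check-i18n` script may work differently.

```typescript
type Dict = Record<string, string>;

// Keys present in `reference` but absent from `target`, sorted for stable output.
function missingKeys(reference: Dict, target: Dict): string[] {
  return Object.keys(reference)
    .filter((key) => !Object.prototype.hasOwnProperty.call(target, key))
    .sort();
}

// Hypothetical report: which keys each locale lacks relative to the other.
function diffLocaleKeys(en: Dict, zh: Dict): { missingInZh: string[]; missingInEn: string[] } {
  return {
    missingInZh: missingKeys(en, zh),
    missingInEn: missingKeys(zh, en),
  };
}
```

A CI step could fail the build whenever either list is non-empty.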

…M#3509)

* fix(cli): remove residual blank lines after MCP init completes (QwenLM#3095)

ConfigInitDisplay rendered <Box marginTop={1}> plus a content line, so the live area grew by 2 rows during startup. When initialization finished and the component unmounted, Ink shrank the live area but the rows it had already committed to the terminal scrollback cannot be reclaimed, leaving a visible gap above the input.

Move the MCP init status into the Footer's left-bottom status slot (always mounted, fixed height) so the live area height stays constant across the init → ready transition. The status participates in the existing priority chain: ctrlC / ctrlD / escape / vim / shell / autoAccept / configInit / hint.

* fix(cli): suppress MCP init message when custom status line is active

Audit follow-up. Previously the configInit branch preceded the suppressHint branch in the footer's left-bottom priority chain. With a custom status line configured, <Text>{null}</Text> collapses to zero rows in Ink, so the footer's bottom row went from 1 row during init to 0 rows after, a 1-row height oscillation that reintroduces the same scrollback-residue symptom the original fix eliminated in the default case.

Swap the order so suppressHint short-circuits to null first: the init message now shares the hint's suppression rule, keeping the footer's height constant in every configuration.

Also:
- Gate the hook's return on isConfigInitialized directly instead of letting the effect clear state, avoiding a one-frame flash where the stale "Initializing..." message leaks through on the first render after init completes.
- Cover the new behavior with three Footer tests, including a regression test for the custom-status-line case.

* fix(cli): show MCP init progress even under a custom status line

Reverting a UX trade-off introduced in the previous commit. That change suppressed the init message whenever a custom status line was active, arguing that <Text>{null}</Text> collapses to zero rows in Ink and any non-zero init row would re-create a one-row shrink on completion.

Zero shrink was the wrong goal. Hiding init progress from users who have configured a status line is a real usability loss: the status line does not surface MCP connection state, so those users now see no feedback during startup. A one-time, one-line shrink on init completion is a far smaller regression than the original two-row scrollback residue this PR was created to fix, and strictly better than the silent alternative.

Keep the init message in the left-bottom slot and let it sit above suppressHint in the priority chain. Update the regression test so that it pins the new behavior (init is visible with or without a status line) and prevents the suppression from being reintroduced.

* fix(cli): keep MCP init progress visible in screen-reader mode

Footer is gated behind !isScreenReaderEnabled, so moving the init message inside Footer silenced it for screen-reader users. Render the same message as a plain Text node in Composer when the screen reader is active. Screen-reader users don't suffer from the live-area residual row issue that motivated the original move, so an independent node is safe for them.

* refactor(cli): drop duplicated screen-reader init path and show progress under YOLO

- ScreenReaderAppLayout already mounts <Footer /> directly, so the separate <Text> branch in Composer was producing a duplicated 'Connecting to MCP servers...' line in screen-reader mode. Remove it.
- Move configInitMessage ahead of AutoAcceptIndicator in the footer's priority chain so users launched with YOLO / auto-accept-edits still see the ~1s startup progress; the approval-mode indicator takes over as soon as init finishes.
- Add unit tests for useConfigInitMessage covering the idle, progress, reset, and unsubscribe paths.
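The footer's "priority chain" can be read as a first-match lookup over an ordered list of status slots. The TypeScript sketch below reflects the ordering the commits above end up with (configInit ahead of autoAccept, and not subject to hint suppression). The slot names, the `pickFooterStatus` function, and the message strings are hypothetical; the real Footer component is a React/Ink component and is not reproduced here.

```typescript
// Ordered from highest to lowest priority, per the final state described in
// the commit messages. Slot names are illustrative.
const FOOTER_PRIORITY = [
  "ctrlC",
  "ctrlD",
  "escape",
  "vim",
  "shell",
  "configInit",
  "autoAccept",
  "hint",
] as const;

type FooterSlot = (typeof FOOTER_PRIORITY)[number];

// Return the message of the highest-priority active slot, or null when
// nothing should be shown. Only the hint is hidden by a custom status line.
function pickFooterStatus(
  active: Partial<Record<FooterSlot, string>>,
  suppressHint: boolean,
): string | null {
  for (const slot of FOOTER_PRIORITY) {
    if (slot === "hint" && suppressHint) continue;
    const message = active[slot];
    if (message) return message;
  }
  return null;
}
```

Expressing the chain as data makes an ordering change, such as moving configInit ahead of autoAccept, a one-line edit that is easy to pin with a test.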

Co-authored-by: lawrence3699 <lawrence3699@users.noreply.github.com>

…ased approach (QwenLM#3502)

* feat(web-search): add GLM (ZhipuAI) web search provider

- Add GlmProvider class implementing BaseWebSearchProvider using the ZhipuAI Web Search API (https://open.bigmodel.cn/api/paas/v4/web_search)
- Support multiple search engines: search_std, search_pro, search_pro_sogou, search_pro_quark
- Support optional config: maxResults, searchIntent, searchRecencyFilter, contentSize, searchDomainFilter
- Truncate query to 70 characters per API limit
- Register 'glm' in the provider discriminated union (types.ts) and createProvider() switch (index.ts)
- Add GlmProviderConfig to settingsSchema, ConfigParams, and Config class
- Add --glm-api-key CLI flag and GLM_API_KEY env var support in webSearch.ts
- Forward GLM_API_KEY in sandbox environment
- Update provider priority list: Tavily > Google > GLM > DashScope
- Add 17 unit tests for GlmProvider and 4 integration tests in index.test.ts
- Update docs/developers/tools/web-search.md with GLM configuration, env vars, CLI args, pricing, and corrected DashScope billing info
- Fix stale OAuth/free-tier references in web-search.md

Closes QwenLM#3496

* docs(web-search): fix DashScope note and add GLM server-side limitations

* fix(web-search): make DashScope provider work with standard API key, remove qwen-oauth dependency

- DashScopeProvider.isAvailable() now checks config.apiKey instead of authType
- Remove OAuth credential file reading and resource_url requirement
- Use standard DashScope endpoint: dashscope.aliyuncs.com/api/v1/indices/plugin/web_search
- Read DASHSCOPE_API_KEY env var and --dashscope-api-key CLI flag
- Forward DASHSCOPE_API_KEY into sandbox environment
- Update integration test to detect DASHSCOPE_API_KEY
- Update docs to reflect new API key based configuration

* feat(web-search): remove built-in web search tool

The web_search tool and all related provider implementations are removed. Web search functionality will be provided via MCP integrations instead, which is the direction the broader agent ecosystem is moving.

Removed:
- packages/core/src/tools/web-search/ (entire directory)
- packages/cli/src/config/webSearch.ts
- integration-tests/cli/web_search.test.ts
- ToolNames.WEB_SEARCH, ToolErrorCode.WEB_SEARCH_FAILED
- webSearch config in ConfigParams, Config class, settingsSchema
- CLI options: --tavily-api-key, --google-api-key, --google-search-engine-id, --glm-api-key, --dashscope-api-key, --web-search-default
- Sandbox env forwarding for TAVILY/GLM/DASHSCOPE/GOOGLE search keys
- web_search from rule-parser, permission-manager, speculation gate, microcompact tool set, and builtin-agents tool list

* fix: remove websearch reference

* docs: remove websearch tool

* docs: add break change guide

* fix review

…logs

Add a generic skill that uses tmux as a "real user recording harness": drive the TUI with keyboard actions and capture-pane at each step to produce a readable, reviewable interaction timeline log.

- General-purpose design: no scenario-specific hardcoding
- Helper script with session-collision detection and eval-friendly output
- wait-for polling instead of blind sleeps, timeout dumps current pane
- Manual workflow guide with scene-design methodology

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
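The "wait-for polling instead of blind sleeps" idea can be sketched in TypeScript as below. This is an assumption-laden illustration, not the shell helper: `waitFor`, `WaitResult`, and the injected `capture` callback are hypothetical, and a real harness would obtain the pane text from tmux rather than from a function argument.

```typescript
// Result of a wait: either the pattern appeared, or the timeout elapsed.
// In both cases the last captured pane text is returned so that a timeout
// can dump what was actually on screen.
type WaitResult = { ok: true; pane: string } | { ok: false; pane: string };

// Poll `capture` until `pattern` matches or `timeoutMs` has elapsed.
// Pass a pattern without the `g` flag, since `test` is stateful on global regexes.
async function waitFor(
  capture: () => string,
  pattern: RegExp,
  timeoutMs: number,
  intervalMs = 100,
): Promise<WaitResult> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const pane = capture();
    if (pattern.test(pane)) return { ok: true, pane };
    if (Date.now() >= deadline) return { ok: false, pane };
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

Compared with a fixed sleep, the poll returns as soon as the expected text appears, and on failure the caller still has the final pane contents to write into the log.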

BingqingLyu · 2026-05-07T07:12:54Z

Conflict Group 1

This PR shares modified functions with 23 other PR(s): #10, #106, #113, #114, #117, #14, #17, #18, #2, #21, #22, #26, #31, #36, #42, #46, #7, #71, #75, #86, #88, #94, #96.

These PRs should be reviewed as a batch — merging one may affect the others.

| Function | File | Also modified by |
| --- | --- | --- |
| `SessionListItemView` | `SessionPicker.tsx` | #113, #114, #117, #88 |
| `buildPermissionRules` | `rule-parser.ts` | #113, #114, #117, #88 |
| `createStreamingInputWithControlPoint` | `permission-control.test.ts` | #113, #114, #117, #71, #88 |
| `createToolRegistry` | `config.ts` | #113, #114, #117, #88 |
| `extractToolNameFromRecord` | `export-html-from-chatrecord-jsonl.js` | #113, #114, #117, #88 |
| `finalize` | `chatRecordingService.ts` | #113, #114, #117, #88 |
| `findSessionsByTitle` | `sessionService.ts` | #113, #114, #117, #88 |
| `generateSessionTitle` | `renameCommand.ts` | #113, #114, #117, #88 |
| `getToolCallComponent` | `ChatViewer.tsx` | #113, #114, #117, #42, #88 |
| `handleQwenAuth` | `handler.ts` | #113, #114, #117, #36, #88 |
| `isBrowserLaunchSuppressed` | `config.ts` | #113, #114, #117, #88 |
| `isSdkMcpServerConfig` | `config.ts` | #106, #113, #114, #117, #18, #46, #75, #86, #88 |
| `isValidSessionId` | `config.ts` | #113, #114, #117, #88 |
| `listSessions` | `sessionService.ts` | #113, #114, #117, #88 |
| `loadCliConfig` | `config.ts` | #106, #113, #114, #117, #36, #46, #75, #86, #88 |
| `main` | `check-i18n.ts` | #113, #114, #117, #2, #88 |
| `makeConfig` | `permission-manager.test.ts` | #113, #114, #117, #88 |
| `normalizeConfigOutputFormat` | `config.ts` | #106, #113, #114, #117, #18, #75, #86, #88 |
| `parseApprovalModeValue` | `config.ts` | #10, #113, #114, #117, #21, #22, #36, #46, #86, #88 |
| `parseArguments` | `config.ts` | #10, #113, #114, #117, #14, #17, #18, #21, #22, #31, #36, #46, #7, #86, #88 |
| `parseRules` | `rule-parser.ts` | #113, #114, #117, #88 |
| `readLastJsonStringFieldSync` | `sessionStorageUtils.ts` | #113, #114, #117, #88 |
| `recordSlashCommand` | `chatRecordingService.ts` | #113, #114, #117, #88 |
| `recordUiTelemetryEvent` | `chatRecordingService.ts` | #113, #114, #117, #88 |
| `renameSession` | `sessionService.ts` | #113, #114, #117, #88 |
| `resolveDefaultPermission` | `permission-manager.ts` | #113, #114, #117, #88 |
| `start_sandbox` | `sandbox.ts` | #10, #113, #114, #117, #26, #7, #88, #94 |
| `toPosixPath` | `rule-parser.ts` | #113, #114, #117, #88 |
| `useDreamRunning` | `Footer.tsx` | #113, #114, #117, #86, #88, #96 |
| `validateToolParamValues` | `read-file.ts` | #113, #114, #117, #88 |

Posted by codegraph-ai conflict detection.
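A conflict report of this shape can be derived from a mapping of PRs to the functions they modify. The TypeScript sketch below is a guess at the general technique, not a description of how codegraph-ai works: `alsoModifiedBy` and its input shape are hypothetical.

```typescript
// Hypothetical input: PR number → names of functions that PR modifies.
// Output: for each function touched by `pr`, the other PRs that also touch it.
function alsoModifiedBy(
  prFunctions: Map<number, string[]>,
  pr: number,
): Map<string, number[]> {
  const result = new Map<string, number[]>();
  const own = prFunctions.get(pr) ?? [];
  own.forEach((fn) => {
    const others: number[] = [];
    prFunctions.forEach((fns, otherPr) => {
      if (otherPr !== pr && fns.indexOf(fn) !== -1) others.push(otherPr);
    });
    // Functions modified only by this PR are not conflicts and are omitted.
    if (others.length > 0) result.set(fn, others.sort((a, b) => a - b));
  });
  return result;
}
```

The union of all values in the result gives the set of PRs that would be reviewed together as one batch.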

zhangxy-zju and others added 7 commits April 23, 2026 20:02

fix(sdk-java): pass custom env to CLI process (QwenLM#3543)

3e74a33

Co-authored-by: lawrence3699 <lawrence3699@users.noreply.github.com>

BingqingLyu added conflicting-group-1 labels Apr 28, 2026

This was referenced Apr 28, 2026

rename openai var to qwen code var for clarity #7

Open

feat(cli): add Traditional Chinese (zh-TW) as a UI language option #2

Draft

BingqingLyu removed conflicting-group-1 labels Apr 29, 2026

Repository owner deleted a comment from github-actions Bot Apr 29, 2026

BingqingLyu added conflicting-group-1 and removed conflicting-group-1 labels Apr 29, 2026

BingqingLyu added conflicting-group-1 (Conflicting PR group 1: review as a batch) and conflicting-pr (Shares at least one cross-PR dependency with other PRs) and removed conflicting-group-1 labels May 7, 2026

BingqingLyu mentioned this pull request May 7, 2026

fix(core): preserve reasoning_content when merging consecutive assistant messages (#3619) #126

Open

5 tasks

BingqingLyu wants to merge 7 commits into main from pr-3577-feat-tmux-real-user-testing-skill

8 participants
