Skip to content

feat(skills): add tmux-real-user-testing skill for readable TUI test logs#112

Open
BingqingLyu wants to merge 7 commits into
mainfrom
pr-3577-feat-tmux-real-user-testing-skill
Open

feat(skills): add tmux-real-user-testing skill for readable TUI test logs#112
BingqingLyu wants to merge 7 commits into
mainfrom
pr-3577-feat-tmux-real-user-testing-skill

Conversation

@BingqingLyu

@BingqingLyu BingqingLyu commented Apr 27, 2026

Copy link
Copy Markdown
Owner

Summary

Add a generic tmux-real-user-testing skill that uses tmux as a "real user recording harness" — drive the TUI with keyboard actions and capture-pane at each step to produce a readable, reviewable interaction timeline log.

What it does

  • Helper script (tmux-real-user-log.sh): start/snapshot/send/type-submit/wait-for/finish commands with session-collision detection and eval-friendly output
  • SKILL.md: generic scene-design methodology + manual workflow guide, no scenario-specific hardcoding
  • Key design: capture-pane snapshots after each action → tmux-readable-full.log as the primary artifact (not raw ANSI pipe logs)

Why

The existing E2E framework focuses on structured assertions. This skill complements it with narrative-style TUI testing where the log itself is the report — useful for regression testing, UX review, and onboarding documentation.

Changes

  • .qwen/skills/tmux-real-user-testing/SKILL.md — skill definition with workflow guide
  • .qwen/skills/tmux-real-user-testing/scripts/tmux-real-user-log.sh — helper shell script

zhangxy-zju and others added 7 commits April 23, 2026 20:02
…#3559)

params.pages !== undefined let "" fall through to parsePDFPageRange(''),
which returns null and surfaced "Invalid pages parameter: ''" for every
read_file call from models that default optional strings to "".

Switch to a truthy check so "" behaves the same as an omitted field, and
add a regression test.

Fixes QwenLM#3558
…QwenLM#3540)

* feat(session): auto-title sessions via fast model, add /rename --auto

The /rename work in QwenLM#3093 generates kebab-case titles only when the user
explicitly runs `/rename` with no args; until they do, the session picker
shows the first user prompt (often truncated or misleading). This change
adds a sentence-case auto-title that fires once per session after the
first assistant turn, using the configured fast model.

New service: `packages/core/src/services/sessionTitle.ts` —
`tryGenerateSessionTitle(config, signal)` returns a discriminated outcome
(`{ok: true, title, modelUsed}` | `{ok: false, reason}`) so callers can
either handle failures generically or map reasons to actionable messages.
Prompt shape: 3-7 words, sentence case, good/bad examples including a
CJK row, JSON schema enforced via `baseLlmClient.generateJson`.
`maxAttempts: 1` — titles are cosmetic metadata and shouldn't fight
rate limits.

Trigger point: `ChatRecordingService.maybeTriggerAutoTitle` runs after
`recordAssistantTurn`. Fire-and-forget promise, guarded by:

- `currentCustomTitle` — don't overwrite any existing title.
- `autoTitleController` doubles as in-flight flag; a second turn while
  the first is still pending is a no-op.
- `autoTitleAttempts` cap of 3 — the first assistant turn may be a
  pure tool-call with no user-visible text; retry for a handful of
  turns until a title lands. Cap bounds total waste.
- `!config.isInteractive()` — headless CLI (`qwen -p`, CI) never auto-
  titles; spending fast-model tokens on a one-shot session is waste.
- `autoTitleDisabledByEnv()` — `QWEN_DISABLE_AUTO_TITLE=1` opt-out.
- `config.getFastModel()` falsy — skip entirely rather than falling
  back to the main model; auto-titling on main-model tokens is too
  expensive to be silent.

Persistence: `CustomTitleRecordPayload` grows a `titleSource: 'auto' |
'manual'` field. Absent on pre-change records (treated as `undefined`
→ manual, safe default so a user's pre-upgrade `/rename` is never
silently reclassified). `SessionPicker` renders `titleSource === 'auto'`
titles in dim (secondary) color; manual stays full contrast. On resume,
the persisted source is rehydrated into `currentTitleSource` — without
this, finalize's re-append would rewrite an auto title as manual on
every resume cycle.

Cross-process manual-rename guard: when two CLI tabs target the same
JSONL, in-memory state can diverge. Before writing an auto record, the
IIFE re-reads the file via `sessionService.getSessionTitleInfo`. If a
`/rename` from another process landed as manual, bail and sync local
state — never clobber a deliberately-chosen manual title with a model
guess. Cost is one 64KB tail read per successful generation.

`finalize()` aborts the in-flight controller before re-appending the
title record. Session switch / shutdown doesn't have to wait on a slow
fast-model call.

New user-facing command: `/rename --auto` regenerates via the same
generator — explicit user trigger, overwrites whatever's there (manual
or auto) because the user asked. Errors route through
`autoFailureMessage(reason)` so `empty_history`, `model_error`,
`aborted`, etc. each get actionable guidance rather than a generic
"could not generate". `/rename -- --literal-name` is the sentinel for
titles that start with `--`; unknown `--flag` tokens error with a hint
pointing at the sentinel. Existing `/rename <name>` and bare `/rename`
(kebab-case via existing path) are unchanged, except the kebab path now
prefers fast model when available and runs its output through
`stripTerminalControlSequences` (same ANSI/OSC-8 hardening as the
sentence-case path).

New shared util: `packages/core/src/utils/terminalSafe.ts` —
`stripTerminalControlSequences(s)` strips OSC (\x1b]...\x07|\x1b\\), CSI
(\x1b[...[a-zA-Z]), SS2/SS3 leaders, and C0/C1/DEL as a backstop. A
model-returned `\x1b[2J` or OSC-8 hyperlink escape would otherwise
execute on every SessionPicker render; both sentence-case and kebab
paths now route titles through the helper before they reach the JSONL
or the UI.

Tail-read extractor: `extractLastJsonStringFields(text, primaryKey,
otherKeys, lineContains)` reads multiple fields from the same matching
line in a single pass. Two separate tail scans could return a mismatched
pair (primary from a newer record, secondary from an older one with only
the primary set); the new helper guarantees the pair is atomic. Validates
a proper closing quote on the primary value so a crash-truncated trailing
record can't win the latest-match race. `readLastJsonStringFieldsSync`
is its file-reading wrapper — same tail-window fast path and full-file
fallback as the single-field version, plus a `MAX_FULL_SCAN_BYTES = 64MB`
cap so a corrupt multi-GB session file can't freeze the picker. Session
reads now open with `O_NOFOLLOW` (falls back to plain RDONLY on Windows
where the constant isn't exposed) — defense in depth against a symlink
planted in `~/.qwen/projects/<proj>/chats/`.

Character handling: `flattenToTail` on the LLM prompt drops a dangling
low surrogate after `slice(-1000)` — otherwise a CJK supplementary char
or emoji cut mid-pair produces invalid UTF-16 that some providers 400.
`sanitizeTitle` applies the same surrogate scrub after max-length trim,
and strips paired CJK brackets (`「」 『』 【】 〈〉 《》`) as whole units so
a `【Draft】 Fix login` doesn't leave a dangling `】` after leading-char
strip. `lineContains` in the title reader is tightened from the loose
substring `'custom_title'` to `'"subtype":"custom_title"'` so user text
containing the literal `custom_title` can't shadow a real record.

Tests: 46 new unit tests across
- `sessionTitle.test.ts` (22): success/all-failure-reasons, tool-call
  filter, tail-slice, surrogate scrub, ANSI/OSC-8 strip, CJK brackets.
- `chatRecordingService.autoTitle.test.ts` (15): trigger/skip matrix,
  in-flight guard, abort propagation on finalize, manual/auto/legacy
  resume symmetry, cross-process race, env opt-out, retry-after-
  transient.
- `sessionStorageUtils.test.ts` (13): single-pass extractor, straddle
  boundary, truncated trailing record, lineContains, multi-field atom.
- `renameCommand.test.ts` (8): `--auto` success, all reasons, sentinel,
  unknown-flag hint, positional rejection, manual/SessionService
  fallbacks.

* docs(session): design doc for auto session titles

Matches the session-recap design doc shape (Overview / Triggers /
Architecture / Prompt Design / History Filtering / Persistence /
Concurrency / Configuration / Observability / Out of Scope) and adds a
Security Hardening section unique to the title path — titles render
directly in the picker and persist in user-readable JSONL, so
LLM-returned control sequences are an attack surface the recap path
doesn't have.

Captures decisions a code-only reader has to reverse-engineer:

- Why `maxAttempts: 1` (best-effort cosmetic metadata; no retry loop).
- Why `autoTitleAttempts` cap is 3 (first turn can be pure tool-call).
- Why the auto trigger does NOT fall back to the main model but
  session-recap does (auto-title fires on every turn; silently charging
  main-model tokens is a bill surprise).
- Why `titleSource: undefined` stays unwritten on legacy records (no
  rewrite risks silently reclassifying user intent).
- Why the cross-process re-read sits between the LLM await and the
  append (manual wins at both in-process and on-disk layers).
- Why `finalize()`'s abort tolerates a controller swap (in-flight
  identity check).
- Why JSON-schema function calling instead of tag extraction (avoid
  reasoning preamble bleed; cross-provider reliability).

Placed at docs/design/session-title/ alongside session-recap,
compact-mode, fork-subagent, and other per-feature design docs. No
sidebar index update required — the design folder is unindexed.

* test(rename): pin model choice in bare /rename kebab path

Addresses reviewer feedback: the bare `/rename` model selection
(`config.getFastModel() ?? config.getModel()`) had no test pinning
it either way. Previous tests mocked `getHistory: []`, which exits
the function before the model is ever chosen, so a silent regression
to either direction (always-main or always-fast) would pass CI.

Two explicit cases now:
- fastModel set → `generateContent` called with `model: 'qwen-turbo'`.
- fastModel unset → `generateContent` called with `model: 'main-model'`.

The tests intentionally mock a non-empty history so the kebab path
reaches the generateContent call site instead of bailing on empty input.
* fix(i18n): sync mismatched keys between en.js and zh.js (QwenLM#3503)

Add 4 keys missing from en.js that are actively used in source code,
add 5 missing Chinese translations to zh.js, integrate check-i18n
into CI to prevent future drift, and skip JSON file write in CI to
avoid dirtying the working tree.

---
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
…M#3509)

* fix(cli): remove residual blank lines after MCP init completes (QwenLM#3095)

ConfigInitDisplay rendered <Box marginTop={1}> plus a content line, so
the live area grew by 2 rows during startup. When initialization
finished and the component unmounted, Ink shrank the live area but the
rows it had already committed to the terminal scrollback cannot be
reclaimed, leaving a visible gap above the input.

Move the MCP init status into the Footer's left-bottom status slot
(always mounted, fixed height) so the live area height stays constant
across the init → ready transition. The status participates in the
existing priority chain: ctrlC / ctrlD / escape / vim / shell /
autoAccept / configInit / hint.

* fix(cli): suppress MCP init message when custom status line is active

Audit follow-up. Previously the configInit branch preceded the
suppressHint branch in the footer's left-bottom priority chain. With
a custom status line configured, <Text>{null}</Text> collapses to
zero rows in Ink, so the footer's bottom row went from 1 row during
init to 0 rows after — a 1-row height oscillation that reintroduces
the same scrollback-residue symptom the original fix eliminated in
the default case.

Swap the order so suppressHint short-circuits to null first: the
init message now shares the hint's suppression rule, keeping the
footer's height constant in every configuration.

Also:
- Gate the hook's return on isConfigInitialized directly instead of
  letting the effect clear state, avoiding a one-frame flash where
  the stale "Initializing..." message leaks through on the first
  render after init completes.
- Cover the new behavior with three Footer tests, including a
  regression test for the custom-status-line case.

* fix(cli): show MCP init progress even under a custom status line

Reverting a UX trade-off introduced in the previous commit. That
change suppressed the init message whenever a custom status line was
active, arguing that <Text>{null}</Text> collapses to zero rows in
Ink and any non-zero init row would re-create a one-row shrink on
completion.

Zero shrink was the wrong goal. Hiding init progress from users who
have configured a status line is a real usability loss — the status
line does not surface MCP connection state, so those users now see
no feedback during startup. A one-time, one-line shrink on init
completion is a far smaller regression than the original two-row
scrollback residue this PR was created to fix, and strictly better
than the silent alternative.

Keep the init message in the left-bottom slot and let it sit above
suppressHint in the priority chain. Update the regression test so
that it pins the new behavior (init is visible with or without a
status line) and prevents the suppression from being reintroduced.

* fix(cli): keep MCP init progress visible in screen-reader mode

Footer is gated behind !isScreenReaderEnabled, so moving the init
message inside Footer silenced it for screen-reader users. Render the
same message as a plain Text node in Composer when the screen reader is
active — screen-reader users don't suffer from the live-area residual
row issue that motivated the original move, so an independent node is
safe for them.

* refactor(cli): drop duplicated screen-reader init path and show progress under YOLO

- ScreenReaderAppLayout already mounts <Footer /> directly, so the
  separate <Text> branch in Composer was producing a duplicated
  'Connecting to MCP servers...' line in screen-reader mode. Remove it.
- Move configInitMessage ahead of AutoAcceptIndicator in the footer's
  priority chain so users launched with YOLO / auto-accept-edits still
  see the ~1s startup progress; the approval-mode indicator takes over
  as soon as init finishes.
- Add unit tests for useConfigInitMessage covering the idle, progress,
  reset, and unsubscribe paths.
Co-authored-by: lawrence3699 <lawrence3699@users.noreply.github.com>
…ased approach (QwenLM#3502)

* feat(web-search): add GLM (ZhipuAI) web search provider

- Add GlmProvider class implementing BaseWebSearchProvider using the
  ZhipuAI Web Search API (https://open.bigmodel.cn/api/paas/v4/web_search)
- Support multiple search engines: search_std, search_pro, search_pro_sogou,
  search_pro_quark
- Support optional config: maxResults, searchIntent, searchRecencyFilter,
  contentSize, searchDomainFilter
- Truncate query to 70 characters per API limit
- Register 'glm' in the provider discriminated union (types.ts) and
  createProvider() switch (index.ts)
- Add GlmProviderConfig to settingsSchema, ConfigParams, and Config class
- Add --glm-api-key CLI flag and GLM_API_KEY env var support in webSearch.ts
- Forward GLM_API_KEY in sandbox environment
- Update provider priority list: Tavily > Google > GLM > DashScope
- Add 17 unit tests for GlmProvider and 4 integration tests in index.test.ts
- Update docs/developers/tools/web-search.md with GLM configuration,
  env vars, CLI args, pricing, and corrected DashScope billing info
- Fix stale OAuth/free-tier references in web-search.md

Closes QwenLM#3496

* docs(web-search): fix DashScope note and add GLM server-side limitations

* fix(web-search): make DashScope provider work with standard API key, remove qwen-oauth dependency

- DashScopeProvider.isAvailable() now checks config.apiKey instead of authType
- Remove OAuth credential file reading and resource_url requirement
- Use standard DashScope endpoint: dashscope.aliyuncs.com/api/v1/indices/plugin/web_search
- Read DASHSCOPE_API_KEY env var and --dashscope-api-key CLI flag
- Forward DASHSCOPE_API_KEY into sandbox environment
- Update integration test to detect DASHSCOPE_API_KEY
- Update docs to reflect new API key based configuration

* feat(web-search): remove built-in web search tool

The web_search tool and all related provider implementations are removed.
Web search functionality will be provided via MCP integrations instead,
which is the direction the broader agent ecosystem is moving.

Removed:
- packages/core/src/tools/web-search/ (entire directory)
- packages/cli/src/config/webSearch.ts
- integration-tests/cli/web_search.test.ts
- ToolNames.WEB_SEARCH, ToolErrorCode.WEB_SEARCH_FAILED
- webSearch config in ConfigParams, Config class, settingsSchema
- CLI options: --tavily-api-key, --google-api-key, --google-search-engine-id,
  --glm-api-key, --dashscope-api-key, --web-search-default
- Sandbox env forwarding for TAVILY/GLM/DASHSCOPE/GOOGLE search keys
- web_search from rule-parser, permission-manager, speculation gate,
  microcompact tool set, and builtin-agents tool list

* fix: remove websearch reference

* docs: remove websearch tool

* docs: add break change guide

* fix review
…logs

Add a generic skill that uses tmux as a

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>"real user recording harness":
drive the TUI with keyboard actions and capture-pane at each step to
produce a readable, reviewable interaction timeline log.

- General-purpose design: no scenario-specific hardcoding
- Helper script with session-collision detection and eval-friendly output
- wait-for polling instead of blind sleeps, timeout dumps current pane
- Manual workflow guide with scene-design methodology
This was referenced Apr 28, 2026
@BingqingLyu

BingqingLyu commented May 7, 2026

Copy link
Copy Markdown
Owner Author

Conflict Group 1

This PR shares modified functions with 23 other PR(s): #10, #106, #113, #114, #117, #14, #17, #18, #2, #21, #22, #26, #31, #36, #42, #46, #7, #71, #75, #86, #88, #94, #96.

These PRs should be reviewed as a batch — merging one may affect the others.

Function File Also modified by
SessionListItemView SessionPicker.tsx #113, #114, #117, #88
buildPermissionRules rule-parser.ts #113, #114, #117, #88
createStreamingInputWithControlPoint permission-control.test.ts #113, #114, #117, #71, #88
createToolRegistry config.ts #113, #114, #117, #88
extractToolNameFromRecord export-html-from-chatrecord-jsonl.js #113, #114, #117, #88
finalize chatRecordingService.ts #113, #114, #117, #88
findSessionsByTitle sessionService.ts #113, #114, #117, #88
generateSessionTitle renameCommand.ts #113, #114, #117, #88
getToolCallComponent ChatViewer.tsx #113, #114, #117, #42, #88
handleQwenAuth handler.ts #113, #114, #117, #36, #88
isBrowserLaunchSuppressed config.ts #113, #114, #117, #88
isSdkMcpServerConfig config.ts #106, #113, #114, #117, #18, #46, #75, #86, #88
isValidSessionId config.ts #113, #114, #117, #88
listSessions sessionService.ts #113, #114, #117, #88
loadCliConfig config.ts #106, #113, #114, #117, #36, #46, #75, #86, #88
main check-i18n.ts #113, #114, #117, #2, #88
makeConfig permission-manager.test.ts #113, #114, #117, #88
normalizeConfigOutputFormat config.ts #106, #113, #114, #117, #18, #75, #86, #88
parseApprovalModeValue config.ts #10, #113, #114, #117, #21, #22, #36, #46, #86, #88
parseArguments config.ts #10, #113, #114, #117, #14, #17, #18, #21, #22, #31, #36, #46, #7, #86, #88
parseRules rule-parser.ts #113, #114, #117, #88
readLastJsonStringFieldSync sessionStorageUtils.ts #113, #114, #117, #88
recordSlashCommand chatRecordingService.ts #113, #114, #117, #88
recordUiTelemetryEvent chatRecordingService.ts #113, #114, #117, #88
renameSession sessionService.ts #113, #114, #117, #88
resolveDefaultPermission permission-manager.ts #113, #114, #117, #88
start_sandbox sandbox.ts #10, #113, #114, #117, #26, #7, #88, #94
toPosixPath rule-parser.ts #113, #114, #117, #88
useDreamRunning Footer.tsx #113, #114, #117, #86, #88, #96
validateToolParamValues read-file.ts #113, #114, #117, #88

Posted by codegraph-ai conflict detection.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

conflicting-group-1 Conflicting PR group 1 — review as a batch conflicting-pr Shares at least one cross-PR dependency with other PRs

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants