WezTerm multiplexer backend + Windows support via multiplexer abstraction#206
marcfargas wants to merge 20 commits into
…se 0) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@marcfargas Psmux maintainer here. Which part of Tmux is not implemented in Psmux? Let me know if you have any specific thoughts and I'd be happy to address them. The goal of Psmux was to be literally Tmux itself on Windows PowerShell without depending on WSL. That's why we have the full suite of Tmux themes and plugins working on Windows without WSL. The same config, the same commands, and the tmux alias work as a drop-in replacement for Tmux on Windows PowerShell. I would encourage you to also have a look at https://psmux.pages.dev/ to understand how people are using Psmux in their projects. Let me know if you need any support and I will be happy to assist in any way I can.
It was I will check if CAO works with psmux on Windows now :) Amended the PR desc.
…facts
Results from the Phase 1 investigation into WezTerm CLI primitives:
- spike(1): wezterm cli spawn/send-text/get-text round-trip confirmed working
- spike(2): send-text does not reach usable claude/codex TUI states (NEEDS-WORKAROUND)
- spike(2b): explicit codex.cmd launches but send-text still does not submit (NEEDS-WORKAROUND)
- spike(3): get-text polling detects markers fast enough without misses (GO)
- spike(4): plain get-text matches Claude trust text; Codex never reaches regexable TUI (NEEDS-WORKAROUND)
- spike(summary): final findings: two-step send (paste + inject Enter) required
- Gemini post-install findings incorporated for spike(2) and spike(4)
Key conclusions carried into Phase 2:
1. Codex launch requires the full codex.cmd shim with -c hooks=[] --yolo --no-alt-screen --disable shell_snapshot
2. WezTerm send-text populates composer but does NOT submit; a two-step primitive is required (paste + inject Enter)
3. Plain get-text (no --escapes) sufficient for polling; 500ms cadence has zero misses
Co-Authored-By: Claude <noreply@anthropic.com>
Evidence artefacts for the Phase 1 and Phase 1b spike tasks (both already completed). Pairs with spikes/*-result.md for full reproducibility.
Codex audit of tmux/libtmux callsites in src/.
18 findings: 2 confirmed provider-expected (claude_code, codex), 1 unix-tooling (inbox_service tail -n), 14 hidden-leakage (5 non-MVP providers + services + utils + CLI/API attach). Plan sufficient for Claude+Codex MVP via shim; non-MVP rewiring tracked as Phase 2 follow-up tasks.
Gemini first dispatch hit MODEL_CAPACITY_EXHAUSTED on gemini-3-flash-preview server-side. Codex retry succeeded.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 2 Task 1: first half of the multiplexer abstraction.
- Backend-neutral ABC with default send_keys() = paste-then-submit
- LaunchSpec dataclass for backend-direct spawn (Codex shim path on Windows)
- Working-directory validation helper preserved byte-for-byte from TmuxClient for Task 2 parity
33 contract tests, zero new failures.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
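For readers skimming the thread, the paste-then-submit default described in this commit can be sketched roughly as follows. This is only an illustration: the method signatures, the `target` parameter, the `LaunchSpec` field types, and the `RecordingMultiplexer` test double are assumptions, not the code in the branch.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass(frozen=True)
class LaunchSpec:
    """Assumed shape: argv plus extra environment for a backend-direct spawn."""
    argv: tuple[str, ...]
    env: dict[str, str] = field(default_factory=dict)


class BaseMultiplexer(ABC):
    """Backend-neutral interface; concrete backends supply the two primitives."""

    @abstractmethod
    def _paste_text(self, target: str, text: str) -> None:
        """Place text into the pane's input without submitting it."""

    @abstractmethod
    def _submit_input(self, target: str) -> None:
        """Submit whatever is currently in the pane's input."""

    def send_keys(self, target: str, text: str) -> None:
        # Default behaviour: paste first, then submit as a separate step.
        self._paste_text(target, text)
        self._submit_input(target)


class RecordingMultiplexer(BaseMultiplexer):
    """Hypothetical test double that records calls instead of touching a terminal."""

    def __init__(self) -> None:
        self.calls: list[tuple] = []

    def _paste_text(self, target: str, text: str) -> None:
        self.calls.append(("paste", target, text))

    def _submit_input(self, target: str) -> None:
        self.calls.append(("submit", target))
```

Keeping `send_keys()` in the base class means a backend only has to say how it pastes and how it submits; the ordering is defined once.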
…and route startup handlers through send_special_key
Phase 2 Tasks 2 and 3: mechanical refactors that lay the groundwork for the WezTerm backend without changing observable behaviour.
Task 2 (TmuxClient → TmuxMultiplexer):
- TmuxMultiplexer is now a proper BaseMultiplexer subclass in multiplexers/tmux.py
- clients/tmux.py reduced to a 38-line back-compat shim re-exporting TmuxMultiplexer + the tmux_client singleton
- send_keys() body split into _paste_text + _submit_input per ABC contract
- _resolve_and_validate_working_directory inherited from BaseMultiplexer
Task 3 (startup handlers through send_special_key):
- claude_code.py: raw tmux send-keys subprocess and libtmux pane.send_keys replaced with tmux_client.send_special_key()
- codex.py: same trust-path Enter via send_special_key()
- Tests updated to assert through mocked send_special_key seam
Co-Authored-By: Claude <noreply@anthropic.com>
…(Codex)
Phase 2 Task 4. Runtime backend selection with priority order:
1. CAO_MULTIPLEXER env override (tmux|wezterm)
2. TMUX env set → tmux
3. WEZTERM_PANE or TERM_PROGRAM=WezTerm → wezterm
4. Platform default: win32 → wezterm, else tmux
Singleton via lru_cache. Lazy import of WezTermMultiplexer keeps package import working before Task 5 lands the WezTerm module.
10 contract tests covering all branches (env override, TMUX, WezTerm env signals, platform defaults, invalid override, singleton caching).
Verified: 43 fail / 1049 pass (test/clients + test/multiplexers + test/providers + test/services + test/utils minus test/e2e), which matches baseline; +10 tests added.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
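The priority order above can be read as a small pure function. The sketch below is an illustration under assumptions: the name `select_backend`, passing the environment and platform in as arguments, and raising `ValueError` on an invalid override are choices made here, not details confirmed by the commit.

```python
from typing import Mapping

VALID_BACKENDS = ("tmux", "wezterm")


def select_backend(env: Mapping[str, str], platform: str) -> str:
    """Return 'tmux' or 'wezterm' following the four-step priority order."""
    # 1. Explicit override always wins.
    override = env.get("CAO_MULTIPLEXER")
    if override is not None:
        if override not in VALID_BACKENDS:
            # Assumption: an invalid override is rejected rather than ignored.
            raise ValueError(f"CAO_MULTIPLEXER must be one of {VALID_BACKENDS}, got {override!r}")
        return override
    # 2. Running inside tmux.
    if env.get("TMUX"):
        return "tmux"
    # 3. Running inside a WezTerm pane.
    if env.get("WEZTERM_PANE") or env.get("TERM_PROGRAM") == "WezTerm":
        return "wezterm"
    # 4. Platform default.
    return "wezterm" if platform == "win32" else "tmux"
```

In real use the arguments would be `os.environ` and `sys.platform`; taking them as parameters keeps every branch testable from one host.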
Phase 2 Task 6. Replaces `tail -n` subprocess in inbox_service.py with a Windows-safe backward-block file scanner.
- Same `(terminal_id, lines) -> str` contract; same edge cases (missing/empty file → "")
- 4 KiB block size
- UTF-8 decode with errors="replace"
- CRLF + LF mix handling
- multibyte boundary safety
- long-line carryover
Inbox-service tests now use real temp log files (tmp_path) instead of subprocess.run mocks.
Pre-existing Windows tail/symlink failures reduced (1032 → 1039 pass, same 43 fail baseline): the pure-Python path removes one Unix-tooling dependency.
Identified by TSK-071 audit (UNIX-TOOLING category, single finding).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
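A backward-block scanner of the kind described here might look like the sketch below. It is not the code from inbox_service.py: the function name `tail_lines`, taking a `Path` instead of a `terminal_id`, and returning the lines joined by "\n" without a trailing newline are all assumptions made for illustration.

```python
from pathlib import Path

BLOCK_SIZE = 4096  # 4 KiB, as in the commit message


def tail_lines(path: Path, lines: int) -> str:
    """Return the last `lines` lines of a log file without shelling out to tail."""
    if lines <= 0 or not path.is_file():
        return ""
    with path.open("rb") as fh:
        fh.seek(0, 2)  # jump to end of file
        pos = fh.tell()
        data = b""
        newlines = 0
        # Walk backwards block by block until more than `lines` newlines have
        # been seen (so the first kept line is complete) or the file start is hit.
        while pos > 0 and newlines <= lines:
            step = min(BLOCK_SIZE, pos)
            pos -= step
            fh.seek(pos)
            block = fh.read(step)
            newlines += block.count(b"\n")  # count incrementally, no full rescan
            data = block + data
    # Decode only after joining blocks, so a multibyte character split across
    # a block boundary is reassembled before decoding.
    text = data.decode("utf-8", errors="replace")
    rows = text.split("\n")
    if rows and rows[-1] == "":
        rows.pop()  # a trailing newline is not an extra empty line
    # Strip the CR of CRLF endings so mixed files come out uniform.
    return "\n".join(row.rstrip("\r") for row in rows[-lines:])
```

Reading from the end keeps the cost proportional to the requested tail rather than to the size of the log.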
Phase 2 Task 5. New backend implementing BaseMultiplexer over the wezterm CLI:
- spawn via `wezterm cli spawn --new-window --cwd <dir> --set-environment CAO_TERMINAL_ID=<id>`; honors LaunchSpec.argv + LaunchSpec.env per plan §4.
- two-step send: _paste_text via default bracketed-paste send-text; _submit_input via `--no-paste -- "\r"` with 300ms after-paste + 500ms inter-Enter delays matching tmux.
- send_special_key with key-name → VT escape mapping (Enter/Tab/Up/Down/Left/Right/Escape/Backspace) + literal=True passthrough.
- get_history via plain `wezterm cli get-text` (no --escapes per spike 4); tail_lines slice client-side.
- in-memory pane registry per plan §4 (one CAO window = one WezTerm OS window for MVP).
- Runner injection seam (`runner: WezTermRunner | None = None`) keeps tests deterministic without launching real wezterm.
Deferred per scope:
- pipe_pane / stop_pipe_pane raise NotImplementedError → Task 7
- Codex-on-Windows launch resolver → Task 8
- get_pane_working_directory returns None for MVP → follow-up
34 contract tests across 11 classes. Full non-e2e suite: 43 fail / 1083 pass, same baseline; +34 added.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
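To make the key-name mapping concrete, here is a minimal sketch that only builds the argv for `wezterm cli send-text`; it does not run anything. The function name `special_key_argv` and the exact byte sequences (normal-mode cursor keys, DEL for Backspace) are assumptions and may differ from what the branch uses.

```python
# Assumed key-name to byte-sequence table (normal cursor-key mode).
KEY_TO_BYTES = {
    "Enter": "\r",
    "Tab": "\t",
    "Escape": "\x1b",
    "Backspace": "\x7f",
    "Up": "\x1b[A",
    "Down": "\x1b[B",
    "Right": "\x1b[C",
    "Left": "\x1b[D",
}


def special_key_argv(pane_id: str, key: str, literal: bool = False) -> list[str]:
    """Build the wezterm CLI argv that injects one key without bracketed paste."""
    if literal:
        payload = key  # pass the text through untouched
    elif key in KEY_TO_BYTES:
        payload = KEY_TO_BYTES[key]
    else:
        raise KeyError(f"unknown key {key!r}; valid keys: {sorted(KEY_TO_BYTES)}")
    # --no-paste sends raw bytes, so "\r" acts as a real Enter press.
    return ["wezterm", "cli", "send-text", "--pane-id", pane_id, "--no-paste", "--", payload]
```

The resulting list would typically be handed to whatever runner the backend injects, which is what keeps unit tests free of a real wezterm binary.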
Phase 2 Task 7. Replaces NotImplementedError stubs with a per-pane 500ms polling thread.
- pipe_pane: starts daemon thread, opens file in append mode.
- _diff_snapshot: pure helper (no I/O), unit-tested independently. Fast path when current.startswith(prev); line-suffix fallback for redraws/scrollback; full append on no overlap.
- stop_pipe_pane: sets stop_event, joins with 2s timeout.
- kill_session / kill_window auto-stop active pollers.
- Inter-poll cadence injectable via __init__(poll_interval=...) for fast deterministic tests (default 0.5s per spike 3 empirical 0-miss/144-207ms first-detection latency).
47 wezterm tests pass (13 new poller cases + 34 from Task 5).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
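The three cases named for `_diff_snapshot` can be sketched as a pure function. This is one plausible reading of the commit message, not the implementation in the branch; in particular, interpreting "line-suffix fallback" as matching the trailing lines of the previous snapshot against the leading lines of the current one is an assumption.

```python
def diff_snapshot(prev: str, current: str) -> str:
    """Return the text in `current` that was not already captured in `prev`."""
    # Fast path: the pane only grew, so the new text is the extra suffix.
    if current.startswith(prev):
        return current[len(prev):]
    # Fallback: the view scrolled or was redrawn. Find the longest run of
    # trailing lines in `prev` that reappears at the top of `current`.
    prev_lines = prev.splitlines(keepends=True)
    cur_lines = current.splitlines(keepends=True)
    for k in range(min(len(prev_lines), len(cur_lines)), 0, -1):
        if prev_lines[-k:] == cur_lines[:k]:
            return "".join(cur_lines[k:])
    # No overlap at all: treat the whole snapshot as new output.
    return current
```

Because the helper does no I/O, the poller thread can stay thin: take a snapshot, call the helper, append the result to the log file.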
Phase 2 Task 8. Wires the Codex-on-Windows shim path into the multiplexer
abstraction without polluting the generic provider command builder.
- multiplexers/launch.py: build_launch_spec(provider, command_argv, *,
platform, working_directory). Resolver order on Windows:
1. CAO_CODEX_BIN env override
2. shutil.which("codex.cmd")
3. known Scoop fallback paths
4. degrade to bare command (caller sees spawn error)
Other providers and Unix pass through unchanged.
- providers/codex.py: command builder injects `-c hooks=[]` on Windows
per spike 2b empirical finding (local Codex hooks config rejected).
initialize() skips shell warm-up echo when isinstance(tmux_client,
WezTermMultiplexer) AND _direct_spawned, falling back to welcome/trust
marker polling. tmux behavior unchanged.
Worked example (matches spike 2b):
wezterm cli spawn --new-window --cwd <dir> --set-environment
CAO_TERMINAL_ID=<id> -- C:\...\codex.cmd
-c hooks=[] --yolo --no-alt-screen --disable shell_snapshot
Targeted suite green (185 pass / 0 fail across codex+claude unit +
multiplexers).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
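The Windows resolver order listed in this commit could be expressed along these lines. The function name `resolve_codex_bin`, the injectable `which`/`exists` callables, and the `fallbacks` parameter are hypothetical; the actual Scoop paths are not given in the message, so they are left for the caller to supply.

```python
import os
import shutil
from typing import Callable, Iterable, Mapping, Optional


def resolve_codex_bin(
    env: Mapping[str, str],
    which: Callable[[str], Optional[str]] = shutil.which,
    exists: Callable[[str], bool] = os.path.isfile,
    fallbacks: Iterable[str] = (),
) -> str:
    """Pick the Codex executable to spawn on Windows, in priority order."""
    # 1. Explicit override.
    override = env.get("CAO_CODEX_BIN")
    if override:
        return override
    # 2. The codex.cmd shim on PATH.
    found = which("codex.cmd")
    if found:
        return found
    # 3. Known install locations (hypothetical list, supplied by the caller).
    for candidate in fallbacks:
        if exists(candidate):
            return candidate
    # 4. Degrade to the bare command; the spawn error surfaces to the caller.
    return "codex"
```

Injecting `which` and `exists` lets each branch be exercised without a Windows host or a real Codex install.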
…(Codex)
Phase 2 Task 9 + TSK-080 follow-up.
terminal_service:
- replaced 14 call sites of tmux_client.<m>(...) with get_multiplexer().<m>(...). lru_cache on the accessor keeps repeated calls O(1).
- added optional launch_spec: LaunchSpec | None = None to create_terminal(); forwards verbatim to multiplexer.create_session() and create_window(). Default None preserves all existing callers.
- removes the last runtime-critical hidden-leakage finding from TSK-071's audit (#10/awslabs#11 in that report).
Test migration (+ TSK-080 follow-up):
- test_terminal_service.py: switched fixtures to patch get_multiplexer accessor seam, added LaunchSpec pass-through coverage.
- test_terminal_service_full.py, test_terminal_service_coverage.py, test_plugin_event_emission.py: migrated decorators/setattrs/assertion references from .tmux_client to .get_multiplexer (the original Task 9 prompt missed these three; Codex caught the regression after the source change exposed them as +40 failures).
Verified: 43 fail / 1107 pass, same baseline; +24 net new tests pass since Task 6 baseline.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 2 Task 10 — final MVP task. Real-binary smoke coverage,
gated by pytest marker so default `pytest` invocations skip it.
- pyproject.toml: extends existing `not e2e` filter to
`-m 'not e2e and not smoke'`. Registers `smoke` marker.
- test/smoke/conftest.py: skipif fixtures for wezterm / claude /
codex on PATH; `wait_for_text(multiplexer, ...)` polling helper.
- test_wezterm_basics.py: spawn → send → get → kill round-trip.
- test_claude_startup.py: trust-prompt acceptance via
send_special_key("Enter") (validates the Task 3 abstraction
end-to-end on a real Claude pane).
- test_codex_direct_spawn.py: build_launch_spec("codex", ...) →
WezTerm direct-spawn → two-step send (paste + Enter)
(validates Task 8 + Task 5 together).
- test_inbox_poller.py: pipe_pane captures rapid output at the
500ms cadence (validates Task 7 against a real bash pane).
Invocation:
pytest -m smoke # opt-in, all smoke
pytest test/smoke -m smoke # scoped
Default `pytest` runs collect zero smoke tests. Verified:
- default: 43 fail / 1107 pass (matches baseline)
- opt-in: 4 smoke tests collected
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
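A polling helper in the spirit of `wait_for_text` might look like this. The signature in test/smoke/conftest.py takes a multiplexer; the sketch below takes a plain callable that returns the current pane text, and the `sleep`/`clock` parameters are added here purely so it can be exercised without real delays.

```python
import time
from typing import Callable


def wait_for_text(
    get_text: Callable[[], str],
    needle: str,
    timeout: float = 10.0,
    interval: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll get_text() until `needle` appears or `timeout` seconds elapse."""
    deadline = clock() + timeout
    while True:
        if needle in get_text():
            return True
        if clock() >= deadline:
            return False  # one final check has already happened above
        sleep(interval)
```

With a real backend, `get_text` would be a closure over the multiplexer's history call for one pane; the 0.5s default mirrors the polling cadence used elsewhere in this PR.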
…ty and efficiency passes
Three-wave editorial review after Phase 2 MVP landed. Combined into one commit since all three waves touch the same files and form a single review-and-clean milestone.
Wave A: must-fix bugs (Sonnet):
- Provider-multiplexer wiring: claude_code.py and codex.py were importing tmux_client directly (always TmuxMultiplexer); both now use get_multiplexer() so CAO_MULTIPLEXER=wezterm is actually honoured at runtime
- Dead isinstance check in codex.py fixed (was permanently False)
- WezTermMultiplexer: threading.Lock protecting _sessions/_pollers
- Poller thread-leak on stop timeout corrected; _poll_loop self-cleans on exit
Wave B: quality and efficiency nits (Sonnet, 9 items):
- _create_pane helper extracted (deduplicates create_session/create_window)
- default_platform() promoted and reused to drop duplicate sys.platform check
- send_special_key with unknown key raises guarded KeyError naming valid set
- _pane_id registry-miss raises KeyError; spawn/binary-missing still RuntimeError
- direct_spawned derived from self._launch_spec at call site (avoids drift)
- working_directory parameter dropped from build_launch_spec (always deleted)
- Bare "codex" string replaced with _PROVIDER_KEY_CODEX constant
- _get_log_tail: incremental newline counting instead of full-buffer rescan
- _poll_loop: skip registry write when snapshot unchanged (quiescent panes)
Wave C: Opus pass over residual nits:
- _PollerState: drop unused snapshot and file_path fields
- _sessions registry: store pane_id string directly (drop always-None placeholders)
- _poll_loop: hoist log file open out of inner loop (hundreds of fd ops/min/pane)
- _poll_loop: drop unreachable else branch and dead poller.snapshot writes
- Strip past-phase narration from docstrings; remove stale progress comments
Co-Authored-By: Claude <noreply@anthropic.com>
…t-spawn (Codex)
Phase 2 introduced a LaunchSpec plumbing through the multiplexer so that on WezTerm, providers can spawn the CLI directly via `wezterm cli spawn -- <argv>` (Phase 1 spike 2b: send-text does not submit reliably). The wiring was incomplete: terminal_service never constructed a LaunchSpec, so on WezTerm + codex the multiplexer spawned a plain shell while CodexProvider.initialize() took the direct-spawned skip path and never sent the codex command. Net effect in production: codex never started on WezTerm.
Fix (Option A, provider-supplied launch_spec):
- BaseProvider exposes a default `get_launch_spec(multiplexer) -> Optional[LaunchSpec]` returning None.
- CodexProvider overrides it to build/cache a LaunchSpec only when the active multiplexer is WezTermMultiplexer.
- terminal_service.create_terminal now builds the provider BEFORE pane creation, asks it for a launch_spec, and forwards the result into create_session/create_window. An explicit caller-supplied launch_spec still beats the provider default.
- After the multiplexer returns the actual window name, the provider's session_name/window_name are updated before initialize().
Each provider now owns its own direct-spawn decision; terminal_service stays backend-agnostic. Adding the same path for Claude or Gemini later is a one-method override.
Tests: 798 passed (+3 vs Wave C), 7 pre-existing Windows symlink failures (q_cli/tmux working-directory) unchanged.
Spike result and review notes in spikes/TSK-082-result.md.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
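The Option A flow can be illustrated with stand-in classes. Everything below is a simplified sketch: the class bodies, the constructor arguments, and the helper `effective_launch_spec` are hypothetical and only mirror the decision order described in the commit (explicit spec first, then the provider's answer).

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class LaunchSpec:
    argv: tuple


class TmuxMultiplexer:      # stand-in for the real backend
    pass


class WezTermMultiplexer:   # stand-in for the real backend
    pass


class BaseProvider:
    def get_launch_spec(self, multiplexer) -> Optional[LaunchSpec]:
        # Default: no direct spawn; the backend opens a plain shell.
        return None


class CodexProvider(BaseProvider):
    def __init__(self, argv: Sequence[str]) -> None:
        self._argv = tuple(argv)
        self._cached: Optional[LaunchSpec] = None

    def get_launch_spec(self, multiplexer) -> Optional[LaunchSpec]:
        # Only WezTerm needs the direct-spawn path.
        if not isinstance(multiplexer, WezTermMultiplexer):
            return None
        if self._cached is None:
            self._cached = LaunchSpec(self._argv)
        return self._cached


def effective_launch_spec(provider, multiplexer, explicit: Optional[LaunchSpec] = None):
    """An explicit caller-supplied spec beats the provider's default."""
    if explicit is not None:
        return explicit
    return provider.get_launch_spec(multiplexer)
```

Under this shape the service never inspects the backend type itself; it just forwards whatever the provider (or the caller) decided.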
…ezterm via env
Two fixes surfaced by the first real WezTerm smoke run on Windows.
1) BaseMultiplexer._resolve_and_validate_working_directory rejected
every Windows path. The CodeQL SafeAccessCheck guard only allowed
real_path.startswith("/"), which is false on Win32 where paths look
like C:\... after abspath/realpath. Extend the guard to accept the
"X:\\" drive-letter prefix in addition to "/", preserving CodeQL
recognition of str.startswith() as a SafeAccessCheck.
This was masked by Phase 2 unit tests because they mock
_resolve_and_validate_working_directory rather than exercise it.
Net effect: every Windows code path that hit this validator was
silently broken — test_raises_for_nonexistent_directory now passes
(it was previously failing on the wrong error message).
2) test/smoke/conftest.py now resolves wezterm via:
CAO_WEZTERM_BIN override → shutil.which → WEZTERM_EXECUTABLE_DIR.
Portable WezTerm installs aren't on PATH but always set
WEZTERM_EXECUTABLE_DIR when CAO runs inside a WezTerm pane, so the
harness picks up the binary without manual PATH munging.
CAO_WEZTERM_BIN is the explicit knob for CI / off-WezTerm testing.
Tests: 799 passed (+1 vs TSK-082), 6 pre-existing Windows symlink
failures (q_cli/tmux working-directory) unchanged.
Smoke run is still blocked by a separate Phase 2 design bug
(WezTerm `cli spawn` does not support `--set-environment`). Tracked
for follow-up.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
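The first fix amounts to widening an "is this an absolute, resolved path" check. The sketch below shows one way to accept both forms using `str.startswith()`; the helper name and exact expression are assumptions, and no claim is made here about how CodeQL treats this particular shape.

```python
def looks_like_absolute_real_path(real_path: str) -> bool:
    """Accept POSIX absolute paths and Windows drive-letter paths (X:\\...)."""
    if real_path.startswith("/"):
        return True
    # Windows: a single ASCII letter followed by ":\" after abspath/realpath.
    return (
        len(real_path) >= 3
        and real_path[0].isascii()
        and real_path[0].isalpha()
        and real_path[1:].startswith(":\\")
    )
```

A unit test that calls the validator directly, rather than mocking it, is what would have caught the original Windows rejection.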
`wezterm cli spawn` does not support `--set-environment`; the flag is silently ignored, so every spawn on this branch was losing CAO_TERMINAL_ID and any launch_spec.env.
Upstream closed this as out of scope (wezterm/wezterm#6565), so the wrap lives in CAO:
- Unix: `env KEY=VALUE -- <argv>` (exec-replaces; target stays pid 1).
- Windows: `powershell.exe -NoLogo -NoProfile -Command ...` with single-quoted PS literals and `$args=@(...); & <exe> @args` splatting.
PowerShell cannot exec-replace on Windows, so the wrapper sits in the tree as parent of the target. CAO is immune: status detection is regex-against-`get-text` output, not `wezterm cli list` / process_name. If a future path does inspect the foreground process, wezterm walks the descendant tree on Windows (`find_youngest()` in mux/src/localpane.rs) and reports the youngest attached process, so the real target wins once started.
Falls back to `$SHELL` / `%COMSPEC%` when launch_spec is None so env injection still happens for the default-shell case.
Tests parameterize over sys.platform to exercise both wrapper shapes from one host.
Refs: TSK-083, wezterm/wezterm#6565
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
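For the Windows wrapper, the key detail is quoting values as single-quoted PowerShell literals (an embedded single quote is doubled). The sketch below builds the wrapper argv under assumptions: the function names are hypothetical, environment names are assumed to be plain identifiers, and arguments are passed directly after `&` instead of through the `$args` splat the commit describes.

```python
from typing import Mapping, Sequence


def ps_quote(value: str) -> str:
    """Render a string as a single-quoted PowerShell literal."""
    return "'" + value.replace("'", "''") + "'"


def wrap_with_env_windows(env: Mapping[str, str], argv: Sequence[str]) -> list[str]:
    """Build a powershell.exe argv that sets env vars, then runs argv."""
    # $env:NAME='value'; for each variable (names assumed to be simple identifiers).
    assignments = "".join(f"$env:{name}={ps_quote(value)}; " for name, value in env.items())
    # The call operator (&) runs a quoted executable path with quoted arguments.
    call = "& " + " ".join(ps_quote(arg) for arg in argv)
    return ["powershell.exe", "-NoLogo", "-NoProfile", "-Command", assignments + call]
```

Single-quoted literals avoid variable expansion inside values, which matters when a path or argument happens to contain `$`.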
…build_launch_spec kwarg
Two related smoke-harness fixes found during Windows dogfooding.
Codex v0.123.0 changed the workspace trust banner wording: the previous anchor "allow Codex to work in this folder" no longer appears, causing CodexProvider._handle_trust_prompt to time out on sessions launched into a new directory. Anchor on "Do you trust the contents of this directory" instead. The WezTerm smoke harness is extended to launch Codex with --no-alt-screen (required for scrollback to be readable) and to mirror the full trust-prompt + welcome-banner handshake before exercising the two-step send.
Wave B simplify (previous commit) removed working_directory from build_launch_spec()'s signature; the Codex direct-spawn smoke test was not updated and failed with TypeError. Drop the redundant kwarg.
Co-Authored-By: Claude <noreply@anthropic.com>
… trampoline pwsh-native
Two Windows-portability fixes that together make cao-server boot and the WezTerm backend work end-to-end on a stock Windows install.
api/main.py: defer Unix-only imports (fcntl, pty, termios):
These were imported at module load time, crashing cao-server with ModuleNotFoundError before argparse could run on Windows. Moved into the terminal_ws() WebSocket handler body; that endpoint is tmux-on-Unix only and returns code 4501 on win32. The deferred follow-up is recorded in docs/PLAN-phase2.md §10.
multiplexers/wezterm.py: four layered Windows defects fixed:
1. WEZTERM_EXECUTABLE pointing at wezterm-gui[.exe]: _normalize_wezterm_bin rewrites the basename (case-insensitive) to its CLI sibling, preserving the parent directory, and logs a one-line warning.
2. UnicodeDecodeError on stdout: force encoding="utf-8", errors="replace" in _default_runner (locale codepage cp1252 choked on WezTerm's UTF-8).
3. cmd.exe as the agent pane shell: _default_shell() now resolves to pwsh.exe (or Windows PowerShell as fallback) via _resolve_powershell_bin; honours CAO_POWERSHELL_BIN for explicit override.
4. Two-shell-deep trampoline: _wrap_with_env(argv=None) emits pwsh -NoExit -Command "<env-set>" so the same pwsh process becomes the pane shell (no pwsh->pwsh nesting). LaunchSpec path still execs the target inside one pwsh.
Test surface grows from 70 to 78: TestNormalizeWezTermBin (12), TestDefaultRunner (2), TestResolvePowerShellBin (5), plus a regression test locking in _default_shell() != cmd.exe on win32.
Co-Authored-By: Claude <noreply@anthropic.com>
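Defect 1 is easy to picture with a small sketch of the basename rewrite. This is an illustration only: the function below splits on either path separator by hand so it behaves the same on any host, always emits a lowercase `wezterm` stem, and omits the warning log the commit mentions.

```python
_GUI_STEM = "wezterm-gui"


def normalize_wezterm_bin(path: str) -> str:
    """Rewrite .../wezterm-gui[.exe] to its CLI sibling .../wezterm[.exe]."""
    # Split on the last separator of either kind so Windows paths can be
    # handled (and tested) on a POSIX host as well.
    idx = max(path.rfind("/"), path.rfind("\\"))
    head, base = path[: idx + 1], path[idx + 1 :]
    if base.lower() not in (_GUI_STEM, _GUI_STEM + ".exe"):
        return path  # already the CLI binary, or something else entirely
    # Keep the parent directory and the original extension text.
    return head + "wezterm" + base[len(_GUI_STEM):]
```

The rewrite matters because the GUI binary does not behave like the CLI when invoked with `cli` subcommands from a script, so pointing at it by accident breaks every spawn.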
Force-pushed from a100c58 to ad781af
Sure @marcfargas, the cc features of Psmux are also available in the latest release. If you need any assistance with Psmux, please let me know and we will sort it out for you. Feel free to raise an issue anytime. Also, Psmux works well with WezTerm. Let me know how it goes.
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #206 +/- ##
=======================================
Coverage ? 21.19%
=======================================
Files ? 57
Lines ? 4648
Branches ? 0
=======================================
Hits ? 985
Misses ? 3663
Partials ? 0
Closing in favor of #207. The wezterm-multiplexer + Given #207 ships the same end-user capability with a much smaller surface area, it doesn't make sense to pursue the abstraction layer right now. If a future backend (a real WezTerm multiplexer protocol, or something else) ever needs to coexist with tmux, this PR is a useful reference for what that abstraction would look like — but it shouldn't block on review now. |
Status
Draft / RFC — not ready for review or merge. Opening early to signal the work-in-progress and invite directional feedback on the abstraction shape.
What this explores
Extracting CAO's tmux coupling behind a `BaseMultiplexer` interface so backends can be swapped. This branch adds:
- `TmuxMultiplexer`: near-identity refactor of today's `tmux_client`
- `WezTermMultiplexer`: new backend driven by `wezterm cli`
- `BaseProvider` constructor accepts a multiplexer instance
Net effect: CAO runs on Windows, macOS, and Linux, not only platforms with tmux 3.3+.
Providers keep ownership of their own state detection and startup-prompt handling. Only the multiplexer boundary moves.
Why WezTerm as the first new backend
- `wezterm cli spawn | send-text | get-text | list | kill-pane` maps cleanly to the tmux surface CAO already depends on
Current state of this branch (intentionally messy, will be cleaned)
Structured as three phases:
- `docs/multiplexer-api-surface.md` enumerates every public method on `TmuxClient`, callers, and porting risks.
- `spikes/01-result.md` through `04-result.md`, plus `02b-codex-launch.md` and `SUMMARY.md`, validating WezTerm CLI round-trip, paste-mode per provider, polling latency as a `pipe-pane` replacement, and ANSI/regex compatibility.
Before this is ready for review, the exploration artefacts (`docs/multiplexer-api-surface.md`, `spikes/`) will be removed and commit history will be squashed/rebased to land as a coherent sequence.
What will eventually land upstream
- `BaseMultiplexer` interface
- `TmuxMultiplexer` (behavioral-identity refactor of current `tmux_client`)
- `WezTermMultiplexer` (new)
- `BaseProvider` refactor to take a multiplexer instance
What will NOT land
- `docs/` and `spikes/`: those are exploration scaffolding
Not asking for review yet
Opening the draft PR early for three reasons:
`tmux_client` that'd create conflicts (or at least notice it)