feat(herdr): add herdr terminal backend with event-driven inbox delivery#271
Conversation
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #271 +/- ##
=======================================
Coverage ? 90.97%
=======================================
Files ? 78
Lines ? 8023
Branches ? 0
=======================================
Hits ? 7299
Misses ? 724
Partials ? 0
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
cf3d021 to
6fc25e8
Compare
There was a problem hiding this comment.
Pull request overview
This PR adds a pluggable terminal-backend abstraction and introduces a new herdr backend with event-driven inbox delivery, reducing reliance on tmux polling/log-watching and enabling native agent-status awareness where available.
Changes:
- Introduces
TerminalBackend+ registry/factory and migrates services/providers/CLI commands to callget_backend()instead of directly usingtmux_client. - Adds
HerdrBackendandHerdrInboxServicefor socket-event-driven inbox delivery, plus lifecycle cleanup/reconnect behavior. - Expands/updates unit and integration tests to cover backend selection, inbox wiring, and updated delivery semantics (including eager delivery changes).
Reviewed changes
Copilot reviewed 59 out of 60 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| test/services/test_terminal_service.py | Updates unit tests to patch the backend registry instead of tmux_client. |
| test/services/test_terminal_service_inbox_registration.py | New tests for create/delete terminal wiring to herdr inbox registration/unregistration. |
| test/services/test_terminal_service_full.py | Updates full terminal_service tests for backend abstraction and adds get_output escalation tests. |
| test/services/test_terminal_service_coverage.py | Updates cleanup-path coverage tests to use the backend abstraction. |
| test/services/test_session_service.py | Updates session_service tests to mock get_backend() instead of tmux_client. |
| test/services/test_plugin_event_emission.py | Updates plugin event emission tests for backend abstraction + delete_session behavior change. |
| test/services/test_inbox_service.py | Updates inbox tests for eager-delivery semantics and transient pane-resolution failures. |
| test/services/test_herdr_inbox_registry.py | New tests for the herdr inbox service registry singleton + TYPE_CHECKING import guard. |
| test/services/test_flow_service.py | Updates flow_service tests to use get_backend() instead of tmux_client. |
| test/providers/test_permission_prompt_detection.py | Updates Kiro permission prompt detection tests to patch get_backend(). |
| test/providers/test_opencode_cli_unit.py | Updates OpenCode provider unit tests to patch get_backend(). |
| test/providers/test_kiro_cli_integration.py | Pins backend registry to tmux for integration tests on hosts configured for herdr. |
| test/providers/test_kimi_cli_unit.py | Updates Kimi provider unit tests to patch get_backend(). |
| test/providers/test_gemini_cli_unit.py | Updates Gemini provider unit tests to patch get_backend(). |
| test/providers/test_copilot_cli_unit.py | Updates Copilot CLI provider unit tests to patch get_backend(). |
| test/providers/test_claude_code_coverage.py | Updates Claude Code coverage tests to patch backend registry and reflect startup-prompt handling changes. |
| test/clients/test_database.py | Adds coverage for create_inbox_message receiver-terminal existence validation. |
| test/cli/commands/test_terminal.py | Updates CLI terminal command tests to use get_backend() instead of tmux_client. |
| test/cli/commands/test_launch.py | Updates launch tests to expect backend attach behavior (not tmux attach subprocess). |
| test/backends/test_tmux_backend.py | New tests ensuring TmuxBackend satisfies the TerminalBackend ABC and delegates properly. |
| test/backends/test_factory.py | New tests for config-driven backend selection via BackendFactory. |
| test/backends/init.py | Adds/keeps test package marker for backend tests. |
| test/backends/test_herdr_backend.py | Adds unit tests for HerdrBackend behavior and contract compliance. |
| test/backends/test_herdr_inbox_service.py | Adds unit tests for socket-event inbox delivery behavior and reconnection/lifecycle handling. |
| test/api/test_plugin_lifespan.py | Updates lifespan tests to patch get_backend() during app startup. |
| test/api/test_lifespan_inbox.py | New tests verifying lifespan wiring selects HerdrInboxService vs PollingObserver based on backend. |
| test/api/test_api_endpoints.py | Updates /health test expectations and lifespan tests to patch get_backend(). |
| src/cli_agent_orchestrator/utils/terminal.py | Updates wait_for_shell to accept a TerminalBackend instead of TmuxClient. |
| src/cli_agent_orchestrator/services/terminal_service.py | Migrates terminal operations to backend abstraction; adds herdr inbox registration; improves get_output extraction logic. |
| src/cli_agent_orchestrator/services/session_service.py | Migrates session operations to backend abstraction; changes delete_session to tolerate backend session already gone. |
| src/cli_agent_orchestrator/services/inbox_service.py | Adjusts eager delivery gating and handles TerminalNotFoundError as transient (reset message to PENDING). |
| src/cli_agent_orchestrator/services/herdr_inbox_registry.py | New module-level singleton registry for the running HerdrInboxService instance. |
| src/cli_agent_orchestrator/services/herdr_inbox_service.py | New event-driven inbox delivery service using herdr Unix socket events + reconciliation/cleanup. |
| src/cli_agent_orchestrator/services/flow_service.py | Migrates flow session existence/kill operations to backend abstraction. |
| src/cli_agent_orchestrator/providers/q_cli.py | Migrates provider terminal I/O to backend abstraction. |
| src/cli_agent_orchestrator/providers/opencode_cli.py | Migrates provider terminal I/O to backend abstraction. |
| src/cli_agent_orchestrator/providers/kiro_cli.py | Migrates provider terminal I/O to backend abstraction. |
| src/cli_agent_orchestrator/providers/kimi_cli.py | Migrates provider terminal I/O to backend abstraction. |
| src/cli_agent_orchestrator/providers/gemini_cli.py | Migrates provider terminal I/O to backend abstraction. |
| src/cli_agent_orchestrator/providers/copilot_cli.py | Migrates provider terminal I/O to backend abstraction. |
| src/cli_agent_orchestrator/providers/codex.py | Migrates provider terminal I/O to backend abstraction and simplifies trust-prompt acceptance. |
| src/cli_agent_orchestrator/providers/claude_code.py | Adds herdr-native status path + dispatch tracking; updates response marker parsing; migrates I/O to backend abstraction. |
| src/cli_agent_orchestrator/providers/base.py | Updates eager-delivery documentation to remove WAITING_USER_ANSWER from eligible states. |
| src/cli_agent_orchestrator/constants.py | Updates eager inbox delivery comment to reflect PROCESSING-only eligibility. |
| src/cli_agent_orchestrator/clients/database.py | Validates receiver terminal existence in create_inbox_message. |
| src/cli_agent_orchestrator/cli/commands/terminal.py | Migrates restore window creation to backend abstraction. |
| src/cli_agent_orchestrator/cli/commands/launch.py | Migrates session attach to get_backend().attach_session() instead of tmux attach subprocess. |
| src/cli_agent_orchestrator/backends/base.py | New TerminalBackend ABC + shared backend exceptions + capability hooks (native status, pane id). |
| src/cli_agent_orchestrator/backends/tmux_backend.py | New TmuxBackend adapter wrapping existing TmuxClient. |
| src/cli_agent_orchestrator/backends/herdr_backend.py | New HerdrBackend implementing TerminalBackend via herdr CLI (pane resolution, native status, attach, etc.). |
| src/cli_agent_orchestrator/backends/registry.py | New module-level backend singleton registry (get_backend/set_backend). |
| src/cli_agent_orchestrator/backends/factory.py | New config-driven backend factory (defaults to tmux; opt-in herdr). |
| src/cli_agent_orchestrator/backends/init.py | Exposes backend public API surface (ABC, factory, exceptions). |
| src/cli_agent_orchestrator/api/main.py | Lifespan selects herdr socket inbox vs tmux polling observer; expands /health component probes; maps TerminalNotFoundError to 404. |
| docs/herdr.md | New documentation describing herdr backend configuration, behavior, and troubleshooting. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
haofeif
left a comment
There was a problem hiding this comment.
Got a couple comments here
| Start the CAO server the same way as with tmux -- no additional flags needed: | ||
|
|
||
| ```bash | ||
| uv run cao-server |
There was a problem hiding this comment.
currently in root README.md, we only need to run cao server, without the "uv run", can we do cao server instead without running uv run ?
There was a problem hiding this comment.
Good catch. Fixed in 7ad426b -- dropped uv run from both occurrences in this doc (Launching + troubleshooting) to match the README, since cao-server is a project.scripts entry point and works bare once installed.
|
|
||
| ## Configuration | ||
|
|
||
| Set `terminal_backend` in `~/.aws/cli-agent-orchestrator/config.json`: |
There was a problem hiding this comment.
do you think we can do something like cao server --terminal herdr to start on herdr instead of need to write a config ?
There was a problem hiding this comment.
I like the idea, but I'd prefer to keep it out of this PR. Today the backend is resolved by BackendFactory from config.json through the get_backend() singleton, which is read lazily on first use (including inside the FastAPI lifespan). A --terminal flag means parsing it in api/main.py:main(), calling set_backend(BackendFactory.create(...)) before the lifespan runs, and defining precedence vs the config.json value. That's a reasonable UX improvement but it's a separate behavior change from the inbox wiring this PR is scoped to. I will see if I can track this as a follow up.
There was a problem hiding this comment.
Decided to add this. Implemented in 2b7b402.
cao-server --terminal {tmux,herdr} now overrides terminal_backend from config.json for that run, so you can start on herdr without writing a config file:
cao-server --terminal herdrPrecedence is flag > config.json > tmux default. herdr_session is still read from config.json, so the flag only selects the backend. Implementation threads a backend_override through BackendFactory and calls set_backend() before uvicorn starts; since the backend singleton is otherwise lazy (first resolved in the lifespan), nothing else changes when the flag is absent. Added factory precedence tests and main()-wiring tests, and documented it in docs/herdr.md.
Note the entry point is cao-server (the project.scripts console script), so it's cao-server --terminal herdr rather than cao server.
|
@anilkmr-a2z This PR is a substantial backend abstraction and a useful Herdr integration. Had the following two findings, also please kindly resolve the merge conflict ? after this i think we are ready to merge. Very excited in bringing FindingsP1. Herdr backend ignores
|
…rvice Implements the herdr-backend-support OpenSpec change: TerminalBackend ABC, TmuxBackend wrapper, HerdrBackend (CLI + socket API), BackendFactory with config-driven selection, native status detection, and HerdrInboxService (socket event-based inbox delivery, not yet wired into the app lifespan). Parks work-in-progress ahead of the fix-herdr-inbox-delivery-wiring change, which wires HerdrInboxService into the running app and fixes the eager delivery modal-injection defect.
Two bugs prevented inbox message delivery, plus the supporting wiring to make herdr-backed delivery work end to end. Bug A — eager delivery eligibility gate (inbox_service.py): WAITING_USER_ANSWER was incorrectly included in the eager delivery eligibility check. A terminal waiting on a user answer is not buffering input for a next turn, so delivering eagerly in that state is wrong. Restrict eager eligibility to PROCESSING only. Bug B — HerdrInboxService never wired into lifespan (api/main.py): Under the herdr backend the service was never instantiated or started, so zero messages were delivered. The lifespan now selects the delivery mechanism by backend: HerdrInboxService (socket events) for HerdrBackend, PollingObserver (file polling) otherwise, with matching shutdown for each. Supporting wiring: - Add HerdrInboxRegistry (module-level get/set singleton) with a TYPE_CHECKING-guarded import to break the circular import chain between terminal_service and herdr_inbox_service. - Connect terminal create/delete to register/unregister the terminal with the herdr inbox service; failures are logged, not fatal. - Expose HerdrBackend.herdr_session for the service to bind its socket. - Fix the async seam in HerdrInboxService: capture the running loop in start() and use run_coroutine_threadsafe() for pane subscription, since register_terminal() is called from the synchronous create_terminal path. - Clear the HerdrInboxService singleton on shutdown (set_herdr_inbox_service now accepts None) so the registry does not retain a stale reference after the service task is cancelled, keeping startup and shutdown symmetric. Update providers/base.py and constants.py docs to match the corrected PROCESSING-only eager delivery semantics. Tests: 66 passed across the five affected suites. Refs: none (internal herdr bug fix) 🤖 Assisted by Amazon Q Developer
…cache Fixes silent message misdelivery after a CAO server restart. The window resolution path keyed off an in-memory _window_to_terminal map populated only at create time and never rebuilt; after a restart it was empty, and resolution fell through to a fallback that returned the FIRST pane in the workspace, misrouting messages to the wrong terminal (marked delivered). Resolve from herdr's live state every call instead: workspace label (= session) -> tab label (= window) -> pane on that tab. herdr is a separate process that survives CAO restarts, so this is restart-safe with no new persistence. Window names are unique by construction (profile + uuid suffix). - _resolve_pane_id_from_window: always resolve via the label chain; wrap backend errors as TerminalNotFoundError so the inbox path can leave messages pending - remove _window_to_terminal entirely (declaration, populate, cleanup sites) - remove the silent first-pane fallback; raise TerminalNotFoundError instead - inbox_service: treat TerminalNotFoundError on send as transient -> leave the message pending for retry (not FAILED, not misrouted) - _resolve_pane_id (socket inbox path) unchanged Tests: test_herdr_backend (label-chain resolution, fail-loud) and test_inbox_service (pending-on-resolution-failure) pass. Implements OpenSpec change fix-herdr-pane-resolution-across-restart.
…vents Root cause of 106K/day reconnect storm: _resubscribe_all() was sending subscriptions for pane_ids that no longer exist in herdr. Herdr responds to any invalid pane subscription by ECONNRESET-ing the entire socket. Backoff never escalated because it was reset after _connect() success, not after full setup. Changes: - Add _reconcile(): runs on every socket connect, before _resubscribe_all(). Calls 'herdr pane list', prunes _pane_to_terminal of stale entries, deletes orphaned DB terminal records via delete_terminal(), kills workspaces with zero remaining live terminals. Builds _workspace_to_session map for lifecycle event handling. - Add _subscribe_lifecycle_events(): subscribes to pane.closed and workspace.closed globally after resubscribe. No pane_id needed. - Add _handle_lifecycle_event(): pane.closed deletes single terminal and kills workspace if now empty. workspace.closed deletes all terminals for that session and prunes maps using pane_id prefix matching. - Fix backoff reset: moved to after successful reconcile + resubscribe + lifecycle-subscribe, not just _connect(). - Fix import: get_backend() is in backends.registry, not backends.__init__. Two call sites updated to match the pattern used everywhere else in src/. - Fix double get_terminal_metadata() call in pane.closed handler: refactor to walrus-operator list comprehension, eliminating None dereference. - Add 13 unit tests covering _reconcile prune/no-op/failure paths, _subscribe_lifecycle_events message format, and _handle_lifecycle_event for pane.closed and workspace.closed including unknown-entity no-ops. 🤖 Generated with Amazon Q Developer
…cile
_reconcile() previously only pruned panes it tracked in _pane_to_terminal.
On server restart the map starts empty, so ghost DB records from prior runs
(terminals whose herdr tabs no longer exist) were never cleaned up. This
caused repeated 'Terminal not found' errors and broke 'cao session list'
(CLI silently drops a session when its conductor terminal returns 500).
Fix in two parts:
1. herdr_inbox_service.py: after the in-memory map prune, run `herdr tab
list` and cross-check DB terminals for every tracked workspace. Any DB
terminal whose tmux_window label is absent from live herdr tabs is
deleted. This runs on every socket connect, covering both startup and
reconnect scenarios.
2. api/main.py: GET /terminals/{id} catches TerminalNotFoundError from the
backend and returns 404 instead of 500. Ghost terminals that survive
until the next server restart now fail with the correct status code,
making CLI error handling straightforward.
10 new tests (33 total in test_herdr_inbox_service.py).
… reconnect The DB cross-check in _reconcile() only fires when _pane_to_terminal is non-empty (the socket loop is gated by that condition). On a fresh server start with no registered terminals, _reconcile() never runs, so ghost records from prior runs survive indefinitely. Extract the workspace/tab cross-check into _startup_db_cleanup() and call it unconditionally from start() before the socket loop begins. The method fetches herdr workspace list and tab list directly, building its own workspace map without depending on _pane_to_terminal or _workspace_to_session. 3 new tests (36 total).
Five test/impl fixes surfaced after rebasing onto upstream/main: - /health endpoint: implement the `components` block (cao/herdr/claude) that test_health_check_returns_ok already asserted. herdr/claude probed via shutil.which -> "ok" or "unavailable". - test_herdr_backend: fix infinite hang in test_logs_warning_when_socket_never_appears. The fake_time stub returned 6.0 but _ensure_session_running's poll deadline is now time()+15.0, so 6.0 < 15.0 looped forever. Bump stub to 16.0. - test_tmux_backend: update create_session/create_window delegation assertions for the new extra_env kwarg. - test_api_endpoints + test_plugin_lifespan: patch get_backend to a TmuxBackend mock so the lifespan PollingObserver assertions hold regardless of the local terminal_backend config. - test_kiro_cli_unit: test_mcp_server_init_yields_processing patched the removed kiro_cli.tmux_client; switch to get_backend mock like the sibling tests.
…anges After rebasing onto upstream/main, three integration points needed repair: - inbox_service: upstream now marks a message DELIVERED *before* send_input (PR awslabs#265, closes the watchdog re-entrancy race). My TerminalNotFoundError handler returned without resetting that optimistic status, so a transient pane-resolution failure would strand the message as DELIVERED-but-unsent. Reset to PENDING in the handler so the retry path still fires. - test_inbox_service: update test_resolution_failure_leaves_message_pending to assert the final status is PENDING (DELIVERED then reset), not "never touched". - test_codex_provider_unit: upstream's new TestCodexV0136FooterFormat tests patched codex.tmux_client, which this branch migrated to get_backend. Switch the three get_status tests to get_backend; drop the unused patch on the script-only extraction test.
…process import launch.py migrated tmux attach from subprocess.run(["tmux", "attach-session"...]) to get_backend().attach_session(), so the module no longer imports subprocess. Six launch tests still patched cli.commands.launch.subprocess.run as a defensive no-op, which now fails with "module launch has no attribute 'subprocess'" during patch setup. Repoint those patches at launch.get_backend, which exists and is the correct seam for the attach path. Full CI-equivalent suite (test/, minus integration/e2e) now passes: 2120 passed.
test_returns_herdr_when_configured constructed a real HerdrBackend, whose __init__ calls _ensure_session_running(). When the session socket is absent that method runs subprocess.Popen(["herdr", ...]), which raises FileNotFoundError on hosts where herdr is not installed (CI). The test only passed locally because a live ~/.config/herdr/sessions/cao/herdr.sock made _ensure_session_running short-circuit before the Popen call. Patch herdr_backend.os.path.exists to True so __init__ finds the socket and skips server startup, matching the existing fixture in test_herdr_backend.py. The factory's own config_path.exists() is pathlib.Path.exists and is unaffected by this module-scoped patch. Reproduced the CI failure locally by running with herdr off PATH; all 7 factory tests and the full 101-test backends suite pass after the fix.
The herdr backend setup guide (docs/herdr.md) was added in this branch but nothing in the README pointed to it, so it was undiscoverable. Add a link in the "Sessions run in tmux" section, next to the existing docs/tmux.md link, since herdr is the alternative terminal backend.
Four findings from the automated reviewer, verified against the code: - terminal_service.get_output: the "partial response" message claimed the marker was not found in 5000 lines, but the final attempt captures the full scrollback (full_history=True). Reworded to "full scrollback" so the prefix matches what was actually searched. - HerdrBackend.get_history ignored strip_escapes, while TmuxBackend honors it. The one strip_escapes=True caller (delete-terminal snapshot) was only getting plain text by accident because herdr defaults to text output. Map strip_escapes=True to herdr's native "--format text"; leave the default path untouched so existing provider output parsing on the herdr backend is unchanged. Added two unit tests for the flag mapping. - docs/herdr.md prerequisites said the herdr server must be running before CAO starts, but HerdrBackend._ensure_session_running auto-starts it when the session socket is absent. Clarified that pre-starting is optional. - test_kiro_cli_integration use_tmux_backend fixture set the module-level backend registry singleton without restoring it, leaking the tmux pin into later tests by execution order. Save and restore the previous backend. Not changed: the Copilot note about TmuxBackend.attach_session using check=True. The sole caller (cli/commands/launch.py) already wraps the call in a broad except that converts any exception to a click.ClickException, so a failed attach surfaces as a normal CLI error, not an uncaught stack trace. Backends + terminal_service suites pass (146 tests) with herdr off PATH.
… per connection HerdrInboxService flapped the herdr socket ~44x/min indefinitely: connect, subscribe, herdr resets the connection ~225ms later, 1s backoff, repeat. The event-driven inbox path delivered nothing as a result (every pane.agent_status_changed was lost between resets); delivery only worked via the polling fallback, and the churn produced ~50MB log files per day. Root cause: herdr 0.6.8 resets the entire connection when it receives a SECOND events.subscribe on a connection that already has an active subscription. _socket_loop sent one events.subscribe per managed pane (_resubscribe_all) and then a second for the lifecycle events (_subscribe_lifecycle_events), so the 2nd call always tore down the connection. The backoff never escalated past the 1.0s base because it reset after each apparently-successful setup. Reproduced and fix validated against a live herdr 0.6.8 socket: a single events.subscribe carrying every pane's agent-status plus pane.closed and workspace.closed is acknowledged once and keeps the connection open with events streaming; any second subscribe resets at ~150-220ms. Changes: - Replace _resubscribe_all + _subscribe_lifecycle_events + _subscribe_pane with a single _subscribe_all_events that builds one subscriptions array (one pane.agent_status_changed per managed pane -- pane_id is required, herdr rejects the wildcard form -- plus the two lifecycle events). - register_terminal on a live connection now forces a reconnect via _force_reconnect (close the writer so _socket_loop rebuilds the combined subscription) instead of sending a second, connection-killing subscribe. herdr exposes no incremental add-subscription API. - Update tests to pin the single-subscribe-per-connection contract.
Socket-driven pane.closed / workspace.closed cleanup never ran. _event_loop
read the event name from the "type" key and matched the dotted names
"pane.closed" / "workspace.closed", but herdr 0.6.8 puts the name in the
"event" key using underscore names ("pane_closed" / "workspace_closed"). The
condition never matched, so every lifecycle event fell through and real-time
cleanup was dead -- leaving the periodic reconcile/startup sweeps as the only
thing removing ghost terminals.
Captured the real wire format live from herdr 0.6.8:
{"event":"pane_closed","data":{"pane_id":...,"workspace_id":...}}
{"event":"workspace_closed","data":{"workspace_id":...}}
{"event":"pane.agent_status_changed","data":{"agent_status":...,"pane_id":...}}
Fix: read the name from "event" (falling back to "type") and normalize
underscores to dots before routing. The agent-status path was already correct
(it reads pane_id/agent_status from the "data" wrapper and does not depend on
the top-level name match).
Validated live end-to-end: the real _event_loop over the real herdr socket now
routes an actual pane close to _handle_lifecycle_event. Updated the routing
test to use herdr's real event shape (the old test used a fabricated
{"type":"pane.closed"} shape that never occurs on the wire) and added
end-to-end pane_closed cleanup and agent-status delivery tests.
Reviewer noted the README starts the server with `cao-server` (no `uv run` prefix), but herdr.md used `uv run cao-server`. cao-server is a project.scripts entry point, so the bare form works once installed. Aligned both occurrences (Launching + troubleshooting) with the README convention.
The rebase onto upstream/main pulled in four upstream tests in test_terminal_service_full.py that still patched terminal_service.tmux_client. This branch migrated terminal_service off the module-level tmux_client to get_backend(), so that attribute no longer exists and the patch setup raised AttributeError. Repointed the four @patch targets at backends.registry._backend (the seam the already-migrated sibling tests in this file use). The mock's method surface (session_exists, create_session, ...) is identical on the backend, so the test bodies are unchanged. Full CI-equivalent suite: 2180 passed, 1 skipped.
7ad426b to
6d2b410
Compare
Two findings from maintainer review of PR awslabs#271. P1: HerdrBackend dropped cao launch --env. create_session() and create_window() accepted extra_env but called _inject_env_vars() without it, so operator-forwarded vars (AWS_REGION, MNEMOSYNE_DIR, ...) worked under tmux and silently vanished under herdr. Thread extra_env through to the injection path and emit shell `export` statements for it. Reuse TmuxClient's filtering policy (blocked provider prefixes, 2048-byte value cap) so the two backends cannot drift. herdr injects via `pane send-text` (a shell command line, unlike tmux's exec-style -e KEY=VALUE argv), so operator-supplied values are passed through shlex.quote to stay injection-safe. Added tests for both create paths covering forwarding, blocked-prefix/oversize filtering, and value quoting. P2: workspace.closed map cleanup pruned via pane_id.startswith(workspace_id). herdr renumbers compact pane_ids and gives no guarantee they begin with the workspace_id, so the prefix test was unreliable. Prune by matching each tracked terminal's DB session against the closed workspace's session instead, mirroring the session match already used in the pane.closed handler. Updated the test to use compact pane_ids that do not share the workspace_id prefix, proving the prune keys off DB session ownership. Full CI-equivalent suite: 2184 passed, 1 skipped.
|
Thanks for the detailed review. Both findings are fixed in 942a050, and the merge conflict with main is resolved (rebased onto the latest main; the PR is mergeable again). P1 -- extra_env dropped under herdrConfirmed and fixed. To avoid the two backends drifting, I reused TmuxClient's existing policy directly rather than re-implementing it: Tests added for both create paths: forwarded vars are exported, blocked-prefix and oversized values are dropped (tmux parity), and a value with shell metacharacters is quoted. P2 -- workspace.closed prefix assumptionAgreed, the CI-equivalent suite locally: 2184 passed, 1 skipped. Pushing now; will confirm the GitHub checks go green. |
Adds a --terminal {tmux,herdr} flag to cao-server so the backend can be
selected at launch without editing config.json, addressing reviewer feedback
on PR awslabs#271.
- BackendFactory.create() gains a backend_override parameter that takes
precedence over terminal_backend in config.json. Other config keys
(herdr_session) are still read, so --terminal herdr honors a configured
session name.
- main() parses --terminal and, when set, resolves the backend via
BackendFactory and set_backend() before uvicorn starts. The backend
singleton is otherwise lazy (first touched in the lifespan), so setting it
up front cleanly controls every get_backend() consumer. Without the flag,
behavior is unchanged: the factory reads config.json on first use.
Precedence: --terminal flag > config.json terminal_backend > tmux default.
Tests: factory override precedence (over tmux/herdr config, no-config, session
passthrough, unknown name); main() sets the backend when --terminal is present
and leaves it lazy when absent. docs/herdr.md documents the flag.
Full CI-equivalent suite: 2191 passed, 1 skipped.
haofeif
left a comment
There was a problem hiding this comment.
Thanks @anilkmr-a2z as always, LGTM!
…awslabs#273) Reconcile the two concurrent refactors of the same providers/services: PR awslabs#271 added a terminal backend abstraction (tmux/herdr) with polling get_status; this branch (awslabs#273) replaced polling with an event-driven FIFO -> EventBus -> StatusMonitor/LogWriter/InboxService pipeline and async, buffer-based provider status detection. Resolution keeps the event-driven design as primary and layers in the backend abstraction: - Route terminal I/O through get_backend() (TmuxBackend delegates to tmux_client, so the default path is behaviorally unchanged) while keeping the FIFO pipe-pane stream as the tmux event source. - Guard FIFO setup in create_terminal with `not get_backend().supports_event_inbox()`; herdr uses its socket-event inbox. Keep herdr inbox registration additively (no-op for tmux). - Providers keep async initialize() and buffer-based get_status(output); claude_code keeps its newest-TUI parser plus herdr's native-status short-circuit (the get_history poll inside get_status is removed). - utils.terminal wait_for_shell/wait_until_status stay async + terminal_id based (StatusMonitor-driven). - API lifespan starts the EventBus consumers unconditionally and the herdr inbox service only for HerdrBackend; the removed watchdog PollingObserver branch is dropped. - inbox_service.deliver_pending keeps the event-driven readiness gate and resets herdr TerminalNotFoundError deliveries to PENDING for retry. - get_output(FULL) returns the StatusMonitor buffer; get_output(LAST) keeps awslabs#271's escalating fetch via get_backend(). Also: add submit_delay to the backend send_keys interface (Claude paste timing); update awslabs#271 net-new tests for async create_terminal / the rewired lifespan / the send_keys signature; fix pre-existing herdr mypy issues. Full unit suite green (2198 passed, 17 skipped); mypy, black and isort clean.
Summary
Adds herdr as a first-class terminal backend alongside the existing tmux backend.
What is herdr?
herdr is a terminal-native agent runtime and multiplexer designed specifically for AI coding agents. Unlike tmux, which CAO
polls via
capture-paneand regex to detect idle state, herdr exposes a Unix socket (JSON-RPC) that emits real-time agent status events(
working,idle,done,blocked). This eliminates polling entirely for inbox delivery: CAO subscribes to pane events and delivers messagesthe instant an agent goes idle.
herdr also tracks agent session identity natively (each pane knows its Claude session UUID), enabling accurate status reporting without heuristic
output analysis.
Changes
Core additions:
backends/abstraction layer --TerminalBackendABC,TmuxBackendwrapper,HerdrBackendimplementation,BackendFactory(readsterminal_backendfrom~/.aws/cli-agent-orchestrator/config.json), singleton registry. All services depend only on the ABC -- no concretebackend references outside
backends/.HerdrBackend-- full tmux-equivalent over herdr CLI: workspace/tab lifecycle, live label-based pane resolution (never caches stale pane IDs),pane output streaming, native
agent_statuspolling viaherdr pane get.HerdrInboxService-- event-driven inbox delivery via herdr Unix socket. Subscribes per-pane on registration; delivers onidle/donestatusevents. Includes exponential backoff reconnect with reconciliation on reconnect.
inbox_service.py) -- providers that buffer input during processing (e.g. claude_code) receive pending messagesimmediately on
send_input(), not at the next IDLE poll. Benefits both backends.docs/herdr.md-- setup, configuration, and troubleshooting guide.Reliability fixes (found during local testing):
_socket_loop()previously reset backoff right after TCP connect._resubscribe_all()with stale pane IDs causedECONNRESET, triggering a tight reconnect loop (106K socket events/day observed). Fixed: run
_reconcile()before_resubscribe_all(), resetbackoff only after full setup completes.
pane.closedandworkspace.closedglobally; real-time map + DB cleanup eliminates ghost entries thatpreviously caused "session already exists" errors on re-launch.
_startup_db_cleanup()runs unconditionally at server start, before any pane registration. Cross-checks all DB terminalsagainst live herdr tabs and removes ghost records from prior server runs. Fixes
cao session listsilently dropping sessions when their oldestDB terminal is a ghost (returns 500, caught, session dropped).
GET /terminals/{id}--TerminalNotFoundErrorfrom the backend now returns 404 instead of 500.Test coverage: 101 new tests across
test/backends/,test_lifespan_inbox.py,test_herdr_inbox_registry.py, andtest_inbox_service.py.Test plan
uv run pytest --no-cov test/backends/ -v-- all backend tests passuv run pytest --no-cov test/api/test_lifespan_inbox.py -v-- inbox lifespan tests passuv run pytest --no-cov test/services/test_inbox_service.py -v-- inbox service tests passterminal_backend: "herdr"in~/.aws/cli-agent-orchestrator/config.json, launch a session withcao launch --agents developer,verify terminals appear in
cao session listStartup DB cleanup: removed N ghost terminal(s)and session listdisplays correctly