Skip to content

feat(herdr): add herdr terminal backend with event-driven inbox delivery#271

Merged
haofeif merged 21 commits into
awslabs:mainfrom
anilkmr-a2z:feat/herdr-inbox-wiring
Jun 7, 2026
Merged

feat(herdr): add herdr terminal backend with event-driven inbox delivery#271
haofeif merged 21 commits into
awslabs:mainfrom
anilkmr-a2z:feat/herdr-inbox-wiring

Conversation

@anilkmr-a2z

Copy link
Copy Markdown
Contributor

Summary

Adds herdr as a first-class terminal backend alongside the existing tmux backend.

What is herdr?

herdr is a terminal-native agent runtime and multiplexer designed specifically for AI coding agents. Unlike tmux, which CAO
polls via capture-pane and regex to detect idle state, herdr exposes a Unix socket (JSON-RPC) that emits real-time agent status events
(working, idle, done, blocked). This eliminates polling entirely for inbox delivery: CAO subscribes to pane events and delivers messages
the instant an agent goes idle.

herdr also tracks agent session identity natively (each pane knows its Claude session UUID), enabling accurate status reporting without heuristic
output analysis.

Changes

Core additions:

  • backends/ abstraction layer -- TerminalBackend ABC, TmuxBackend wrapper, HerdrBackend implementation, BackendFactory (reads
    terminal_backend from ~/.aws/cli-agent-orchestrator/config.json), singleton registry. All services depend only on the ABC -- no concrete
    backend references outside backends/.
  • HerdrBackend -- full tmux-equivalent over herdr CLI: workspace/tab lifecycle, live label-based pane resolution (never caches stale pane IDs),
    pane output streaming, native agent_status polling via herdr pane get.
  • HerdrInboxService -- event-driven inbox delivery via herdr Unix socket. Subscribes per-pane on registration; delivers on idle/done status
    events. Includes exponential backoff reconnect with reconciliation on reconnect.
  • Eager inbox delivery (inbox_service.py) -- providers that buffer input during processing (e.g. claude_code) receive pending messages
    immediately on send_input(), not at the next IDLE poll. Benefits both backends.
  • docs/herdr.md -- setup, configuration, and troubleshooting guide.

Reliability fixes (found during local testing):

  • Reconnect storm fix -- _socket_loop() previously reset backoff right after TCP connect. _resubscribe_all() with stale pane IDs caused
    ECONNRESET, triggering a tight reconnect loop (106K socket events/day observed). Fixed: run _reconcile() before _resubscribe_all(), reset
    backoff only after full setup completes.
  • Lifecycle events -- subscribes to pane.closed and workspace.closed globally; real-time map + DB cleanup eliminates ghost entries that
    previously caused "session already exists" errors on re-launch.
  • Startup DB cleanup -- _startup_db_cleanup() runs unconditionally at server start, before any pane registration. Cross-checks all DB terminals
    against live herdr tabs and removes ghost records from prior server runs. Fixes cao session list silently dropping sessions when their oldest
    DB terminal is a ghost (returns 500, caught, session dropped).
  • GET /terminals/{id} -- TerminalNotFoundError from the backend now returns 404 instead of 500.

Test coverage: 101 new tests across test/backends/, test_lifespan_inbox.py, test_herdr_inbox_registry.py, and test_inbox_service.py.

Test plan

  • uv run pytest --no-cov test/backends/ -v -- all backend tests pass
  • uv run pytest --no-cov test/api/test_lifespan_inbox.py -v -- inbox lifespan tests pass
  • uv run pytest --no-cov test/services/test_inbox_service.py -v -- inbox service tests pass
  • Set terminal_backend: "herdr" in ~/.aws/cli-agent-orchestrator/config.json, launch a session with cao launch --agents developer,
    verify terminals appear in cao session list
  • Restart cao-server with ghost DB records present; startup log shows Startup DB cleanup: removed N ghost terminal(s) and session list
    displays correctly

@codecov-commenter

codecov-commenter commented Jun 3, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 82.75862% with 180 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (main@1224cd0). Learn more about missing BASE report.

Files with missing lines Patch % Lines
...c/cli_agent_orchestrator/backends/herdr_backend.py 75.21% 86 Missing ⚠️
...agent_orchestrator/services/herdr_inbox_service.py 76.00% 78 Missing ⚠️
...li_agent_orchestrator/services/terminal_service.py 87.30% 8 Missing ⚠️
src/cli_agent_orchestrator/backends/base.py 93.18% 3 Missing ⚠️
src/cli_agent_orchestrator/api/main.py 95.45% 2 Missing ⚠️
...rc/cli_agent_orchestrator/backends/tmux_backend.py 95.65% 2 Missing ⚠️
src/cli_agent_orchestrator/backends/registry.py 90.00% 1 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##             main     #271   +/-   ##
=======================================
  Coverage        ?   90.97%           
=======================================
  Files           ?       78           
  Lines           ?     8023           
  Branches        ?        0           
=======================================
  Hits            ?     7299           
  Misses          ?      724           
  Partials        ?        0           
Flag Coverage Δ
unittests 90.97% <82.75%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds a pluggable terminal-backend abstraction and introduces a new herdr backend with event-driven inbox delivery, reducing reliance on tmux polling/log-watching and enabling native agent-status awareness where available.

Changes:

  • Introduces TerminalBackend + registry/factory and migrates services/providers/CLI commands to call get_backend() instead of directly using tmux_client.
  • Adds HerdrBackend and HerdrInboxService for socket-event-driven inbox delivery, plus lifecycle cleanup/reconnect behavior.
  • Expands/updates unit and integration tests to cover backend selection, inbox wiring, and updated delivery semantics (including eager delivery changes).

Reviewed changes

Copilot reviewed 59 out of 60 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
test/services/test_terminal_service.py Updates unit tests to patch the backend registry instead of tmux_client.
test/services/test_terminal_service_inbox_registration.py New tests for create/delete terminal wiring to herdr inbox registration/unregistration.
test/services/test_terminal_service_full.py Updates full terminal_service tests for backend abstraction and adds get_output escalation tests.
test/services/test_terminal_service_coverage.py Updates cleanup-path coverage tests to use the backend abstraction.
test/services/test_session_service.py Updates session_service tests to mock get_backend() instead of tmux_client.
test/services/test_plugin_event_emission.py Updates plugin event emission tests for backend abstraction + delete_session behavior change.
test/services/test_inbox_service.py Updates inbox tests for eager-delivery semantics and transient pane-resolution failures.
test/services/test_herdr_inbox_registry.py New tests for the herdr inbox service registry singleton + TYPE_CHECKING import guard.
test/services/test_flow_service.py Updates flow_service tests to use get_backend() instead of tmux_client.
test/providers/test_permission_prompt_detection.py Updates Kiro permission prompt detection tests to patch get_backend().
test/providers/test_opencode_cli_unit.py Updates OpenCode provider unit tests to patch get_backend().
test/providers/test_kiro_cli_integration.py Pins backend registry to tmux for integration tests on hosts configured for herdr.
test/providers/test_kimi_cli_unit.py Updates Kimi provider unit tests to patch get_backend().
test/providers/test_gemini_cli_unit.py Updates Gemini provider unit tests to patch get_backend().
test/providers/test_copilot_cli_unit.py Updates Copilot CLI provider unit tests to patch get_backend().
test/providers/test_claude_code_coverage.py Updates Claude Code coverage tests to patch backend registry and reflect startup-prompt handling changes.
test/clients/test_database.py Adds coverage for create_inbox_message receiver-terminal existence validation.
test/cli/commands/test_terminal.py Updates CLI terminal command tests to use get_backend() instead of tmux_client.
test/cli/commands/test_launch.py Updates launch tests to expect backend attach behavior (not tmux attach subprocess).
test/backends/test_tmux_backend.py New tests ensuring TmuxBackend satisfies the TerminalBackend ABC and delegates properly.
test/backends/test_factory.py New tests for config-driven backend selection via BackendFactory.
test/backends/init.py Adds/keeps test package marker for backend tests.
test/backends/test_herdr_backend.py Adds unit tests for HerdrBackend behavior and contract compliance.
test/backends/test_herdr_inbox_service.py Adds unit tests for socket-event inbox delivery behavior and reconnection/lifecycle handling.
test/api/test_plugin_lifespan.py Updates lifespan tests to patch get_backend() during app startup.
test/api/test_lifespan_inbox.py New tests verifying lifespan wiring selects HerdrInboxService vs PollingObserver based on backend.
test/api/test_api_endpoints.py Updates /health test expectations and lifespan tests to patch get_backend().
src/cli_agent_orchestrator/utils/terminal.py Updates wait_for_shell to accept a TerminalBackend instead of TmuxClient.
src/cli_agent_orchestrator/services/terminal_service.py Migrates terminal operations to backend abstraction; adds herdr inbox registration; improves get_output extraction logic.
src/cli_agent_orchestrator/services/session_service.py Migrates session operations to backend abstraction; changes delete_session to tolerate backend session already gone.
src/cli_agent_orchestrator/services/inbox_service.py Adjusts eager delivery gating and handles TerminalNotFoundError as transient (reset message to PENDING).
src/cli_agent_orchestrator/services/herdr_inbox_registry.py New module-level singleton registry for the running HerdrInboxService instance.
src/cli_agent_orchestrator/services/herdr_inbox_service.py New event-driven inbox delivery service using herdr Unix socket events + reconciliation/cleanup.
src/cli_agent_orchestrator/services/flow_service.py Migrates flow session existence/kill operations to backend abstraction.
src/cli_agent_orchestrator/providers/q_cli.py Migrates provider terminal I/O to backend abstraction.
src/cli_agent_orchestrator/providers/opencode_cli.py Migrates provider terminal I/O to backend abstraction.
src/cli_agent_orchestrator/providers/kiro_cli.py Migrates provider terminal I/O to backend abstraction.
src/cli_agent_orchestrator/providers/kimi_cli.py Migrates provider terminal I/O to backend abstraction.
src/cli_agent_orchestrator/providers/gemini_cli.py Migrates provider terminal I/O to backend abstraction.
src/cli_agent_orchestrator/providers/copilot_cli.py Migrates provider terminal I/O to backend abstraction.
src/cli_agent_orchestrator/providers/codex.py Migrates provider terminal I/O to backend abstraction and simplifies trust-prompt acceptance.
src/cli_agent_orchestrator/providers/claude_code.py Adds herdr-native status path + dispatch tracking; updates response marker parsing; migrates I/O to backend abstraction.
src/cli_agent_orchestrator/providers/base.py Updates eager-delivery documentation to remove WAITING_USER_ANSWER from eligible states.
src/cli_agent_orchestrator/constants.py Updates eager inbox delivery comment to reflect PROCESSING-only eligibility.
src/cli_agent_orchestrator/clients/database.py Validates receiver terminal existence in create_inbox_message.
src/cli_agent_orchestrator/cli/commands/terminal.py Migrates restore window creation to backend abstraction.
src/cli_agent_orchestrator/cli/commands/launch.py Migrates session attach to get_backend().attach_session() instead of tmux attach subprocess.
src/cli_agent_orchestrator/backends/base.py New TerminalBackend ABC + shared backend exceptions + capability hooks (native status, pane id).
src/cli_agent_orchestrator/backends/tmux_backend.py New TmuxBackend adapter wrapping existing TmuxClient.
src/cli_agent_orchestrator/backends/herdr_backend.py New HerdrBackend implementing TerminalBackend via herdr CLI (pane resolution, native status, attach, etc.).
src/cli_agent_orchestrator/backends/registry.py New module-level backend singleton registry (get_backend/set_backend).
src/cli_agent_orchestrator/backends/factory.py New config-driven backend factory (defaults to tmux; opt-in herdr).
src/cli_agent_orchestrator/backends/init.py Exposes backend public API surface (ABC, factory, exceptions).
src/cli_agent_orchestrator/api/main.py Lifespan selects herdr socket inbox vs tmux polling observer; expands /health component probes; maps TerminalNotFoundError to 404.
docs/herdr.md New documentation describing herdr backend configuration, behavior, and troubleshooting.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/cli_agent_orchestrator/services/terminal_service.py Outdated
Comment thread src/cli_agent_orchestrator/backends/tmux_backend.py
Comment thread src/cli_agent_orchestrator/backends/herdr_backend.py
Comment thread test/providers/test_kiro_cli_integration.py
Comment thread docs/herdr.md

@haofeif haofeif left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got a couple comments here

Comment thread docs/herdr.md Outdated
Start the CAO server the same way as with tmux -- no additional flags needed:

```bash
uv run cao-server

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

currently in root README.md, we only need to run cao server, without the "uv run", can we do cao server instead without running uv run ?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. Fixed in 7ad426b -- dropped uv run from both occurrences in this doc (Launching + troubleshooting) to match the README, since cao-server is a project.scripts entry point and works bare once installed.

Comment thread docs/herdr.md

## Configuration

Set `terminal_backend` in `~/.aws/cli-agent-orchestrator/config.json`:

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you think we can do something like cao server --terminal herdr to start on herdr instead of need to write a config ?

@anilkmr-a2z anilkmr-a2z Jun 6, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the idea, but I'd prefer to keep it out of this PR. Today the backend is resolved by BackendFactory from config.json through the get_backend() singleton, which is read lazily on first use (including inside the FastAPI lifespan). A --terminal flag means parsing it in api/main.py:main(), calling set_backend(BackendFactory.create(...)) before the lifespan runs, and defining precedence vs the config.json value. That's a reasonable UX improvement but it's a separate behavior change from the inbox wiring this PR is scoped to. I will see if I can track this as a follow up.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Decided to add this. Implemented in 2b7b402.

cao-server --terminal {tmux,herdr} now overrides terminal_backend from config.json for that run, so you can start on herdr without writing a config file:

cao-server --terminal herdr

Precedence is flag > config.json > tmux default. herdr_session is still read from config.json, so the flag only selects the backend. Implementation threads a backend_override through BackendFactory and calls set_backend() before uvicorn starts; since the backend singleton is otherwise lazy (first resolved in the lifespan), nothing else changes when the flag is absent. Added factory precedence tests and main()-wiring tests, and documented it in docs/herdr.md.

Note the entry point is cao-server (the project.scripts console script), so it's cao-server --terminal herdr rather than cao server.

@haofeif

haofeif commented Jun 6, 2026

Copy link
Copy Markdown
Contributor

@anilkmr-a2z This PR is a substantial backend abstraction and a useful Herdr integration. Had the following two findings, also please kindly resolve the merge conflict ? after this i think we are ready to merge. Very excited in bringing herdr into the terminal option!

Findings

P1. Herdr backend ignores extra_env, breaking cao launch --env

Files:

  • src/cli_agent_orchestrator/backends/herdr_backend.py:181
  • src/cli_agent_orchestrator/backends/herdr_backend.py:222
  • src/cli_agent_orchestrator/backends/herdr_backend.py:274
  • src/cli_agent_orchestrator/backends/herdr_backend.py:301
  • src/cli_agent_orchestrator/services/terminal_service.py:181
  • src/cli_agent_orchestrator/services/terminal_service.py:200

TerminalBackend.create_session() and create_window() now expose extra_env, and terminal_service.create_terminal() passes the operator-provided environment through:

  • new sessions pass extra_env=env_vars
  • child terminals pass extra_env=get_session_env(session_name)

The tmux backend forwards those values to TmuxClient, which merges them into the pane environment. Herdr accepts the same parameter in both method signatures, but never uses it:

  • create_session(..., extra_env=...) calls _inject_env_vars(session_name, window_name, terminal_id, pane_id=new_pane_id)
  • create_window(..., extra_env=...) calls _inject_env_vars(session_name, window_name, terminal_id, pane_id=new_pane_id)
  • _inject_env_vars() only exports CAO_TERMINAL_ID and CAO_SESSION_NAME

That silently breaks the v2.2 cao launch --env contract for Herdr-backed sessions. Anything relying on explicitly forwarded context, such as AWS_REGION, MNEMOSYNE_DIR, or other allowed launch-time variables, will work under tmux and disappear under Herdr.

Recommended fix: Thread extra_env into Herdr's environment injection path and apply the same filtering rules as TmuxClient._merge_extra_env() before sending exports to the pane. Add tests for both create_session() and create_window() asserting forwarded env vars are sent to Herdr, while blocked prefixes and oversized values are dropped consistently with tmux.

P2. workspace.closed cleanup relies on an unverified pane-id prefix contract

Files:

  • src/cli_agent_orchestrator/services/herdr_inbox_service.py:555
  • src/cli_agent_orchestrator/services/herdr_inbox_service.py:569
  • test/backends/test_herdr_inbox_service.py:684

On workspace.closed, the service deletes DB terminals for the session, then prunes in-memory maps by checking:

if pid.startswith(workspace_id)

The unit test fixture uses pane IDs such as ws-abc/0, so the test proves the current implementation under that assumption. I did not find a Herdr contract in this PR proving pane IDs must always begin with the workspace ID. Other PR comments and tests describe Herdr pane IDs as compact and potentially renumbered, which makes this worth tightening before relying on it operationally.

If Herdr guarantees the prefix format, document that where the cleanup depends on it. If not, track terminal_id -> workspace_id during registration or reconciliation and prune from that explicit mapping instead of deriving workspace ownership from the pane ID string.

I re-checked this against 9fce586. The main blocker I can confirm is that HerdrBackend drops extra_env. terminal_service.create_terminal() now passes launch env through the backend contract for both new sessions and child windows, and TmuxBackend preserves that behavior via TmuxClient._merge_extra_env(). Herdr accepts the same extra_env parameter in create_session() / create_window(), but both paths call _inject_env_vars() without passing it, and _inject_env_vars() only exports CAO_TERMINAL_ID and CAO_SESSION_NAME.

That means cao launch --env ... works under tmux but silently disappears under Herdr. Please thread extra_env into the Herdr injection path, apply the same filtering rules as tmux, and add tests for new-session and child-window creation.

One smaller follow-up: workspace.closed cleanup prunes maps with pane_id.startswith(workspace_id). The test fixture assumes pane IDs like ws-abc/0; if Herdr guarantees that format, please document it near the cleanup logic. Otherwise, track workspace ownership explicitly during registration/reconcile and prune from that mapping.

…rvice

Implements the herdr-backend-support OpenSpec change: TerminalBackend ABC,
TmuxBackend wrapper, HerdrBackend (CLI + socket API), BackendFactory with
config-driven selection, native status detection, and HerdrInboxService
(socket event-based inbox delivery, not yet wired into the app lifespan).

Parks work-in-progress ahead of the fix-herdr-inbox-delivery-wiring change,
which wires HerdrInboxService into the running app and fixes the eager
delivery modal-injection defect.
Two bugs prevented inbox message delivery, plus the supporting wiring
to make herdr-backed delivery work end to end.

Bug A — eager delivery eligibility gate (inbox_service.py):
WAITING_USER_ANSWER was incorrectly included in the eager delivery
eligibility check. A terminal waiting on a user answer is not buffering
input for a next turn, so delivering eagerly in that state is wrong.
Restrict eager eligibility to PROCESSING only.

Bug B — HerdrInboxService never wired into lifespan (api/main.py):
Under the herdr backend the service was never instantiated or started,
so zero messages were delivered. The lifespan now selects the delivery
mechanism by backend: HerdrInboxService (socket events) for HerdrBackend,
PollingObserver (file polling) otherwise, with matching shutdown for each.

Supporting wiring:
- Add HerdrInboxRegistry (module-level get/set singleton) with a
  TYPE_CHECKING-guarded import to break the circular import chain
  between terminal_service and herdr_inbox_service.
- Connect terminal create/delete to register/unregister the terminal
  with the herdr inbox service; failures are logged, not fatal.
- Expose HerdrBackend.herdr_session for the service to bind its socket.
- Fix the async seam in HerdrInboxService: capture the running loop in
  start() and use run_coroutine_threadsafe() for pane subscription, since
  register_terminal() is called from the synchronous create_terminal path.
- Clear the HerdrInboxService singleton on shutdown (set_herdr_inbox_service
  now accepts None) so the registry does not retain a stale reference after
  the service task is cancelled, keeping startup and shutdown symmetric.

Update providers/base.py and constants.py docs to match the corrected
PROCESSING-only eager delivery semantics.

Tests: 66 passed across the five affected suites.

Refs: none (internal herdr bug fix)

🤖 Assisted by Amazon Q Developer
…cache

Fixes silent message misdelivery after a CAO server restart. The window
resolution path keyed off an in-memory _window_to_terminal map populated only
at create time and never rebuilt; after a restart it was empty, and resolution
fell through to a fallback that returned the FIRST pane in the workspace,
misrouting messages to the wrong terminal (marked delivered).

Resolve from herdr's live state every call instead: workspace label (= session)
-> tab label (= window) -> pane on that tab. herdr is a separate process that
survives CAO restarts, so this is restart-safe with no new persistence. Window
names are unique by construction (profile + uuid suffix).

- _resolve_pane_id_from_window: always resolve via the label chain; wrap backend
  errors as TerminalNotFoundError so the inbox path can leave messages pending
- remove _window_to_terminal entirely (declaration, populate, cleanup sites)
- remove the silent first-pane fallback; raise TerminalNotFoundError instead
- inbox_service: treat TerminalNotFoundError on send as transient -> leave the
  message pending for retry (not FAILED, not misrouted)
- _resolve_pane_id (socket inbox path) unchanged

Tests: test_herdr_backend (label-chain resolution, fail-loud) and
test_inbox_service (pending-on-resolution-failure) pass.

Implements OpenSpec change fix-herdr-pane-resolution-across-restart.
…vents

Root cause of 106K/day reconnect storm: _resubscribe_all() was sending
subscriptions for pane_ids that no longer exist in herdr. Herdr responds
to any invalid pane subscription by ECONNRESET-ing the entire socket.
Backoff never escalated because it was reset after _connect() success,
not after full setup.

Changes:
- Add _reconcile(): runs on every socket connect, before _resubscribe_all().
  Calls 'herdr pane list', prunes _pane_to_terminal of stale entries,
  deletes orphaned DB terminal records via delete_terminal(), kills
  workspaces with zero remaining live terminals. Builds
  _workspace_to_session map for lifecycle event handling.
- Add _subscribe_lifecycle_events(): subscribes to pane.closed and
  workspace.closed globally after resubscribe. No pane_id needed.
- Add _handle_lifecycle_event(): pane.closed deletes single terminal and
  kills workspace if now empty. workspace.closed deletes all terminals
  for that session and prunes maps using pane_id prefix matching.
- Fix backoff reset: moved to after successful reconcile + resubscribe +
  lifecycle-subscribe, not just _connect().
- Fix import: get_backend() is in backends.registry, not backends.__init__.
  Two call sites updated to match the pattern used everywhere else in src/.
- Fix double get_terminal_metadata() call in pane.closed handler: refactor
  to walrus-operator list comprehension, eliminating None dereference.
- Add 13 unit tests covering _reconcile prune/no-op/failure paths,
  _subscribe_lifecycle_events message format, and _handle_lifecycle_event
  for pane.closed and workspace.closed including unknown-entity no-ops.

🤖 Generated with Amazon Q Developer
…cile

_reconcile() previously only pruned panes it tracked in _pane_to_terminal.
On server restart the map starts empty, so ghost DB records from prior runs
(terminals whose herdr tabs no longer exist) were never cleaned up. This
caused repeated 'Terminal not found' errors and broke 'cao session list'
(CLI silently drops a session when its conductor terminal returns 500).

Fix in two parts:

1. herdr_inbox_service.py: after the in-memory map prune, run `herdr tab
   list` and cross-check DB terminals for every tracked workspace. Any DB
   terminal whose tmux_window label is absent from live herdr tabs is
   deleted. This runs on every socket connect, covering both startup and
   reconnect scenarios.

2. api/main.py: GET /terminals/{id} catches TerminalNotFoundError from the
   backend and returns 404 instead of 500. Ghost terminals that survive
   until the next server restart now fail with the correct status code,
   making CLI error handling straightforward.

10 new tests (33 total in test_herdr_inbox_service.py).
… reconnect

The DB cross-check in _reconcile() only fires when _pane_to_terminal is
non-empty (the socket loop is gated by that condition). On a fresh server
start with no registered terminals, _reconcile() never runs, so ghost records
from prior runs survive indefinitely.

Extract the workspace/tab cross-check into _startup_db_cleanup() and call it
unconditionally from start() before the socket loop begins. The method fetches
herdr workspace list and tab list directly, building its own workspace map
without depending on _pane_to_terminal or _workspace_to_session.

3 new tests (36 total).
Five test/impl fixes surfaced after rebasing onto upstream/main:

- /health endpoint: implement the `components` block (cao/herdr/claude) that
  test_health_check_returns_ok already asserted. herdr/claude probed via
  shutil.which -> "ok" or "unavailable".
- test_herdr_backend: fix infinite hang in test_logs_warning_when_socket_never_appears.
  The fake_time stub returned 6.0 but _ensure_session_running's poll deadline is
  now time()+15.0, so 6.0 < 15.0 looped forever. Bump stub to 16.0.
- test_tmux_backend: update create_session/create_window delegation assertions
  for the new extra_env kwarg.
- test_api_endpoints + test_plugin_lifespan: patch get_backend to a TmuxBackend
  mock so the lifespan PollingObserver assertions hold regardless of the local
  terminal_backend config.
- test_kiro_cli_unit: test_mcp_server_init_yields_processing patched the removed
  kiro_cli.tmux_client; switch to get_backend mock like the sibling tests.
…anges

After rebasing onto upstream/main, three integration points needed repair:

- inbox_service: upstream now marks a message DELIVERED *before* send_input
  (PR awslabs#265, closes the watchdog re-entrancy race). My TerminalNotFoundError
  handler returned without resetting that optimistic status, so a transient
  pane-resolution failure would strand the message as DELIVERED-but-unsent.
  Reset to PENDING in the handler so the retry path still fires.
- test_inbox_service: update test_resolution_failure_leaves_message_pending to
  assert the final status is PENDING (DELIVERED then reset), not "never touched".
- test_codex_provider_unit: upstream's new TestCodexV0136FooterFormat tests
  patched codex.tmux_client, which this branch migrated to get_backend. Switch
  the three get_status tests to get_backend; drop the unused patch on the
  script-only extraction test.
…process import

launch.py migrated tmux attach from subprocess.run(["tmux", "attach-session"...])
to get_backend().attach_session(), so the module no longer imports subprocess.
Six launch tests still patched cli.commands.launch.subprocess.run as a defensive
no-op, which now fails with "module launch has no attribute 'subprocess'" during
patch setup. Repoint those patches at launch.get_backend, which exists and is the
correct seam for the attach path.

Full CI-equivalent suite (test/, minus integration/e2e) now passes: 2120 passed.
test_returns_herdr_when_configured constructed a real HerdrBackend, whose
__init__ calls _ensure_session_running(). When the session socket is absent
that method runs subprocess.Popen(["herdr", ...]), which raises
FileNotFoundError on hosts where herdr is not installed (CI). The test only
passed locally because a live ~/.config/herdr/sessions/cao/herdr.sock made
_ensure_session_running short-circuit before the Popen call.

Patch herdr_backend.os.path.exists to True so __init__ finds the socket and
skips server startup, matching the existing fixture in test_herdr_backend.py.
The factory's own config_path.exists() is pathlib.Path.exists and is
unaffected by this module-scoped patch.

Reproduced the CI failure locally by running with herdr off PATH; all 7
factory tests and the full 101-test backends suite pass after the fix.
The herdr backend setup guide (docs/herdr.md) was added in this branch but
nothing in the README pointed to it, so it was undiscoverable. Add a link in
the "Sessions run in tmux" section, next to the existing docs/tmux.md link,
since herdr is the alternative terminal backend.
Four findings from the automated reviewer, verified against the code:

- terminal_service.get_output: the "partial response" message claimed the
  marker was not found in 5000 lines, but the final attempt captures the full
  scrollback (full_history=True). Reworded to "full scrollback" so the prefix
  matches what was actually searched.

- HerdrBackend.get_history ignored strip_escapes, while TmuxBackend honors it.
  The one strip_escapes=True caller (delete-terminal snapshot) was only getting
  plain text by accident because herdr defaults to text output. Map
  strip_escapes=True to herdr's native "--format text"; leave the default path
  untouched so existing provider output parsing on the herdr backend is
  unchanged. Added two unit tests for the flag mapping.

- docs/herdr.md prerequisites said the herdr server must be running before CAO
  starts, but HerdrBackend._ensure_session_running auto-starts it when the
  session socket is absent. Clarified that pre-starting is optional.

- test_kiro_cli_integration use_tmux_backend fixture set the module-level
  backend registry singleton without restoring it, leaking the tmux pin into
  later tests by execution order. Save and restore the previous backend.

Not changed: the Copilot note about TmuxBackend.attach_session using
check=True. The sole caller (cli/commands/launch.py) already wraps the call in
a broad except that converts any exception to a click.ClickException, so a
failed attach surfaces as a normal CLI error, not an uncaught stack trace.

Backends + terminal_service suites pass (146 tests) with herdr off PATH.
… per connection

HerdrInboxService flapped the herdr socket ~44x/min indefinitely: connect,
subscribe, herdr resets the connection ~225ms later, 1s backoff, repeat. The
event-driven inbox path delivered nothing as a result (every
pane.agent_status_changed was lost between resets); delivery only worked via
the polling fallback, and the churn produced ~50MB log files per day.

Root cause: herdr 0.6.8 resets the entire connection when it receives a SECOND
events.subscribe on a connection that already has an active subscription.
_socket_loop sent one events.subscribe per managed pane (_resubscribe_all) and
then a second for the lifecycle events (_subscribe_lifecycle_events), so the
2nd call always tore down the connection. The backoff never escalated past the
1.0s base because it reset after each apparently-successful setup.

Reproduced and fix validated against a live herdr 0.6.8 socket: a single
events.subscribe carrying every pane's agent-status plus pane.closed and
workspace.closed is acknowledged once and keeps the connection open with events
streaming; any second subscribe resets at ~150-220ms.

Changes:
- Replace _resubscribe_all + _subscribe_lifecycle_events + _subscribe_pane with
  a single _subscribe_all_events that builds one subscriptions array (one
  pane.agent_status_changed per managed pane -- pane_id is required, herdr
  rejects the wildcard form -- plus the two lifecycle events).
- register_terminal on a live connection now forces a reconnect via
  _force_reconnect (close the writer so _socket_loop rebuilds the combined
  subscription) instead of sending a second, connection-killing subscribe.
  herdr exposes no incremental add-subscription API.
- Update tests to pin the single-subscribe-per-connection contract.
Socket-driven pane.closed / workspace.closed cleanup never ran. _event_loop
read the event name from the "type" key and matched the dotted names
"pane.closed" / "workspace.closed", but herdr 0.6.8 puts the name in the
"event" key using underscore names ("pane_closed" / "workspace_closed"). The
condition never matched, so every lifecycle event fell through and real-time
cleanup was dead -- leaving the periodic reconcile/startup sweeps as the only
thing removing ghost terminals.

Captured the real wire format live from herdr 0.6.8:
  {"event":"pane_closed","data":{"pane_id":...,"workspace_id":...}}
  {"event":"workspace_closed","data":{"workspace_id":...}}
  {"event":"pane.agent_status_changed","data":{"agent_status":...,"pane_id":...}}

Fix: read the name from "event" (falling back to "type") and normalize
underscores to dots before routing. The agent-status path was already correct
(it reads pane_id/agent_status from the "data" wrapper and does not depend on
the top-level name match).

Validated live end-to-end: the real _event_loop over the real herdr socket now
routes an actual pane close to _handle_lifecycle_event. Updated the routing
test to use herdr's real event shape (the old test used a fabricated
{"type":"pane.closed"} shape that never occurs on the wire) and added
end-to-end pane_closed cleanup and agent-status delivery tests.
Reviewer noted the README starts the server with `cao-server` (no `uv run`
prefix), but herdr.md used `uv run cao-server`. cao-server is a project.scripts
entry point, so the bare form works once installed. Aligned both occurrences
(Launching + troubleshooting) with the README convention.
The rebase onto upstream/main pulled in four upstream tests in
test_terminal_service_full.py that still patched
terminal_service.tmux_client. This branch migrated terminal_service off the
module-level tmux_client to get_backend(), so that attribute no longer exists
and the patch setup raised AttributeError.

Repointed the four @patch targets at backends.registry._backend (the seam the
already-migrated sibling tests in this file use). The mock's method surface
(session_exists, create_session, ...) is identical on the backend, so the test
bodies are unchanged.

Full CI-equivalent suite: 2180 passed, 1 skipped.
@anilkmr-a2z anilkmr-a2z force-pushed the feat/herdr-inbox-wiring branch from 7ad426b to 6d2b410 Compare June 6, 2026 21:35
Two findings from maintainer review of PR awslabs#271.

P1: HerdrBackend dropped cao launch --env. create_session() and
create_window() accepted extra_env but called _inject_env_vars() without it,
so operator-forwarded vars (AWS_REGION, MNEMOSYNE_DIR, ...) worked under tmux
and silently vanished under herdr. Thread extra_env through to the injection
path and emit shell `export` statements for it. Reuse TmuxClient's filtering
policy (blocked provider prefixes, 2048-byte value cap) so the two backends
cannot drift. herdr injects via `pane send-text` (a shell command line, unlike
tmux's exec-style -e KEY=VALUE argv), so operator-supplied values are passed
through shlex.quote to stay injection-safe. Added tests for both create paths
covering forwarding, blocked-prefix/oversize filtering, and value quoting.

P2: workspace.closed map cleanup pruned via pane_id.startswith(workspace_id).
herdr renumbers compact pane_ids and gives no guarantee they begin with the
workspace_id, so the prefix test was unreliable. Prune by matching each tracked
terminal's DB session against the closed workspace's session instead, mirroring
the session match already used in the pane.closed handler. Updated the test to
use compact pane_ids that do not share the workspace_id prefix, proving the
prune keys off DB session ownership.

Full CI-equivalent suite: 2184 passed, 1 skipped.
@anilkmr-a2z

Copy link
Copy Markdown
Contributor Author

Thanks for the detailed review. Both findings are fixed in 942a050, and the merge conflict with main is resolved (rebased onto the latest main; the PR is mergeable again).

P1 -- extra_env dropped under herdr

Confirmed and fixed. create_session() and create_window() now thread extra_env into the injection path, and _inject_env_vars() emits export statements for the forwarded vars after the CAO identity vars.

To avoid the two backends drifting, I reused TmuxClient's existing policy directly rather than re-implementing it: _is_blocked_env_key() (blocked provider prefixes + allowlist) and _MAX_ENV_VALUE_BYTES (2048-byte cap). One herdr-specific difference: herdr injects via pane send-text, which is a shell command line, not tmux's exec-style -e KEY=VALUE argv. So operator-supplied values go through shlex.quote() to keep the shell export injection-safe. Keys are already constrained to [A-Za-z_][A-Za-z0-9_]* at the CLI boundary.

Tests added for both create paths: forwarded vars are exported, blocked-prefix and oversized values are dropped (tmux parity), and a value with shell metacharacters is quoted.

P2 -- workspace.closed prefix assumption

Agreed, the pane_id.startswith(workspace_id) test was unreliable -- I saw firsthand in this work that herdr compact pane_ids do not consistently carry the workspace_id prefix and get renumbered. Rather than add a new ownership map, I switched the prune to match each tracked terminal's DB session against the closed workspace's session, which is the same session-match the pane.closed handler already uses. Updated the test to use compact pane_ids that deliberately do not share the workspace_id prefix, so it now proves the prune keys off DB session ownership rather than the string shape.

CI-equivalent suite locally: 2184 passed, 1 skipped. Pushing now; will confirm the GitHub checks go green.

Adds a --terminal {tmux,herdr} flag to cao-server so the backend can be
selected at launch without editing config.json, addressing reviewer feedback
on PR awslabs#271.

- BackendFactory.create() gains a backend_override parameter that takes
  precedence over terminal_backend in config.json. Other config keys
  (herdr_session) are still read, so --terminal herdr honors a configured
  session name.
- main() parses --terminal and, when set, resolves the backend via
  BackendFactory and set_backend() before uvicorn starts. The backend
  singleton is otherwise lazy (first touched in the lifespan), so setting it
  up front cleanly controls every get_backend() consumer. Without the flag,
  behavior is unchanged: the factory reads config.json on first use.

Precedence: --terminal flag > config.json terminal_backend > tmux default.

Tests: factory override precedence (over tmux/herdr config, no-config, session
passthrough, unknown name); main() sets the backend when --terminal is present
and leaves it lazy when absent. docs/herdr.md documents the flag.

Full CI-equivalent suite: 2191 passed, 1 skipped.

@haofeif haofeif left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @anilkmr-a2z as always, LGTM!

@haofeif haofeif merged commit 7a7d1e1 into awslabs:main Jun 7, 2026
29 checks passed
call-me-ram added a commit to call-me-ram/cli-agent-orchestrator that referenced this pull request Jun 8, 2026
…awslabs#273)

Reconcile the two concurrent refactors of the same providers/services:
PR awslabs#271 added a terminal backend abstraction (tmux/herdr) with polling
get_status; this branch (awslabs#273) replaced polling with an event-driven
FIFO -> EventBus -> StatusMonitor/LogWriter/InboxService pipeline and
async, buffer-based provider status detection.

Resolution keeps the event-driven design as primary and layers in the
backend abstraction:

- Route terminal I/O through get_backend() (TmuxBackend delegates to
  tmux_client, so the default path is behaviorally unchanged) while
  keeping the FIFO pipe-pane stream as the tmux event source.
- Guard FIFO setup in create_terminal with
  `not get_backend().supports_event_inbox()`; herdr uses its socket-event
  inbox. Keep herdr inbox registration additively (no-op for tmux).
- Providers keep async initialize() and buffer-based get_status(output);
  claude_code keeps its newest-TUI parser plus herdr's native-status
  short-circuit (the get_history poll inside get_status is removed).
- utils.terminal wait_for_shell/wait_until_status stay async + terminal_id
  based (StatusMonitor-driven).
- API lifespan starts the EventBus consumers unconditionally and the herdr
  inbox service only for HerdrBackend; the removed watchdog PollingObserver
  branch is dropped.
- inbox_service.deliver_pending keeps the event-driven readiness gate and
  resets herdr TerminalNotFoundError deliveries to PENDING for retry.
- get_output(FULL) returns the StatusMonitor buffer; get_output(LAST)
  keeps awslabs#271's escalating fetch via get_backend().

Also: add submit_delay to the backend send_keys interface (Claude paste
timing); update awslabs#271 net-new tests for async create_terminal / the rewired
lifespan / the send_keys signature; fix pre-existing herdr mypy issues.

Full unit suite green (2198 passed, 17 skipped); mypy, black and isort clean.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants