Conversation
6 tasks
haofeif
added a commit
that referenced
this pull request
Apr 9, 2026
- Add "Kiro is working" ghost text as positive PROCESSING signal, checked before idle prompt absence (jwalaQ comment #1) - Add TUI permission pattern "Yes No Always Allow" alongside legacy [y/n/t] format, requires all three options to avoid false positives on bare "Yes"/"No" in agent output (jwalaQ comment #2) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
haofeif
added a commit
that referenced
this pull request
Apr 10, 2026
…159) (#163) * feat(kiro_cli): add full TUI mode support with --legacy-ui fallback (#159) Remove hardcoded --legacy-ui from launch command and add TUI-native status detection and message extraction. The provider now: - Launches in TUI mode by default, falls back to --legacy-ui on timeout - Detects COMPLETED via ▸ Credits: marker + idle prompt (TUI path) - Extracts messages using separator (────) boundaries when no green arrows - Retains full backward compatibility with legacy UI patterns Also adds "aren't available" to e2e REFUSAL_KEYWORDS for Claude Code test fix. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * style: apply black formatting to kiro_cli provider and tests Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(kiro_cli): address PR #163 review comments - Raise separator minimum from 4 to 20 chars to avoid matching short markdown separators in agent output (jwalaQ comment #4) - Remove redundant ANSI cleanup in _extract_tui_message — input is already ANSI-stripped by caller (jwalaQ comment #5) - Improve timeout error message wording (jwalaQ comment #3) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat(kiro_cli): add TUI processing and permission detection patterns - Add "Kiro is working" ghost text as positive PROCESSING signal, checked before idle prompt absence (jwalaQ comment #1) - Add TUI permission pattern "Yes No Always Allow" alongside legacy [y/n/t] format, requires all three options to avoid false positives on bare "Yes"/"No" in agent output (jwalaQ comment #2) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(kiro_cli): update TUI idle pattern to match real kiro-cli v1.29+ output Verified against real kiro-cli v1.29.1 TUI output via tmux capture-pane: - Idle prompt is "Ask a question or describe a task" (capital A, no comma) - Pattern now accepts both old (lowercase, comma) and new formats - Updated fixtures to use real TUI output format - Updated inline test strings to match v1.29+ output Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(kiro_cli): use forward-search for TUI separator to handle agent output separators (#159) Changed _extract_tui_message() to find the first separator after the previous turn's Credits line instead of the last separator before the current Credits. This prevents false matches when agent output contains box-drawing separator characters. Also updated docs for launch command. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(test): add 'Kiro is working' ghost text to TUI processing fixture (#159) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
This was referenced Apr 22, 2026
3 tasks
5 tasks
haofeif
added a commit
that referenced
this pull request
Jun 11, 2026
…ues #115) (#273) * refactored but wait_for_* not working * working * merge from main * fix merge conflicts * update docs and tests * update docs * rebase and update docs and tests * update tests * clean up routine * formatting and update kimi_cli to event-driven * fix kiro_cli answer marker pattern * fix(events): address review feedback on the event-driven pipeline Resolve four review comments on PR #273: - InboxService.run() ran deliver_pending() synchronously inside the asyncio consumer loop, blocking it on DB + tmux I/O and starving the StatusMonitor/LogWriter consumers. Offload delivery via asyncio.to_thread, matching the threading discipline documented for the event bus and the existing OpenCode poller. - Event-driven deliveries were started without a PluginRegistry, so they skipped PostSendMessageEvent hooks. Thread the registry through run() -> deliver_pending so status-driven deliveries get the same plugin attribution as the immediate and OpenCode-poller paths. - FifoManager.stop_reader() early-returned when no in-memory reader was tracked, never unlinking the FIFO. Retention cleanup after a restart (empty _readers) would leak *.fifo files unbounded. Always best-effort unlink the FIFO; only log when a reader was actually stopped. - Docs claimed StatusMonitor falls back to a generic shell-prompt pattern before init; _detect_status returns UNKNOWN until a provider registers. Correct the docs to match. Adds regression tests for the registry threading, the to_thread offload, and the stale-FIFO unlink. * fix(events): address Copilot review on PR #273 - flow_service: stop FIFO readers and clear StatusMonitor buffers when recycling an existing flow session, preventing leaked reader threads and stale *.fifo files across repeated flow runs. - flow CLI run: bootstrap the in-process event pipeline (bus loop + StatusMonitor/LogWriter consumers) so execute_flow's async create_terminal/initialize path doesn't hang waiting for status. - inbox_service: deliver pending messages in contiguous same-sender runs so PostSendMessageEvent attribution stays correct when a batch spans multiple senders (num_messages=0). - copilot_cli tests: replace no-op initialize test with real async success and trust-prompt coverage; drop dead placeholder. - docs: correct CODEBASE.md inbox flow (deliver_pending, not the removed check_and_send_pending_messages); document that OutputMode.FULL returns the bounded StatusMonitor buffer, not unbounded scrollback. * fix(status): raw-stream-safe get_status + newest Claude Code TUI support Addresses haofeif's PR #273 review (raw FIFO buffer fed to get_status) plus newest-Claude-Code TUI changes found while reproducing it locally. - claude_code/codex/kimi get_status: strip terminal escapes (utils.text.strip_terminal_escapes) before structural checks, so the cursor-positioning / in-place-redraw sequences in the raw pipe-pane stream no longer break detection. Document the raw-buffer input contract on BaseProvider.get_status. - claude_code: gated support for the newest Claude Code TUI (the box pattern gates it, so older builds keep the legacy logic): live spinner renders ABOVE a box-drawn input prompt; completion shows '* <Verb>ed for Ns' instead of the old marker. Detect both PROCESSING and COMPLETED there. - kiro_cli: clear the StatusMonitor buffer (new StatusMonitor.reset_buffer) on the TUI -> --legacy-ui fallback so the retry is not derived from stale bytes. - input submission: newest Claude Code swallows an Enter sent too soon after a bracketed paste; make the post-paste delay provider-tunable (BaseProvider.paste_submit_delay; Claude overrides to 2.0s). - create_terminal: nudge the shell after pipe-pane attaches so the prompt is captured (fast shells drew it before capture started, timing out wait_for_shell). Full existing unit suite stays green (2012 passed); changes are additive. * fix(claude_code): detect newest Claude Code TUI completion + asterisk spinner in raw stream * fix(claude_code): re-insert spaces for column-positioned text so completion summary is detected The newest Claude Code TUI sometimes redraws the completion summary and spinner with CHA/CUF cursor-positioning escapes instead of literal spaces (e.g. '✻\x1b[3GWorked\x1b[10Gfor\x1b[14G3s'). strip_terminal_escapes dropped those moves with no space, gluing words ('Workedfor3s') so COMPLETION_SUMMARY_PATTERN and the spinner check never matched — get_status stayed IDLE through a finished turn and handoff/poll_until_done timed out. Replace forward horizontal cursor moves with a single space. Verified against three real handoff captures + full suite green. * fix(claude_code): recognize newest-TUI '●' response glyph in message extraction The newest Claude Code TUI renders the response marker as '●' (U+25CF) instead of '⏺' (U+23FA), so extract_last_message_from_script raised 'No Claude Code response found' on a finished handoff even after COMPLETED was detected. Add a line-start- anchored EXTRACTION_RESPONSE_PATTERN matching both glyphs (anchored so the footer effort indicator '… esc to interrupt ● high · /effort' is not mistaken for a marker), and trim the '✻ Worked for Ns' completion stat. RESPONSE_PATTERN used by get_status is left ⏺-only so the legacy COMPLETED check cannot fire mid-stream. * fix(claude_code): boxless completion summary after the last separator wins over a stale spinner When the newest TUI repaints a finished turn boxless ('✻ <Verb>ed for Ns' + ❯) below the previous frame's box separator, a stale spinner remained above that separator. The spinner-before-separator PRIMARY walk fired PROCESSING off the stale spinner and never reached the COMPLETED branch, so a real handoff stayed PROCESSING until timeout. Skip that walk when a completion summary appears after the last separator (the boxless-redraw signature; genuine processing only has the footer there). Verified against four real handoff captures. * fix(claude_code): COMPLETED via ● response marker, robust to clipped/missing completion summary The newest TUI's completion summary is fragile in the raw stream: the duration can be clipped ('✻ Crunched for ' with no Ns) or rendered on a ·/* glyph frame the summary pattern excludes, leaving a finished turn stuck at IDLE. After the PROCESSING checks (which already rule out a LIVE spinner), treat a completion summary OR a start-of-line response marker (⏺/●) plus a visible prompt as COMPLETED — dropping the position-based spinner-freshness guard that mis-counted a stale spinner. Verified against four real handoff captures (greet + multiply). * fix(claude_code): tolerate clipped completion summary ('✻ Crunched for ' no duration) The newest TUI sometimes writes the completion summary's duration via a separate cursor-positioned write that the raw stream splits off, leaving '✻ Crunched for ' with no 'Ns'. The strict COMPLETION_SUMMARY_PATTERN (requires 'for Ns') then failed to recognize completion, so the spinner-before-separator walk reported PROCESSING off a stale spinner and a finished handoff intermittently stuck at PROCESSING/IDLE until timeout. Add GET_STATUS_COMPLETION_PATTERN (glyph + 'for', no ellipsis, no digits required) for get_status detection + the boxless-tail guard; extraction keeps the strict pattern. Verified: six real handoff captures all settle COMPLETED; live-processing and compaction still PROCESSING. * fix(kimi_cli): support the redesigned "Kimi Code" TUI in get_status The newest Kimi CLI (the "Kimi Code" rebuild) replaced the ✨/💫 emoji prompt with a boxed input area ("── input ──"), an "agent (<model> ●)" status bar, a "context: N%" usage footer, and a braille-spinner working indicator ("⠧ Thinking… Ns · N tokens"). The legacy detector keyed on the emoji prompt, so it never observed IDLE and every Kimi terminal timed out at init — this was the failure flagged on #273 (now confirmed to be a Kimi-side TUI redesign, not a regression from the event-driven refactor). get_status now detects the new TUI (gated on the new status/footer markers so legacy emoji builds are unchanged): - READY = status bar / context footer present with no live spinner. - PROCESSING vs READY is decided by POSITION (last braille spinner line vs last ready-chrome line), so stale spinner frames lingering in the 8KB rolling buffer don't pin a finished turn at PROCESSING. - COMPLETED vs IDLE latches on a "•" bullet (a turn produced output); the welcome banner / update nag have none, so a fresh terminal reads IDLE and avoids a premature-completion race when the first task is sent. Adds regression tests driven by real captured raw pipe-pane buffers (idle / processing / completed) of the new TUI. Also enables Claude <-> Kimi cross-provider e2e: - examples/cross-provider/data_analyst_kimi_cli.md (provider: kimi_cli). - TestCrossProviderClaudeToKimi / TestCrossProviderKimiToClaude. - Fix the cross-provider test helper to resolve the worker's provider from its profile (the add-terminal endpoint treats an explicit provider param as authoritative, which silently suppressed the profile override). Verified on real CLIs (kimi 1.47.0 + claude 2.1.x) against a cao-server built from this branch: - TestKimiCliHandoff (2) + TestKimiCliAssign (3): pass. - Claude->Kimi and Kimi->Claude cross-provider assign: pass. Full unit suite green (2202 passed); mypy / black / isort clean. * fix(herdr): make status detection backend-aware so agent init works on herdr The event-driven rewrite made the FIFO -> EventBus -> StatusMonitor pipeline the sole source of shell-readiness and terminal status. The herdr backend deliberately skips that pipeline (its pipe_pane is a no-op; no FIFO reader is started for it, since it delivers via socket events), so for a herdr terminal the StatusMonitor buffer stayed "" and status stayed UNKNOWN forever. As a result provider.initialize() timed out in wait_for_shell()/wait_until_status(), and every status read (API GET /terminals/{id}, busy checks, provider init loops) returned UNKNOWN -- terminal creation was broken for essentially every provider on herdr. Fix at the single chokepoint instead of per-call-site: - status_monitor.get_status() is now backend-aware: for event-inbox backends it derives status on demand from the provider's get_status() (which consults backend.get_native_status()). This fixes all read sites at once -- API status, wait_until_status(), flow busy checks, curator liveness, and the copilot/gemini init loops -- without each having to special-case the backend. - wait_for_shell() reads backend.get_history() directly for event-inbox backends rather than the never-populated StatusMonitor buffer. - wait_until_status() reverts to simply polling the now backend-aware get_status(), removing the duplicated backend logic. The tmux path is unchanged (supports_event_inbox() is False -> same pushed-status read as before). Verified end-to-end against a real herdr 0.6.8 server: a live Claude Code agent launched in a herdr pane, reached idle, and completed a task. Adds test/services/test_status_monitor.py and herdr cases to test/utils/test_terminal.py. * fix(text): normalise CUP-to-column-1 escape so codex idle prompt detects The event-driven status pipeline strips terminal escapes from the raw 8KB FIFO buffer to feed provider.get_status(). _LINE_START_CSI in strip_terminal_escapes already turned CHA (\x1b[1G) and CNL (\x1b[E) into \n so per-line patterns work, but missed CUP — Cursor Position to column 1 (\x1b[<row>;1H). Codex's TUI lays out its bottom prompt + status bar via CUP rather than CHA: \x1b[46;1H› places the idle ``›`` at column 1 of row 46. Without normalising this, the ``›`` glyph stayed glued mid-stream, e.g. ›Improve documentation in @filenameopenai.gpt-5.5 medium · /tmp/... The per-line check at codex.py:get_status:370 only inspects the bottom five lines for ``\s*(?:❯|›|codex>)``, so the prompt was never detected and codex sessions reported PROCESSING forever — every codex e2e test hit "Codex initialization timed out after 60 seconds" (0/12 passing). Extend _LINE_START_CSI to also match \x1b[\d+;1H. After the fix the ›-prefixed idle prompt sits on its own line, the existing IDLE_PROMPT_PATTERN check matches, and codex idle detection works: in replay against captured FIFO logs, 14/15 codex sessions reach idle correctly. Remaining e2e failures are MCP startup >60s and OpenAI ``stream disconnected before completion`` API errors — outside the scope of status detection. Add a regression test covering both \x1b[46;1H› (codex's bottom prompt) and the bare \x1b[1;1Hb form. * fix(kiro_cli): position-aware Initializing/MCP-init check fixes yolo on event-driven pipeline Under tmux capture-pane, a TUI that redraws over its boot screen hides "Initializing..." and the MCP-init line — Kiro's TUI -> --legacy-ui fallback path also calls StatusMonitor.reset_buffer when retrying so the rolling buffer stays clean. Yolo mode does NOT take that fallback path: it forces --legacy-ui directly at launch, so its rolling FIFO buffer keeps the boot bytes forever even after kiro has redrawn over them and the actual interactive prompt is showing below. Until now get_status returned PROCESSING unconditionally whenever TUI_INITIALIZING_PATTERN matched anywhere in the buffer. Yolo sessions therefore reported PROCESSING for the entire session and wait_until_status({IDLE, COMPLETED}) timed out — kiro yolo e2e went from passing to 0/11 under the new event-driven pipeline (#273). Make Check 0 position-aware: only return PROCESSING from a matching TUI_INITIALIZING_PATTERN when no real ``[agent] >`` idle prompt appears AFTER the last init match. The new-TUI placeholder ("Ask a question or describe a task") is intentionally NOT counted as a real idle prompt because the new TUI renders it during boot; that pre-existing test (test_tui_initializing_yields_processing_despite_idle_placeholder, issue #211) still passes. Update test_mcp_server_init_yields_processing — its old fixture asserted boot-line + "[developer] !>" should yield PROCESSING, but real captured Kiro logs show the [developer] prompt only renders AFTER init completes, so the assumption was incorrect. Replace the fixture with the actual placeholder text Kiro shows during boot, and add test_mcp_server_init_with_post_init_prompt_yields_idle to lock in the post-init redrawn-stale case (the yolo failure mode). Verified end-to-end: kiro e2e (yolo, --legacy-ui) 0/11 -> 11/11. * fix(status): sticky ready-status latching to stop 8KB-buffer eviction flap PR #273's event-driven pipeline derives status from a rolling 8KB FIFO buffer fed to per-provider get_status(). TUI redraws (status-bar refreshes, cursor positioning, footer repaints) keep emitting bytes for seconds AFTER the agent settles, eventually evicting the idle/response markers from the 8KB window. Without latching, status flaps rapidly between IDLE/COMPLETED and PROCESSING/UNKNOWN, and both wait_until_status (server-side) and the e2e tests' HTTP polling miss the brief ready windows — causing codex 60s init timeouts, gemini 240s init timeouts, and completion-timeout failures. StatusMonitor now refuses two downgrades once a ready status latches: - ready -> PROCESSING/UNKNOWN (typical buffer-eviction flap) - COMPLETED -> IDLE (codex's user-marker evicts before assistant's, so last_user is None and provider falls back to IDLE, silently overwriting COMPLETED) notify_input_sent() arms a one-shot revert gate so the latch releases when external input legitimately starts a new processing cycle: - terminal_service.send_input / send_special_key (runtime input) - each tested provider's initialize() before its launch keystrokes and bypass/trust prompt acknowledgements codex get_status now also handles the "long response evicted the user marker" case directly: when last_user is None, scan above the TUI footer for an assistant bullet and return COMPLETED instead of IDLE (was returning IDLE forever, which the COMPLETED->IDLE block above would otherwise cement as an off-by-one defence). gemini_cli now declares extraction_tail_lines = 5000 so its existing extraction_retries (3 x 10s) actually fires. The escalating-fetch path (200/500/1000/5000) ran each step once with no inter-step waits, so Ink-TUI redraws still rendering at extraction time fell through to [PARTIAL RESPONSE], leaving the TUI footer in the output and tripping the "? for shortcuts not in output" assertion. E2E results on PR #273 with these changes: codex 10/12 passing (1 pre-existing extraction issue, 1 xfail) claude 12/12 gemini 12/12 kiro 11/11 * fix(status): keep input-arm across ready→ready flaps; make latch state thread-safe Two hardening fixes to the sticky ready-status latch (PR #1): 1. Arm semantics: notify_input_sent()'s one-shot arm was consumed by ANY new ready latch, including a ready→ready downgrade flap (COMPLETED→IDLE when a large paste evicts the response markers, WAITING_USER_ANSWER→IDLE after a permission keystroke). Consuming the arm there blocks the genuine PROCESSING that follows, so the terminal reads ready while the agent is busy — and InboxService delivers on IDLE/COMPLETED, so a queued message could be pasted mid-response. The arm is now consumed only by a PROCESSING transition or a genuine non-ready→ready (init-style) latch. 2. Thread safety: the latch decision is a read-modify-write sequence (read armed → decide transition → consume arm) executed on the asyncio consumer, while notify_input_sent()/get_status()/clear_terminal() run on FastAPI threadpool, inbox delivery workers, and the cleanup thread. Guard StatusMonitor state with a lock (provider regex analysis and bus.publish stay outside it). Also covers the pre-existing Copilot review comments about unsynchronized _buffers/_last_status access. Adds a latching state-machine test suite (12 cases) pinning blocked downgrades, arm consumption, flap survival, and event publication. * fix(claude_code): live MCP tool call no longer misreads as COMPLETED Real failure from the supervisor-handoff e2e: mid-turn, Claude shows an interim completion summary from an earlier thinking phase, then keeps working — '● Calling cao-mcp-server… (ctrl+o to expand)' with a live '✢ Misting… (33s · ↑ 332 tokens)' spinner and a '⎿ Tip: …' hint line between the spinner and the input box. Two gaps combined into a false COMPLETED, which the StatusMonitor ready-latch then pinned until the next input, so the e2e test extracted mid-flight output: - _boxless_completion_tail treated ANY completion summary after the last separator as a finished turn. It now requires that no live spinner (glyph + gerund + ellipsis) renders after the summary — an interim summary followed by a fresh spinner keeps the turn PROCESSING. - The new-TUI box-spinner check looked only at the single line directly above the input box; '⎿ Tip:' hint lines and blanks render between the live spinner and the box, hiding it. The check now walks up to 4 lines, skipping ⎿ continuation lines and blanks. Regression tests reproduce both shapes (fail pre-fix) plus a guard that a summary with no following spinner still reads COMPLETED. * fix(kimi_cli): extract responses from the newest Kimi Code TUI The newest 'Kimi Code' TUI renders user messages as ✨-prefixed prompt lines (no ╭─ input box), responses as • bullets, and a '── input ──' rule plus status bar as footer. extract_last_message_from_script() preferred the last ╰─ box-end as the response anchor — which in the new TUI matches decorative boot banners (Kimi welcome box, FastMCP server banner) ABOVE the conversation — and only stopped at a bare idle prompt, which the new TUI never renders. Result: the 'response' was sliced from the boot screen and ran to end-of-capture (raw spinner frames, status bar), failing the supervisor-assign e2e. - Anchor on the LATEST marker (box-end vs prompt-with-input) instead of box-first: the response always follows the last user input, whichever marker style rendered it. - Stop the response at the new-TUI footer: the '── input ──' rule or the status-bar/context-footer line, in addition to the bare idle prompt. Fixtures are ground truth from a live Kimi Code session. * fix(events): address remaining Copilot review nits on PR #273 - LogWriter: open log files with explicit encoding='utf-8', errors='replace' — a non-UTF-8 platform locale (POSIX/C) would raise UnicodeEncodeError on the first unencodable chunk and stop log persistence for that terminal. - EventBus.set_loop: accept Optional[loop] (test fixtures already call set_loop(None)); publish() now guards against the loop being closed during shutdown instead of raising RuntimeError from a FIFO reader thread draining its last chunks. - BaseProvider.get_status docstring: soften the strip_terminal_escapes 'should' to MAY with an explicit note that kiro_cli intentionally preserves raw \r for permission-prompt detection and must not be 'fixed' to comply. The 'except Exception swallows CancelledError' comments are intentionally NOT addressed: since Python 3.8, asyncio.CancelledError derives from BaseException, so those loops do not swallow cancellation. * test(e2e): read kimi skill catalog from the agent-file system.md Kimi (v1.20+) launches via 'cd <temp_dir> && kimi --agent-file <temp_dir>/agent.yaml', writing the system prompt — including the injected skill catalog — to <temp_dir>/system.md. The catalog is no longer in the CLI command string, and the full-screen Kimi Code TUI clears the visible screen on startup, so asserting against raw tmux scrollback always failed. Parse the temp dir from the launch command (capture-pane -J so soft-wrapped lines can't split the path mid-token) and assert against system.md on disk — same pattern as the existing gemini GEMINI.md branch. * fix(kimi_cli): honest turn-in-flight detection for the newest Kimi Code TUI The newest TUI renders the live spinner BETWEEN the '── input ──' rule and the status bar, and repaints the status bar with every frame — so the spinner-vs-ready-chrome position compare read 'ready' mid-turn whenever a full footer repaint was the freshest chunk (measured: one supervisor turn flapped completed↔processing 29 times in a 57KB stream). The StatusMonitor ready-latch then pinned the first false COMPLETED ~130ms after dispatch, so supervisor flows extracted mid-flight output (supervisor-assign e2e). Detection now uses signals validated by replaying captured live streams: - Live-spinner lines: braille/moon glyphs (incl. tool-call lines like '⠹ Using handoff({…})'), EXCLUDING boot chrome ('⠧ MCP Servers: 0/1', '⠋ Loading configuration...') which legitimately shows braille while idle at the welcome screen. - In flight when a live spinner is within the freshest 15 lines OR renders after the last '•' bullet (covers chunk boundaries where streamed thinking text temporarily pushed the spinner out of the tail). - Dispatch grace: 5s after send_input() (mark_input_received now stamps the dispatch), bridging the paste→first-spinner-frame gap. - Rendered-pane confirmation: a ready-looking chunk boundary mid-turn is byte-identical to one at real completion (stale spinner ~21 lines back, bullets 2-3 from the end in BOTH), so when the stream says ready post-dispatch, confirm against the rendered pane — tmux's compositor has resolved every in-place redraw, so a visible spinner is live, not stale. Also raises the e2e SUPERVISOR_COMPLETION_TIMEOUT 300→600s: a kimi worker alone takes ~3m20s on the report-template handoff; 300s only looked sufficient while the false early COMPLETED was masking the real duration. E2E (live agents): kimi supervisor handoff + assign_and_handoff now pass; kimi handoff x2, send_message, skills unaffected and green. * fix(tools): close claude_code tool-restriction escapes via Task/Monitor/NotebookEdit The allowed-tools e2e caught restricted agents escaping their restrictions live: a reviewer (no Bash/Write) created the forbidden file anyway — first 'via a delegated subagent that ran the write through a shell command' (Task), then on retry via 'the Monitor tool' (background shell scripts). CAO computes --disallowedTools as (tool universe - allowed natives), and claude_code's universe only contained Bash/Read/Edit/Write/Glob/Grep — so the execution-capable tools outside it were never disallowed. Gate them by privilege equivalence: - execute_bash now maps Bash, BashOutput, KillShell, Task (subagents run with their own full toolset), and Monitor (runs arbitrary shell scripts). Anything these can do, Bash can too, so profiles allowed execute_bash lose nothing. - fs_write now maps NotebookEdit alongside Edit/Write (.ipynb writes). Known limitation: this is enumeration against an evolving toolset — each new execution-capable Claude Code tool needs adding here. A deny-by-default mechanism upstream would be the durable fix. E2E (live, herdr backend): all 4 TestClaudeCodeAllowedTools pass, including the two that previously demonstrated the escape. --------- Co-authored-by: Tuan Nguyen <tuankn@amazon.com> Co-authored-by: haofeif <56006724+haofeif@users.noreply.github.com> Co-authored-by: Tuan Nguyen <32879640+tuanknguyen@users.noreply.github.com> Co-authored-by: Feng, Haofei <haofeif@amazon.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Issue #, if available:
Description of changes:
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.