fix(cli): clear overlay states when interrupting a running agent#14026
fix(cli): clear overlay states when interrupting a running agent#14026neo-2026 wants to merge 4 commits into
Conversation
When the user types a new message while the agent is running, the interrupt fires (agent.interrupt()) but the active overlay states (_approval_state, _clarify_state, _sudo_state, _secret_state) are NOT cleared. Any of these being non-None after the interrupt leaves input locked: - _approval_state / _clarify_state / _sudo_state: the prompt_toolkit read_only=Condition(...) gates the input field - _clarify_state: the keypress filter passes through only Ctrl+C when blocked, silently swallowing all other keystrokes The bug was introduced/exposed by 52a79d9 which changed the approval callback to be set inside the agent thread. When an approval prompt is active and the user interrupts, the callback gets cleared by the thread's finally block but _approval_state is left pointing at a stale dict with a dead response_queue. The CLI appears completely frozen — no keystrokes register, Ctrl+C does nothing visible. Fix: when firing an interrupt, explicitly drain and clear all active overlay states. Each state's response_queue is sent a terminal value (deny for approval, None for others) so any thread blocked waiting on the queue unblocks cleanly rather than hanging forever. Reproducer: run a command that triggers an approval prompt (e.g. a dangerous terminal command), then type a new message — the terminal freezes until the 60s approval timeout expires. Closes or relates to NousResearch#13618
mxnstrexgl
left a comment
There was a problem hiding this comment.
🤖 Automated PR Review
Security Scan
- ✓ No hardcoded secrets in diff
- ✓ No injection / shell-risk patterns in diff
Code Quality
- ⚠ Blocking: interrupted sudo prompts leave
_modal_input_snapshotbehind, so the pre-modal draft is not restored and later modals can inherit stale snapshot state - 💡 Please add a regression test for interrupting an active overlay (sudo/approval/clarify) so this path stays covered
Summary
Status: NEEDS CHANGES (1 blocking correctness issue, 1 test gap)
Good direction. Overlay cleanup for interrupt is right fix area. But sudo path still leaks modal snapshot state.
| self._sudo_state["response_queue"].put(None) | ||
| except Exception: | ||
| pass | ||
| self._sudo_state = None |
There was a problem hiding this comment.
Blocking: _sudo_password_callback() captures _modal_input_snapshot and normally restores it on every exit path (cli.py:7907, cli.py:7918-7935). This interrupt cleanup clears _sudo_state, but never restores or clears that snapshot. After interrupting a sudo prompt, the pre-modal draft stays stranded and later modals stop snapshotting because _capture_modal_input_snapshot() returns early once _modal_input_snapshot is already set. Please call _restore_modal_input_snapshot() here and reset _sudo_deadline when aborting sudo.
🤖 Automated PR ReviewSecurity Scan
Code Quality
SummaryStatus: NEEDS CHANGES (1 blocking correctness issue, 1 test gap) |
When the chat is stuck (stale _approval_state / _clarify_state / etc. left by a previous interrupted turn), pressing Ctrl+C hit the overlay cancel branch and returned immediately — the agent-interrupt branch at the bottom was never reached. A second Ctrl+C would then fall through to the exit path (since _agent_running is False), killing Hermes entirely instead of unfreezing the session. Fix: remove the early returns from each overlay cancel branch. Instead, accumulate a _overlay_cleared flag and fall through to the agent-interrupt check. Add a guard so we only skip the exit path if overlays were cleared but the agent is idle (the normal non-frozen case where Ctrl+C should just dismiss the prompt, not interrupt or exit).
…nterrupt When an interrupt fires while a sudo prompt is active, the previous code cleared _sudo_state but left _modal_input_snapshot pointing at the captured pre-modal draft and left _sudo_deadline non-zero. This caused two bugs: 1. _capture_modal_input_snapshot() returns early if _modal_input_snapshot is not None, so subsequent modals silently skip snapshotting and can never restore the user's draft. 2. _sudo_deadline remaining non-zero leaves cruft that can confuse the sudo UI on the next invocation. Fix: call _restore_modal_input_snapshot() and set _sudo_deadline = 0 in both interrupt paths: - the agent-running interrupt handler (~line 8500) - the Ctrl+C key binding handler (~line 9480) Also adds three regression tests in TestOverlayInterruptCleanup: - test_interrupt_clears_sudo_state_and_restores_snapshot - test_ctrlc_clears_sudo_state_and_restores_snapshot - test_interrupt_unblocks_blocked_thread Addresses blocking feedback on NousResearch#14026.
…l/clarify/sudo) Adds TestInterruptOverlayClearance covering: - test_interrupt_clears_approval_state_and_sends_deny - test_interrupt_unblocks_approval_callback_thread - test_interrupt_clears_clarify_state - test_interrupt_unblocks_clarify_callback_thread Sudo interrupt was already covered by TestOverlayInterruptCleanup. Addresses reviewer request in PR NousResearch#14026.
Problem
When the user types a new message while the agent is running, the interrupt fires (
agent.interrupt()) but the active overlay states (_approval_state,_clarify_state,_sudo_state,_secret_state) are not cleared. Any of these being non-None after the interrupt leaves input locked:read_only=Condition(...)on the prompt_toolkit input field gates on_command_running, and the hint bar shows "command in progress · input temporarily disabled"$isBlockedis true — all other keystrokes are silently droppedThe result: the CLI appears completely frozen after interrupting. No keystrokes register. The user has to kill and restart Hermes.
Root Cause
The bug was exposed/introduced by 52a79d9 which set the approval callback inside the agent thread. When an approval prompt is active and the user fires an interrupt, the thread's
finallyblock clears the callback — but_approval_stateremains pointing at a stale dict with a deadresponse_queue. The overlay stays "active" even though nothing is servicing it.Fix
When firing an interrupt, explicitly drain and nil out all active overlay states. Each state's
response_queueis sent a terminal value ("deny"for approval,Nonefor others) so any thread blocked on.get()unblocks cleanly.Reproducer
Relates to / closes #13618