fix(cli): graceful fallback when no Win32 console buffer (headless services) #25340
Closed
zw11591-sketch wants to merge 4 commits into
When hermes runs as a child of a Windows service (NSSM, scheduled task,
or any LocalSystem context), the child process inherits no Win32 console
screen buffer. prompt_toolkit's print_formatted_text() lazily calls
create_output() -> Win32Output.__init__ -> get_win32_screen_buffer_info()
which raises NoConsoleScreenBufferError. The exception was uncaught at
every _pt_print(_PT_ANSI(text)) call site inside _cprint(), so the very
first colored line ("Initializing agent..." at the top of HermesCLI.chat)
killed the process before AIAgent could even be constructed.
This blocked the kanban dispatcher running under a Windows service: every
spawned child agent died at startup, the dispatcher reclaimed the lock,
retried, hit max_retries, and auto-blocked the task. Repro:
# In a service / no-console context (DETACHED_PROCESS|CREATE_NO_WINDOW):
hermes -p <profile> chat -q "hi"
# -> prompt_toolkit.output.win32.NoConsoleScreenBufferError
# -> exit before agent loop runs
Fix: introduce _safe_pt_print() that wraps _pt_print(_PT_ANSI(text)) in
a try/except. On NoConsoleScreenBufferError (or any other prompt_toolkit
runtime failure) it falls back to plain sys.stdout.write + flush. Replace
the 7 raw _pt_print(_PT_ANSI(text)) call sites inside _cprint() with
_safe_pt_print(text). Behavior in normal interactive terminals is
unchanged - the wrapped path is identical to the prior call.
The NoConsoleScreenBufferError import is itself wrapped in try/except so
non-Windows installs (where prompt_toolkit.output.win32 doesn't exist as
a usable module) still import cli.py cleanly.
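The wrapper and the guarded import described above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the names safe_print and headless_printer, and the printer and stream parameters, are hypothetical and exist only so the fallback can be exercised without a real console. The broad except around the import is deliberate, on the assumption that importing prompt_toolkit.output.win32 off Windows may fail with something other than ImportError.

```python
import io
import sys

try:
    # Real exception type; this import is only expected to succeed on Windows.
    from prompt_toolkit.output.win32 import NoConsoleScreenBufferError
except Exception:
    class NoConsoleScreenBufferError(Exception):
        """Stand-in so the except clause below stays valid elsewhere."""


def safe_print(text, printer, stream=None):
    """Try the rich printer; fall back to a plain write if it fails.

    `printer` stands in for the colored print call (hypothetical parameter);
    `stream` defaults to sys.stdout and is injectable for testing.
    """
    out = sys.stdout if stream is None else stream
    try:
        printer(text)
    except NoConsoleScreenBufferError:
        # No Win32 console screen buffer: degrade to plain output.
        out.write(text + "\n")
        out.flush()
    except Exception:
        # Any other printer failure takes the same fallback.
        out.write(text + "\n")
        out.flush()


def headless_printer(text):
    """Simulate a printer running without a console (hypothetical helper)."""
    raise NoConsoleScreenBufferError()


if __name__ == "__main__":
    buf = io.StringIO()
    safe_print("Initializing agent...", headless_printer, stream=buf)
    print(repr(buf.getvalue()))
```

Passing the printer in as an argument is a choice made for this sketch only; the commit message describes a helper that calls _pt_print(_PT_ANSI(text)) directly.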
Verified end-to-end: blocked kanban task t_3231a03f went from 'crashed
2x: pid not alive' to 'done' in one dispatcher cycle, with the spawned
ops profile completing 12 LLM calls and 8 tool calls including a
successful docker build.
This was referenced May 14, 2026
Resolve cli.py conflicts in print_above_input_no_redraw():
- Keep _safe_pt_print() wrapper (handles NoConsoleScreenBufferError for headless Windows services without Win32 console buffer).
- Preserve main's ensure_future fix for run_in_terminal coroutine (NousResearch#23185 Bug A) by wrapping _safe_pt_print inside the run_in_terminal lambda instead of _pt_print(_PT_ANSI(...)).
Both fixes compose cleanly: headless services still degrade to bare print(), and cross-thread emissions no longer drop output silently.
Verified: tests/cli/test_cprint_bg_thread.py (13/13), test_cli_init.py + test_cli_force_redraw.py (47/47).
…github.com/zw11591-sketch/hermes-agent into fix/cli-noconsole-headless-windows-service
Automated hermes-sweeper review found this is implemented on current main. Evidence:
Thanks for the detailed repro and verification notes. The prior duplicate discussion in this PR (#22482 / #22445 / #22928 / #23011) was useful in confirming this is the same bug class now covered on main.
Summary

Make cli.py survive on Windows when there is no Win32 console screen buffer attached to the process: the situation every hermes child of a Windows service / scheduled task / DETACHED_PROCESS parent finds itself in. Previously these processes died at the very first colored line, before AIAgent was even constructed.

The fix is a small wrapper around _pt_print(_PT_ANSI(text)) (_safe_pt_print()) that catches prompt_toolkit.output.win32.NoConsoleScreenBufferError and falls back to plain sys.stdout.write. Interactive-terminal behavior is byte-identical to before.

Repro (before this PR)
In any context that has no Win32 console buffer (e.g. an NSSM-wrapped Windows service spawning a child, a Task Scheduler task with no UI session, or subprocess.Popen(..., creationflags=DETACHED_PROCESS|CREATE_NO_WINDOW)):

…immediately exits with:

The crash happens at the first _cprint() call inside HermesCLI.chat (_cprint(f"{_DIM}Initializing agent...{_RST}")), so:
- Initializing agent... is never even printed.
- AIAgent.__init__ never runs.
- … .env load.

Real-world impact
This is what motivated the fix. The kanban dispatcher (gateway/run.py -> plugins/kanban/...) launches each worker as hermes -p <profile> chat -q "work kanban task <id>". When the gateway runs under NSSM as a Windows service:
- … _cprint. PID is gone before any agent activity.
- … crashed=1 (only sees pid X not alive, no traceback in dispatcher log because the traceback went to the child's vanished stdout).
- … max_retries, the task is auto-blocked.
- … blocked regardless of profile.

End-to-end verification of the fix:
- … Agent crashed 2x: pid not alive was unblocked.
- … hermes -p ops chat -q "..." invocation succeeded.
- … docker build + docker run) and called kanban_complete. Task transitioned to done.

What this PR does
cli.py only:
- Introduce a _safe_pt_print(text) helper near the top of the file. It calls _pt_print(_PT_ANSI(text)) inside try/except:
  - NoConsoleScreenBufferError -> sys.stdout.write(text + "\n"); sys.stdout.flush().
  - Any other Exception from prompt_toolkit / IO -> same fallback. A failed colored print must never kill the agent.
- Wrap the NoConsoleScreenBufferError import in try/except with a stub fallback class, so non-Windows installs (where prompt_toolkit.output.win32 may not be importable) still load cli.py cleanly.
- Replace the 7 raw _pt_print(_PT_ANSI(text)) call sites inside _cprint() with _safe_pt_print(text). These are the sites that fire on every line printed via _cprint: banner, spinner frames, tool activity prefixes, streamed reasoning, etc.

The single remaining _pt_print(_PT_ANSI(str(line))) call site outside _cprint (the body of _emit_lines_to_pt, which is only reached after a prompt_toolkit Application is already running) is not changed in this PR: the issue does not surface there in practice and I'd rather keep the diff focused on the actually-broken path. Happy to extend if reviewers prefer.

Diff stats: cli.py | 58 ++++++++++++++++++++++++++++++++++++++++------- (51 insertions, 7 modifications, 0 deletions of pre-existing logic).

Why a wrapper instead of patching prompt_toolkit / forcing a fake console
- … _pt_print is the smallest possible change that fixes the crash without touching prompt_toolkit internals or assuming anything about the runtime.
- … winpty, pywinpty, AllocConsole) would have to be applied in every caller that ever spawns hermes as a child: gateway, dispatcher, NSSM wrappers, user scripts. The wrapper localizes the fix to cli.py.
- … _pt_print(_PT_ANSI(text)). Attached terminals (interactive cmd.exe, Windows Terminal, mintty/MSYS, conhost) hit the inner call and exit the try immediately, with no measurable overhead.

Risks
- … NoConsoleScreenBufferError exception that the surrounding _cprint code relied on (e.g. via the except Exception branches at L1556, L1612), the new wrapper will swallow it and fall back to sys.stdout.write instead of letting the existing fallback _pt_print(_PT_ANSI(text)) run. In practice every existing _cprint fallback path already ends in _pt_print(_PT_ANSI(text)); the wrapper just makes those last-resort prints themselves crash-safe. I think this is strictly better but it's the one behavior change worth a second pair of eyes.
- … sys under an alias (_sys_for_pt_fallback) to make absolutely sure no later import sys as ... shadow can break the fallback. Aesthetically ugly; happy to drop the alias if reviewers prefer.

Testing
Manual:
- subprocess.Popen([hermes_exe, "-p", "ops", "chat", "-q", "..."], creationflags=DETACHED_PROCESS|CREATE_NO_WINDOW, stdin=DEVNULL, stdout=PIPE, stderr=PIPE): succeeds, Initializing agent... and the full agent response appear in stdout, exit code 0. Without the patch the same call exits non-zero with the traceback above.
- … docker build task end-to-end via the ops profile.
- … hermes in Windows Terminal, mintty (Git Bash), and cmd.exe: banner/colors render identically to main.

I haven't added a unit test because exercising NoConsoleScreenBufferError reliably needs a non-trivial subprocess harness (the import itself works fine inside pytest; only create_output() blows up, and only when sys.stdout lacks a real console handle). Happy to add one in tests/ if reviewers point me at the right shape.

Environment
- prompt_toolkit: whatever ships with current requirements
- hermes-agent rebased on dd5a9502e (current main at PR open)
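As a possible starting shape for the subprocess harness mentioned under Testing, the manual Popen check could be wrapped in a small helper like the one below. This is a sketch under stated assumptions, not part of the PR: run_without_console and NO_CONSOLE_FLAGS are hypothetical names, and the child command is left to the caller (the manual test used the hermes executable). The two creation flags are looked up with getattr because the subprocess module only defines them on Windows. Elsewhere the value is 0, which means the helper does not reproduce the no-console condition on non-Windows hosts.

```python
import subprocess
import sys

# DETACHED_PROCESS and CREATE_NO_WINDOW exist only on Windows builds of
# Python; falling back to 0 keeps the helper importable on other platforms.
NO_CONSOLE_FLAGS = (
    getattr(subprocess, "DETACHED_PROCESS", 0)
    | getattr(subprocess, "CREATE_NO_WINDOW", 0)
)


def run_without_console(argv, timeout=60):
    """Run argv with no console attached (on Windows) and capture output.

    Returns (returncode, stdout, stderr) as text.
    """
    proc = subprocess.run(
        argv,
        stdin=subprocess.DEVNULL,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        creationflags=NO_CONSOLE_FLAGS,
        timeout=timeout,
        text=True,
    )
    return proc.returncode, proc.stdout, proc.stderr


if __name__ == "__main__":
    # Placeholder child; a real check would launch the CLI under test.
    code, out, err = run_without_console([sys.executable, "-c", "print('ok')"])
    print(code, out.strip())
```

A test built on this would assert a zero exit code and the expected first line in stdout, which is what the manual check above verified by hand.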