fix(cli): graceful fallback when no Win32 console buffer (headless services) by zw11591-sketch · Pull Request #25340 · NousResearch/hermes-agent

zw11591-sketch · 2026-05-14T01:38:32Z

Summary

Make cli.py survive on Windows when there is no Win32 console screen buffer attached to the process — the situation every hermes child of a Windows service / scheduled task / DETACHED_PROCESS parent finds itself in. Previously these processes died at the very first colored line, before AIAgent was even constructed.

The fix is a small wrapper, _safe_pt_print(), around _pt_print(_PT_ANSI(text)). It catches prompt_toolkit.output.win32.NoConsoleScreenBufferError and falls back to plain sys.stdout.write. Interactive-terminal behavior is byte-identical to before.

Repro (before this PR)

In any context that has no Win32 console buffer (e.g. an NSSM-wrapped Windows service spawning a child, a Task Scheduler task with no UI session, or subprocess.Popen(..., creationflags=DETACHED_PROCESS|CREATE_NO_WINDOW)):

hermes -p ops chat -q "hi"

…immediately exits with:

File "...\hermes-agent\cli.py", line 1528, in _cprint
    _pt_print(_PT_ANSI(text))
File "...\prompt_toolkit\shortcuts\utils.py", line 111, in print_formatted_text
    output = get_app_session().output
File "...\prompt_toolkit\application\current.py", line 67, in output
    self._output = create_output()
File "...\prompt_toolkit\output\defaults.py", line 91, in create_output
    return Win32Output(stdout, default_color_depth=color_depth_from_env)
File "...\prompt_toolkit\output\win32.py", line 219, in get_win32_screen_buffer_info
    raise NoConsoleScreenBufferError
prompt_toolkit.output.win32.NoConsoleScreenBufferError: No Windows console found. Are you running cmd.exe?

The crash happens at the first _cprint() call inside HermesCLI.chat (_cprint(f"{_DIM}Initializing agent...{_RST}")), so:

Initializing agent... is never even printed.
AIAgent.__init__ never runs.
No agent log lines past plugin discovery + .env load.

Real-world impact

This is what motivated the fix. The kanban dispatcher (gateway/run.py -> plugins/kanban/...) launches each worker as hermes -p <profile> chat -q "work kanban task <id>". When the gateway runs under NSSM as a Windows service:

Dispatcher claims the task, spawns the child.
Child crashes within ~1s on _cprint. The PID is gone before any agent activity.
Dispatcher logs crashed=1 (only sees pid X not alive, no traceback in dispatcher log because the traceback went to the child's vanished stdout).
After max_retries, the task is auto-blocked.
Every kanban task on the board ends up blocked regardless of profile.

End-to-end verification of the fix:

A task that had been auto-blocked with Agent crashed 2x: pid not alive was unblocked.
On the next dispatcher tick the same hermes -p ops chat -q "..." invocation succeeded.
The ops worker ran 12 LLM calls and 8 tool calls (including a real docker build + docker run) and called kanban_complete. The task transitioned to done.

What this PR does

cli.py only:

Add a _safe_pt_print(text) helper near the top of the file. It calls _pt_print(_PT_ANSI(text)) inside try/except:
- On NoConsoleScreenBufferError -> sys.stdout.write(text + "\n"); sys.stdout.flush().
- On any other Exception from prompt_toolkit / IO -> same fallback. A failed colored print must never kill the agent.
Wrap the NoConsoleScreenBufferError import in try/except with a stub fallback class, so non-Windows installs (where prompt_toolkit.output.win32 may not be importable) still load cli.py cleanly.
Replace the 7 raw _pt_print(_PT_ANSI(text)) call sites inside _cprint() with _safe_pt_print(text). These are the sites that fire on every line printed via _cprint — banner, spinner frames, tool activity prefixes, streamed reasoning, etc.

The single remaining _pt_print(_PT_ANSI(str(line))) call site outside _cprint (the body of _emit_lines_to_pt, which is only reached after a prompt_toolkit Application is already running) is not changed in this PR — the issue does not surface there in practice and I'd rather keep the diff focused on the actually-broken path. Happy to extend if reviewers prefer.

Diff stats: cli.py | 58 ++++++++++++++++++++++++++++++++++++++++------- (51 insertions, 7 modifications, 0 deletions of pre-existing logic).

Why a wrapper instead of patching prompt_toolkit / forcing a fake console

Wrapping _pt_print is the smallest possible change that fixes the crash without touching prompt_toolkit internals or assuming anything about the runtime.
Fake-console approaches (winpty, pywinpty, AllocConsole) would have to be applied in every caller that ever spawns hermes as a child — gateway, dispatcher, NSSM wrappers, user scripts. The wrapper localizes the fix to cli.py.
The wrapper is a no-op on the happy path: the inner call is the original _pt_print(_PT_ANSI(text)). Attached terminals (interactive cmd.exe, Windows Terminal, mintty/MSYS, conhost) hit the inner call and exit the try immediately — no measurable overhead.

Risks

If the interactive path ever raised a non-NoConsoleScreenBufferError exception that the surrounding _cprint code relied on (e.g. via the except Exception branches at L1556, L1612), the new wrapper will swallow it and fall back to sys.stdout.write instead of letting the existing fallback _pt_print(_PT_ANSI(text)) run. In practice every existing _cprint fallback path already ends in _pt_print(_PT_ANSI(text)) — the wrapper just makes those last-resort prints themselves crash-safe. I think this is strictly better but it's the one behavior change worth a second pair of eyes.
The wrapper imports sys under an alias (_sys_for_pt_fallback) to make absolutely sure no later import sys as ... shadow can break the fallback. Aesthetically ugly; happy to drop the alias if reviewers prefer.

Testing

Manual:

Repro: subprocess.Popen([hermes_exe, "-p", "ops", "chat", "-q", "..."], creationflags=DETACHED_PROCESS|CREATE_NO_WINDOW, stdin=DEVNULL, stdout=PIPE, stderr=PIPE) — succeeds, Initializing agent... and the full agent response appear in stdout, exit code 0. Without the patch the same call exits non-zero with the traceback above.
Real workload: kanban dispatcher under NSSM service runs a docker build task end-to-end via the ops profile.
Interactive: hermes in Windows Terminal, mintty (Git Bash), and cmd.exe — banner/colors render identically to main.
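
For anyone who wants to rerun the first manual check, the Popen call above can be wrapped in a small harness along these lines. It is a sketch, not part of the PR: `run_headless` is a hypothetical helper name, and the creation flags default to 0 off Windows (where `subprocess` does not define them) so the snippet still runs there, although only a Windows run reproduces the no-console condition.

```python
import subprocess
import sys

# These constants exist in `subprocess` only on Windows; default to 0 elsewhere.
DETACHED_PROCESS = getattr(subprocess, "DETACHED_PROCESS", 0)
CREATE_NO_WINDOW = getattr(subprocess, "CREATE_NO_WINDOW", 0)


def run_headless(cmd, timeout=120):
    """Run cmd with no console attached; return (exit_code, stdout, stderr)."""
    proc = subprocess.Popen(
        cmd,
        creationflags=DETACHED_PROCESS | CREATE_NO_WINDOW,
        stdin=subprocess.DEVNULL,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        text=True,
    )
    try:
        out, err = proc.communicate(timeout=timeout)
    except subprocess.TimeoutExpired:
        # Do not leave a detached child behind if it hangs.
        proc.kill()
        out, err = proc.communicate()
    return proc.returncode, out, err


if __name__ == "__main__":
    # Assumption: `hermes` is on PATH; substitute the full path to the exe.
    code, out, err = run_headless(["hermes", "-p", "ops", "chat", "-q", "hi"])
    print("exit code:", code)
    print(err if code else out)
```

Per the description above, a patched build should exit 0 with Initializing agent... in stdout, while an unpatched build should exit non-zero with the NoConsoleScreenBufferError traceback in stderr.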

I haven't added a unit test because exercising NoConsoleScreenBufferError reliably needs a non-trivial subprocess harness (the import itself works fine inside pytest — only create_output() blows up, and only when sys.stdout lacks a real console handle). Happy to add one in tests/ if reviewers point me at the right shape.

Environment

Windows 11
Python 3.12
prompt_toolkit: whatever version ships with the current requirements
hermes-agent rebased on dd5a9502e (current main at PR open)

…rvices)

When hermes runs as a child of a Windows service (NSSM, scheduled task, or any LocalSystem context), the child process inherits no Win32 console screen buffer. prompt_toolkit's print_formatted_text() lazily calls create_output() -> Win32Output.__init__ -> get_win32_screen_buffer_info(), which raises NoConsoleScreenBufferError. The exception was uncaught at every _pt_print(_PT_ANSI(text)) call site inside _cprint(), so the very first colored line ("Initializing agent..." at the top of HermesCLI.chat) killed the process before AIAgent could even be constructed.

This blocked the kanban dispatcher running under a Windows service: every spawned child agent died at startup, the dispatcher reclaimed the lock, retried, hit max_retries, and auto-blocked the task.

Repro:

# In a service / no-console context (DETACHED_PROCESS|CREATE_NO_WINDOW):
hermes -p <profile> chat -q "hi"
# -> prompt_toolkit.output.win32.NoConsoleScreenBufferError
# -> exit before agent loop runs

Fix: introduce _safe_pt_print() that wraps _pt_print(_PT_ANSI(text)) in a try/except. On NoConsoleScreenBufferError (or any other prompt_toolkit runtime failure) it falls back to plain sys.stdout.write + flush. Replace the 7 raw _pt_print(_PT_ANSI(text)) call sites inside _cprint() with _safe_pt_print(text). Behavior in normal interactive terminals is unchanged - the wrapped path is identical to the prior call.

The NoConsoleScreenBufferError import is itself wrapped in try/except so non-Windows installs (where prompt_toolkit.output.win32 doesn't exist as a usable module) still import cli.py cleanly.

Verified end-to-end: blocked kanban task t_3231a03f went from 'crashed 2x: pid not alive' to 'done' in one dispatcher cycle, with the spawned ops profile completing 12 LLM calls and 8 tool calls including a successful docker build.


alt-glitch · 2026-05-14T01:59:30Z

Likely duplicate of #22482 (same fix for #22445 — Win32 NoConsoleScreenBufferError fallback). Also overlaps with #22928 (--no-gui flag approach). #23011 was already closed as dup of #22482.

Resolve cli.py conflicts in print_above_input_no_redraw():

- Keep _safe_pt_print() wrapper (handles NoConsoleScreenBufferError for headless Windows services without Win32 console buffer).
- Preserve main's ensure_future fix for run_in_terminal coroutine (NousResearch#23185 Bug A) by wrapping _safe_pt_print inside the run_in_terminal lambda instead of _pt_print(_PT_ANSI(...)).

Both fixes compose cleanly: headless services still degrade to bare print(), and cross-thread emissions no longer drop output silently.

Verified: tests/cli/test_cprint_bg_thread.py (13/13), test_cli_init.py + test_cli_force_redraw.py (47/47).


teknium1 · 2026-06-12T06:19:13Z

Automated hermes-sweeper review found this is implemented on current main.

Evidence:

cli.py:2224 now wraps the direct _pt_print(_PT_ANSI(text)) path in _cprint() with try/except and falls back to plain print(text) when prompt_toolkit output setup fails. The inline comment explicitly covers redirected subprocess worker logging and NoConsoleScreenBufferError / OSError.
cli.py:9847 still sends the reported startup line, Initializing agent..., through _cprint(), so the current fallback covers the crash that happened before AIAgent construction.
hermes_cli/kanban_db.py:6828 still matches the kanban worker scenario from the PR: workers are spawned with stdout=log_f, stderr=subprocess.STDOUT, and CREATE_NO_WINDOW on Windows.
The overlapping fix merged via #28330, "fix(windows): handle redirected stdout in _cprint fallback (#28083)", as commit 8bcb6082acf83db8e199f1065fa065340d82de0d and is included in v2026.5.28 and later.
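
To illustrate the worker spawn shape described in the evidence above (stdout to a log file, stderr merged into it, CREATE_NO_WINDOW on Windows), here is a minimal sketch. `spawn_worker` and the log path are hypothetical names, not the code in hermes_cli/kanban_db.py, and unlike a real dispatcher this version simply waits for the child. With this shape a child traceback lands in the log file rather than disappearing with the child.

```python
import subprocess
import sys


def spawn_worker(cmd, log_path):
    """Run cmd with stdout and stderr appended to log_path; return exit code."""
    # CREATE_NO_WINDOW is defined by `subprocess` only on Windows.
    flags = getattr(subprocess, "CREATE_NO_WINDOW", 0)
    with open(log_path, "a", encoding="utf-8") as log_f:
        proc = subprocess.Popen(
            cmd,
            stdin=subprocess.DEVNULL,
            stdout=log_f,
            stderr=subprocess.STDOUT,  # merge tracebacks into the same log
            creationflags=flags,
        )
        return proc.wait()


if __name__ == "__main__":
    # A child that crashes at startup: its traceback ends up in worker.log.
    code = spawn_worker(
        [sys.executable, "-c", "raise RuntimeError('boom')"], "worker.log"
    )
    print("worker exit code:", code)
```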

Thanks for the detailed repro and verification notes. The prior duplicate discussion in this PR (#22482 / #22445 / #22928 / #23011) was useful in confirming this is the same bug class now covered on main.

zw11591-sketch and others added 2 commits May 14, 2026 09:37

Merge branch 'NousResearch:main' into fix/cli-noconsole-headless-wind…

fe8bd12

…ows-service

alt-glitch added the type/bug, P2, and comp/cli labels May 14, 2026

This was referenced May 14, 2026

fix(cli): gracefully handle Windows console failures in _cprint (kanban fix) #25500

Closed

fix(cli): handle NoConsoleScreenBufferError in _cprint on Windows #25556

Closed

fix(windows): handle redirected stdout in _cprint fallback #28083

Closed

zw11591-sketch added 2 commits May 20, 2026 17:57

Merge branch 'fix/cli-noconsole-headless-windows-service' of https://…

184e950

…github.com/zw11591-sketch/hermes-agent into fix/cli-noconsole-headless-windows-service

teknium1 closed this Jun 12, 2026

teknium1 added the sweeper:implemented-on-main Sweeper: behavior already present on current main label Jun 12, 2026

zw11591-sketch wants to merge 4 commits into NousResearch:main from zw11591-sketch:fix/cli-noconsole-headless-windows-service
