fix: display structured agent result failures by counterposition · Pull Request #20873 · NousResearch/hermes-agent

counterposition · 2026-05-06T19:00:26Z

Why this PR is worth reviewer attention

Today, a provider/runtime failure can look like Hermes simply produced no answer. The concrete failure mode that motivated this: an invalid model slug caused AIAgent.run_conversation() to return final_response: None, but the TUI rendered that as a blank assistant turn. The useful error was only discoverable by digging through logs/request dumps.

This PR makes those failures visible and actionable across the main user surfaces. It gives Hermes a structured “the agent failed/incompleted before answering” result path, renders that as runtime/system UI instead of assistant prose, and redacts sensitive text before display.

The value is simple: when the model/provider rejects a request, users should see the reason immediately instead of debugging a blank response.

What changed

Added agent.run_result_display helpers to normalize run_conversation() result envelopes into:
- status: complete, partial, error, or interrupted
- visible display text
- display-safe error metadata
- redacted runtime error text
Converted key no-output runtime/provider branches in run_agent.py to structured failure/partial envelopes with final_response: None, display_error, error_kind, status code, and provider/model/base URL metadata.
Wired the classic CLI, TUI gateway, and TUI frontend to render these failures visibly without pretending they were assistant-authored messages.
Added regression coverage for helper behavior, source envelopes, redaction, TUI gateway payloads, frontend rendering, and CLI display behavior.

Behavior before / after

Before:

Provider rejects request, e.g. invalid model ID.
Agent returns final_response: None.
TUI/CLI paths may show a blank/no-op response.
User has to inspect logs to discover the real error.

After:

Provider/runtime failure remains semantically distinct from assistant output.
Result carries display_error + structured metadata.
TUI renders a runtime/system error row, not assistant prose.
CLI shows the failure text at the presentation boundary.
Sensitive text is redacted both when envelopes are produced and when legacy/raw envelopes are displayed.

Review focus

The main design choice is intentional: this does not stuff runtime failures into final_response. The model did not answer. Instead, final_response stays None for no-output failures, and UI boundaries use display_error/status metadata to show the failure.

Most relevant files:

agent/run_result_display.py
run_agent.py
tui_gateway/server.py
ui-tui/src/app/turnController.ts
cli.py

Verification

Latest local verification after addressing the adversarial review finding about _make_failure_result(...) metadata argument types:

uv run ty check run_agent.py --output-format concise still reports many baseline ty issues in this file, but no remaining diagnostics mention _make_failure_result or the reported call sites after the helper now accepts object-like provider/model/base URL values and centralizes coercion/redaction.
HERMES_TEST_NO_SYNC=1 scripts/run_tests.sh tests/agent/test_run_result_display.py tests/run_agent/test_run_agent.py::TestExecuteToolCalls::test_failure_envelope_redacts_sensitive_error_text tests/run_agent/test_run_agent.py::TestExecuteToolCalls::test_non_retryable_client_error_returns_structured_failure_envelope -q → 15 passed
python3 -m py_compile agent/run_result_display.py run_agent.py && git diff --check → passed

Earlier broader verification on this branch:

HERMES_TEST_NO_SYNC=1 scripts/run_tests.sh tests/agent/test_run_result_display.py tests/run_agent/test_run_agent.py tests/cli/test_cli_init.py tests/cli/test_cli_background_tui_refresh.py tests/test_tui_gateway_server.py::test_prompt_submit_failed_empty_result_emits_error_text tests/test_tui_gateway_server.py::test_prompt_submit_partial_empty_result_emits_partial_text -q → 368 passed
npm --prefix ui-tui test -- src/__tests__/createGatewayEventHandler.test.ts → 42 passed
npm --prefix ui-tui run type-check → passed
npm --prefix ui-tui run build → passed
Added-line security scan: no hardcoded secret assignment, shell injection, dangerous eval/exec, unsafe pickle, or SQL string-formatting findings
Independent pre-commit review: safe to commit; no blocking issues found

CI note

The red ruff + ty diff check appears to be a workflow/base issue, not a blocking code-lint failure in this PR:

The failing step is Post / update PR comment, which exits with GitHub API 403: Resource not accessible by integration when trying to write the lint summary comment from this forked PR.
The lint summary itself reports ruff as clean on both HEAD and base.
The workflow’s own summary text says ty diagnostics are surfaced as warnings and “this check never fails the build.”

So reviewers should treat the red Lint check as a CI/comment-permission problem unless the workflow is changed or re-run in a context with comment-write permissions.

@counterposition

…s empty (#21245) When the provider rejects a request (e.g. invalid model slug like '--provider nous --model kimi-k2.6' where the valid slug is 'moonshotai/kimi-k2.6'), run_conversation() returns {failed: True, error: <detail>, final_response: None}. The TUI gateway and one-shot CLI mode both dropped the error on the floor and emitted an empty turn, so the user saw a blank response with no indication that anything went wrong. Mirror the interactive CLI's existing pattern (cli.py:9832): when final_response is empty AND (failed|partial) is set AND error is populated, surface 'Error: <detail>' as the visible text. Leaves the None-with-no-error path and the '(empty)' sentinel path untouched — an empty successful turn still renders empty, and existing sentinel handlers keep owning their lane. Reported by @counterposition in PR #20873; taking a minimal fix rather than the broader structured-failure refactor proposed there.

teknium1 · 2026-05-07T12:53:38Z

Thanks for the clear bug report and the repro — that was what made this easy to act on.

We went with a narrower fix in #21245 (merged as 6e46f99) rather than the structured-failure refactor here. Two reasons:

The reported symptom (blank response on --provider nous --model kimi-k2.6) is specifically a TUI + one-shot CLI issue — the interactive CLI at cli.py:9832 already surfaced result['error'] correctly. The fix was ~20 LOC mirroring that pattern to the other two surfaces.
Some of the changes here would have reclassified final_response: None and the "(empty)" sentinel into the error lane, but those paths are intentional — they mean 'model produced nothing visible, but no backend error' and are owned by existing sentinel handlers (gateway/run.py:6480, run_agent.py:6716). We want to keep that distinction.

The new structured fields (error_kind, display_error, provider/model/base_url metadata, Ink runtime-row rendering) are a cleaner shape in the abstract, but it's a lot of surface area to maintain for the concrete symptom. If there's a second case where the narrow fix isn't enough, happy to revisit.

Appreciate the clean repro, the tests, and the thorough write-up — that's what let us confidently scope down.

@counterposition

…s empty (NousResearch#21245) When the provider rejects a request (e.g. invalid model slug like '--provider nous --model kimi-k2.6' where the valid slug is 'moonshotai/kimi-k2.6'), run_conversation() returns {failed: True, error: <detail>, final_response: None}. The TUI gateway and one-shot CLI mode both dropped the error on the floor and emitted an empty turn, so the user saw a blank response with no indication that anything went wrong. Mirror the interactive CLI's existing pattern (cli.py:9832): when final_response is empty AND (failed|partial) is set AND error is populated, surface 'Error: <detail>' as the visible text. Leaves the None-with-no-error path and the '(empty)' sentinel path untouched — an empty successful turn still renders empty, and existing sentinel handlers keep owning their lane. Reported by @counterposition in PR NousResearch#20873; taking a minimal fix rather than the broader structured-failure refactor proposed there.

@counterposition

…s empty (NousResearch#21245) When the provider rejects a request (e.g. invalid model slug like '--provider nous --model kimi-k2.6' where the valid slug is 'moonshotai/kimi-k2.6'), run_conversation() returns {failed: True, error: <detail>, final_response: None}. The TUI gateway and one-shot CLI mode both dropped the error on the floor and emitted an empty turn, so the user saw a blank response with no indication that anything went wrong. Mirror the interactive CLI's existing pattern (cli.py:9832): when final_response is empty AND (failed|partial) is set AND error is populated, surface 'Error: <detail>' as the visible text. Leaves the None-with-no-error path and the '(empty)' sentinel path untouched — an empty successful turn still renders empty, and existing sentinel handlers keep owning their lane. Reported by @counterposition in PR NousResearch#20873; taking a minimal fix rather than the broader structured-failure refactor proposed there.

@counterposition

…s empty (NousResearch#21245) When the provider rejects a request (e.g. invalid model slug like '--provider nous --model kimi-k2.6' where the valid slug is 'moonshotai/kimi-k2.6'), run_conversation() returns {failed: True, error: <detail>, final_response: None}. The TUI gateway and one-shot CLI mode both dropped the error on the floor and emitted an empty turn, so the user saw a blank response with no indication that anything went wrong. Mirror the interactive CLI's existing pattern (cli.py:9832): when final_response is empty AND (failed|partial) is set AND error is populated, surface 'Error: <detail>' as the visible text. Leaves the None-with-no-error path and the '(empty)' sentinel path untouched — an empty successful turn still renders empty, and existing sentinel handlers keep owning their lane. Reported by @counterposition in PR NousResearch#20873; taking a minimal fix rather than the broader structured-failure refactor proposed there.

@counterposition

…s empty (NousResearch#21245) When the provider rejects a request (e.g. invalid model slug like '--provider nous --model kimi-k2.6' where the valid slug is 'moonshotai/kimi-k2.6'), run_conversation() returns {failed: True, error: <detail>, final_response: None}. The TUI gateway and one-shot CLI mode both dropped the error on the floor and emitted an empty turn, so the user saw a blank response with no indication that anything went wrong. Mirror the interactive CLI's existing pattern (cli.py:9832): when final_response is empty AND (failed|partial) is set AND error is populated, surface 'Error: <detail>' as the visible text. Leaves the None-with-no-error path and the '(empty)' sentinel path untouched — an empty successful turn still renders empty, and existing sentinel handlers keep owning their lane. Reported by @counterposition in PR NousResearch#20873; taking a minimal fix rather than the broader structured-failure refactor proposed there.

@counterposition

…s empty (NousResearch#21245) When the provider rejects a request (e.g. invalid model slug like '--provider nous --model kimi-k2.6' where the valid slug is 'moonshotai/kimi-k2.6'), run_conversation() returns {failed: True, error: <detail>, final_response: None}. The TUI gateway and one-shot CLI mode both dropped the error on the floor and emitted an empty turn, so the user saw a blank response with no indication that anything went wrong. Mirror the interactive CLI's existing pattern (cli.py:9832): when final_response is empty AND (failed|partial) is set AND error is populated, surface 'Error: <detail>' as the visible text. Leaves the None-with-no-error path and the '(empty)' sentinel path untouched — an empty successful turn still renders empty, and existing sentinel handlers keep owning their lane. Reported by @counterposition in PR NousResearch#20873; taking a minimal fix rather than the broader structured-failure refactor proposed there.

@counterposition

…s empty (NousResearch#21245) When the provider rejects a request (e.g. invalid model slug like '--provider nous --model kimi-k2.6' where the valid slug is 'moonshotai/kimi-k2.6'), run_conversation() returns {failed: True, error: <detail>, final_response: None}. The TUI gateway and one-shot CLI mode both dropped the error on the floor and emitted an empty turn, so the user saw a blank response with no indication that anything went wrong. Mirror the interactive CLI's existing pattern (cli.py:9832): when final_response is empty AND (failed|partial) is set AND error is populated, surface 'Error: <detail>' as the visible text. Leaves the None-with-no-error path and the '(empty)' sentinel path untouched — an empty successful turn still renders empty, and existing sentinel handlers keep owning their lane. Reported by @counterposition in PR NousResearch#20873; taking a minimal fix rather than the broader structured-failure refactor proposed there.

counterposition added 3 commits May 6, 2026 14:59

fix: display structured agent result failures

cf5ce14

fix: satisfy agent result type lint

88e4d4e

fix: narrow failure metadata typing

19e262d

alt-glitch added type/bug Something isn't working comp/agent Core agent loop, run_agent.py, prompt builder comp/cli CLI entry point, hermes_cli/, setup wizard comp/tui Terminal UI (ui-tui/ + tui_gateway/) P2 Medium — degraded but workaround exists labels May 6, 2026

fix: centralize failure metadata coercion

2504c1f

teknium1 mentioned this pull request May 7, 2026

fix(tui): surface backend error as visible text when final_response is empty #21245

Merged

teknium1 closed this in #21245 May 7, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: display structured agent result failures#20873

fix: display structured agent result failures#20873
counterposition wants to merge 4 commits into
NousResearch:mainfrom
counterposition:fix/agent-result-error-display

counterposition commented May 6, 2026 •

edited

Loading

Uh oh!

teknium1 commented May 7, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

counterposition commented May 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Why this PR is worth reviewer attention

What changed

Behavior before / after

Review focus

Verification

CI note

Uh oh!

teknium1 commented May 7, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

counterposition commented May 6, 2026 •

edited

Loading