Skip to content

fix: display structured agent result failures#20873

Closed
counterposition wants to merge 4 commits into
NousResearch:mainfrom
counterposition:fix/agent-result-error-display
Closed

fix: display structured agent result failures#20873
counterposition wants to merge 4 commits into
NousResearch:mainfrom
counterposition:fix/agent-result-error-display

Conversation

@counterposition

@counterposition counterposition commented May 6, 2026

Copy link
Copy Markdown
Contributor

Why this PR is worth reviewer attention

Today, a provider/runtime failure can look like Hermes simply produced no answer. The concrete failure mode that motivated this: an invalid model slug caused AIAgent.run_conversation() to return final_response: None, but the TUI rendered that as a blank assistant turn. The useful error was only discoverable by digging through logs/request dumps.

This PR makes those failures visible and actionable across the main user surfaces. It gives Hermes a structured “the agent failed/incompleted before answering” result path, renders that as runtime/system UI instead of assistant prose, and redacts sensitive text before display.

The value is simple: when the model/provider rejects a request, users should see the reason immediately instead of debugging a blank response.

What changed

  • Added agent.run_result_display helpers to normalize run_conversation() result envelopes into:
    • status: complete, partial, error, or interrupted
    • visible display text
    • display-safe error metadata
    • redacted runtime error text
  • Converted key no-output runtime/provider branches in run_agent.py to structured failure/partial envelopes with final_response: None, display_error, error_kind, status code, and provider/model/base URL metadata.
  • Wired the classic CLI, TUI gateway, and TUI frontend to render these failures visibly without pretending they were assistant-authored messages.
  • Added regression coverage for helper behavior, source envelopes, redaction, TUI gateway payloads, frontend rendering, and CLI display behavior.

Behavior before / after

Before:

  • Provider rejects request, e.g. invalid model ID.
  • Agent returns final_response: None.
  • TUI/CLI paths may show a blank/no-op response.
  • User has to inspect logs to discover the real error.

After:

  • Provider/runtime failure remains semantically distinct from assistant output.
  • Result carries display_error + structured metadata.
  • TUI renders a runtime/system error row, not assistant prose.
  • CLI shows the failure text at the presentation boundary.
  • Sensitive text is redacted both when envelopes are produced and when legacy/raw envelopes are displayed.

Review focus

The main design choice is intentional: this does not stuff runtime failures into final_response. The model did not answer. Instead, final_response stays None for no-output failures, and UI boundaries use display_error/status metadata to show the failure.

Most relevant files:

  • agent/run_result_display.py
  • run_agent.py
  • tui_gateway/server.py
  • ui-tui/src/app/turnController.ts
  • cli.py

Verification

Latest local verification after addressing the adversarial review finding about _make_failure_result(...) metadata argument types:

  • uv run ty check run_agent.py --output-format concise still reports many baseline ty issues in this file, but no remaining diagnostics mention _make_failure_result or the reported call sites after the helper now accepts object-like provider/model/base URL values and centralizes coercion/redaction.
  • HERMES_TEST_NO_SYNC=1 scripts/run_tests.sh tests/agent/test_run_result_display.py tests/run_agent/test_run_agent.py::TestExecuteToolCalls::test_failure_envelope_redacts_sensitive_error_text tests/run_agent/test_run_agent.py::TestExecuteToolCalls::test_non_retryable_client_error_returns_structured_failure_envelope -q → 15 passed
  • python3 -m py_compile agent/run_result_display.py run_agent.py && git diff --check → passed

Earlier broader verification on this branch:

  • HERMES_TEST_NO_SYNC=1 scripts/run_tests.sh tests/agent/test_run_result_display.py tests/run_agent/test_run_agent.py tests/cli/test_cli_init.py tests/cli/test_cli_background_tui_refresh.py tests/test_tui_gateway_server.py::test_prompt_submit_failed_empty_result_emits_error_text tests/test_tui_gateway_server.py::test_prompt_submit_partial_empty_result_emits_partial_text -q → 368 passed
  • npm --prefix ui-tui test -- src/__tests__/createGatewayEventHandler.test.ts → 42 passed
  • npm --prefix ui-tui run type-check → passed
  • npm --prefix ui-tui run build → passed
  • Added-line security scan: no hardcoded secret assignment, shell injection, dangerous eval/exec, unsafe pickle, or SQL string-formatting findings
  • Independent pre-commit review: safe to commit; no blocking issues found

CI note

The red ruff + ty diff check appears to be a workflow/base issue, not a blocking code-lint failure in this PR:

  • The failing step is Post / update PR comment, which exits with GitHub API 403: Resource not accessible by integration when trying to write the lint summary comment from this forked PR.
  • The lint summary itself reports ruff as clean on both HEAD and base.
  • The workflow’s own summary text says ty diagnostics are surfaced as warnings and “this check never fails the build.”

So reviewers should treat the red Lint check as a CI/comment-permission problem unless the workflow is changed or re-run in a context with comment-write permissions.

@alt-glitch alt-glitch added type/bug Something isn't working comp/agent Core agent loop, run_agent.py, prompt builder comp/cli CLI entry point, hermes_cli/, setup wizard comp/tui Terminal UI (ui-tui/ + tui_gateway/) P2 Medium — degraded but workaround exists labels May 6, 2026
teknium1 added a commit that referenced this pull request May 7, 2026
…s empty (#21245)

When the provider rejects a request (e.g. invalid model slug like
'--provider nous --model kimi-k2.6' where the valid slug is
'moonshotai/kimi-k2.6'), run_conversation() returns
{failed: True, error: <detail>, final_response: None}. The TUI gateway
and one-shot CLI mode both dropped the error on the floor and emitted
an empty turn, so the user saw a blank response with no indication
that anything went wrong.

Mirror the interactive CLI's existing pattern (cli.py:9832): when
final_response is empty AND (failed|partial) is set AND error is
populated, surface 'Error: <detail>' as the visible text. Leaves
the None-with-no-error path and the '(empty)' sentinel path
untouched — an empty successful turn still renders empty, and
existing sentinel handlers keep owning their lane.

Reported by @counterposition in PR #20873; taking a minimal fix
rather than the broader structured-failure refactor proposed there.
@teknium1

teknium1 commented May 7, 2026

Copy link
Copy Markdown
Contributor

Thanks for the clear bug report and the repro — that was what made this easy to act on.

We went with a narrower fix in #21245 (merged as 6e46f99) rather than the structured-failure refactor here. Two reasons:

  1. The reported symptom (blank response on --provider nous --model kimi-k2.6) is specifically a TUI + one-shot CLI issue — the interactive CLI at cli.py:9832 already surfaced result['error'] correctly. The fix was ~20 LOC mirroring that pattern to the other two surfaces.

  2. Some of the changes here would have reclassified final_response: None and the "(empty)" sentinel into the error lane, but those paths are intentional — they mean 'model produced nothing visible, but no backend error' and are owned by existing sentinel handlers (gateway/run.py:6480, run_agent.py:6716). We want to keep that distinction.

The new structured fields (error_kind, display_error, provider/model/base_url metadata, Ink runtime-row rendering) are a cleaner shape in the abstract, but it's a lot of surface area to maintain for the concrete symptom. If there's a second case where the narrow fix isn't enough, happy to revisit.

Appreciate the clean repro, the tests, and the thorough write-up — that's what let us confidently scope down.

RationallyPrime pushed a commit to RationallyPrime/hermes-agent that referenced this pull request May 8, 2026
…s empty (NousResearch#21245)

When the provider rejects a request (e.g. invalid model slug like
'--provider nous --model kimi-k2.6' where the valid slug is
'moonshotai/kimi-k2.6'), run_conversation() returns
{failed: True, error: <detail>, final_response: None}. The TUI gateway
and one-shot CLI mode both dropped the error on the floor and emitted
an empty turn, so the user saw a blank response with no indication
that anything went wrong.

Mirror the interactive CLI's existing pattern (cli.py:9832): when
final_response is empty AND (failed|partial) is set AND error is
populated, surface 'Error: <detail>' as the visible text. Leaves
the None-with-no-error path and the '(empty)' sentinel path
untouched — an empty successful turn still renders empty, and
existing sentinel handlers keep owning their lane.

Reported by @counterposition in PR NousResearch#20873; taking a minimal fix
rather than the broader structured-failure refactor proposed there.
rmulligan pushed a commit to rmulligan/hermes-agent that referenced this pull request May 11, 2026
…s empty (NousResearch#21245)

When the provider rejects a request (e.g. invalid model slug like
'--provider nous --model kimi-k2.6' where the valid slug is
'moonshotai/kimi-k2.6'), run_conversation() returns
{failed: True, error: <detail>, final_response: None}. The TUI gateway
and one-shot CLI mode both dropped the error on the floor and emitted
an empty turn, so the user saw a blank response with no indication
that anything went wrong.

Mirror the interactive CLI's existing pattern (cli.py:9832): when
final_response is empty AND (failed|partial) is set AND error is
populated, surface 'Error: <detail>' as the visible text. Leaves
the None-with-no-error path and the '(empty)' sentinel path
untouched — an empty successful turn still renders empty, and
existing sentinel handlers keep owning their lane.

Reported by @counterposition in PR NousResearch#20873; taking a minimal fix
rather than the broader structured-failure refactor proposed there.
JinyuID pushed a commit to JinyuID/hermes-agent that referenced this pull request May 11, 2026
…s empty (NousResearch#21245)

When the provider rejects a request (e.g. invalid model slug like
'--provider nous --model kimi-k2.6' where the valid slug is
'moonshotai/kimi-k2.6'), run_conversation() returns
{failed: True, error: <detail>, final_response: None}. The TUI gateway
and one-shot CLI mode both dropped the error on the floor and emitted
an empty turn, so the user saw a blank response with no indication
that anything went wrong.

Mirror the interactive CLI's existing pattern (cli.py:9832): when
final_response is empty AND (failed|partial) is set AND error is
populated, surface 'Error: <detail>' as the visible text. Leaves
the None-with-no-error path and the '(empty)' sentinel path
untouched — an empty successful turn still renders empty, and
existing sentinel handlers keep owning their lane.

Reported by @counterposition in PR NousResearch#20873; taking a minimal fix
rather than the broader structured-failure refactor proposed there.
jsboige pushed a commit to jsboige/hermes-agent that referenced this pull request May 14, 2026
…s empty (NousResearch#21245)

When the provider rejects a request (e.g. invalid model slug like
'--provider nous --model kimi-k2.6' where the valid slug is
'moonshotai/kimi-k2.6'), run_conversation() returns
{failed: True, error: <detail>, final_response: None}. The TUI gateway
and one-shot CLI mode both dropped the error on the floor and emitted
an empty turn, so the user saw a blank response with no indication
that anything went wrong.

Mirror the interactive CLI's existing pattern (cli.py:9832): when
final_response is empty AND (failed|partial) is set AND error is
populated, surface 'Error: <detail>' as the visible text. Leaves
the None-with-no-error path and the '(empty)' sentinel path
untouched — an empty successful turn still renders empty, and
existing sentinel handlers keep owning their lane.

Reported by @counterposition in PR NousResearch#20873; taking a minimal fix
rather than the broader structured-failure refactor proposed there.
gweeteve pushed a commit to gweeteve/hermes-agent that referenced this pull request Jun 2, 2026
…s empty (NousResearch#21245)

When the provider rejects a request (e.g. invalid model slug like
'--provider nous --model kimi-k2.6' where the valid slug is
'moonshotai/kimi-k2.6'), run_conversation() returns
{failed: True, error: <detail>, final_response: None}. The TUI gateway
and one-shot CLI mode both dropped the error on the floor and emitted
an empty turn, so the user saw a blank response with no indication
that anything went wrong.

Mirror the interactive CLI's existing pattern (cli.py:9832): when
final_response is empty AND (failed|partial) is set AND error is
populated, surface 'Error: <detail>' as the visible text. Leaves
the None-with-no-error path and the '(empty)' sentinel path
untouched — an empty successful turn still renders empty, and
existing sentinel handlers keep owning their lane.

Reported by @counterposition in PR NousResearch#20873; taking a minimal fix
rather than the broader structured-failure refactor proposed there.
Egavasyug pushed a commit to Egavasyug/hermes-agent that referenced this pull request Jun 10, 2026
…s empty (NousResearch#21245)

When the provider rejects a request (e.g. invalid model slug like
'--provider nous --model kimi-k2.6' where the valid slug is
'moonshotai/kimi-k2.6'), run_conversation() returns
{failed: True, error: <detail>, final_response: None}. The TUI gateway
and one-shot CLI mode both dropped the error on the floor and emitted
an empty turn, so the user saw a blank response with no indication
that anything went wrong.

Mirror the interactive CLI's existing pattern (cli.py:9832): when
final_response is empty AND (failed|partial) is set AND error is
populated, surface 'Error: <detail>' as the visible text. Leaves
the None-with-no-error path and the '(empty)' sentinel path
untouched — an empty successful turn still renders empty, and
existing sentinel handlers keep owning their lane.

Reported by @counterposition in PR NousResearch#20873; taking a minimal fix
rather than the broader structured-failure refactor proposed there.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

comp/agent Core agent loop, run_agent.py, prompt builder comp/cli CLI entry point, hermes_cli/, setup wizard comp/tui Terminal UI (ui-tui/ + tui_gateway/) P2 Medium — degraded but workaround exists type/bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants