fix: display structured agent result failures#20873
Conversation
…s empty (#21245) When the provider rejects a request (e.g. invalid model slug like '--provider nous --model kimi-k2.6' where the valid slug is 'moonshotai/kimi-k2.6'), run_conversation() returns {failed: True, error: <detail>, final_response: None}. The TUI gateway and one-shot CLI mode both dropped the error on the floor and emitted an empty turn, so the user saw a blank response with no indication that anything went wrong. Mirror the interactive CLI's existing pattern (cli.py:9832): when final_response is empty AND (failed|partial) is set AND error is populated, surface 'Error: <detail>' as the visible text. Leaves the None-with-no-error path and the '(empty)' sentinel path untouched — an empty successful turn still renders empty, and existing sentinel handlers keep owning their lane. Reported by @counterposition in PR #20873; taking a minimal fix rather than the broader structured-failure refactor proposed there.
|
Thanks for the clear bug report and the repro — that was what made this easy to act on. We went with a narrower fix in #21245 (merged as 6e46f99) rather than the structured-failure refactor here. Two reasons:
The new structured fields ( Appreciate the clean repro, the tests, and the thorough write-up — that's what let us confidently scope down. |
…s empty (NousResearch#21245) When the provider rejects a request (e.g. invalid model slug like '--provider nous --model kimi-k2.6' where the valid slug is 'moonshotai/kimi-k2.6'), run_conversation() returns {failed: True, error: <detail>, final_response: None}. The TUI gateway and one-shot CLI mode both dropped the error on the floor and emitted an empty turn, so the user saw a blank response with no indication that anything went wrong. Mirror the interactive CLI's existing pattern (cli.py:9832): when final_response is empty AND (failed|partial) is set AND error is populated, surface 'Error: <detail>' as the visible text. Leaves the None-with-no-error path and the '(empty)' sentinel path untouched — an empty successful turn still renders empty, and existing sentinel handlers keep owning their lane. Reported by @counterposition in PR NousResearch#20873; taking a minimal fix rather than the broader structured-failure refactor proposed there.
…s empty (NousResearch#21245) When the provider rejects a request (e.g. invalid model slug like '--provider nous --model kimi-k2.6' where the valid slug is 'moonshotai/kimi-k2.6'), run_conversation() returns {failed: True, error: <detail>, final_response: None}. The TUI gateway and one-shot CLI mode both dropped the error on the floor and emitted an empty turn, so the user saw a blank response with no indication that anything went wrong. Mirror the interactive CLI's existing pattern (cli.py:9832): when final_response is empty AND (failed|partial) is set AND error is populated, surface 'Error: <detail>' as the visible text. Leaves the None-with-no-error path and the '(empty)' sentinel path untouched — an empty successful turn still renders empty, and existing sentinel handlers keep owning their lane. Reported by @counterposition in PR NousResearch#20873; taking a minimal fix rather than the broader structured-failure refactor proposed there.
…s empty (NousResearch#21245) When the provider rejects a request (e.g. invalid model slug like '--provider nous --model kimi-k2.6' where the valid slug is 'moonshotai/kimi-k2.6'), run_conversation() returns {failed: True, error: <detail>, final_response: None}. The TUI gateway and one-shot CLI mode both dropped the error on the floor and emitted an empty turn, so the user saw a blank response with no indication that anything went wrong. Mirror the interactive CLI's existing pattern (cli.py:9832): when final_response is empty AND (failed|partial) is set AND error is populated, surface 'Error: <detail>' as the visible text. Leaves the None-with-no-error path and the '(empty)' sentinel path untouched — an empty successful turn still renders empty, and existing sentinel handlers keep owning their lane. Reported by @counterposition in PR NousResearch#20873; taking a minimal fix rather than the broader structured-failure refactor proposed there.
…s empty (NousResearch#21245) When the provider rejects a request (e.g. invalid model slug like '--provider nous --model kimi-k2.6' where the valid slug is 'moonshotai/kimi-k2.6'), run_conversation() returns {failed: True, error: <detail>, final_response: None}. The TUI gateway and one-shot CLI mode both dropped the error on the floor and emitted an empty turn, so the user saw a blank response with no indication that anything went wrong. Mirror the interactive CLI's existing pattern (cli.py:9832): when final_response is empty AND (failed|partial) is set AND error is populated, surface 'Error: <detail>' as the visible text. Leaves the None-with-no-error path and the '(empty)' sentinel path untouched — an empty successful turn still renders empty, and existing sentinel handlers keep owning their lane. Reported by @counterposition in PR NousResearch#20873; taking a minimal fix rather than the broader structured-failure refactor proposed there.
…s empty (NousResearch#21245) When the provider rejects a request (e.g. invalid model slug like '--provider nous --model kimi-k2.6' where the valid slug is 'moonshotai/kimi-k2.6'), run_conversation() returns {failed: True, error: <detail>, final_response: None}. The TUI gateway and one-shot CLI mode both dropped the error on the floor and emitted an empty turn, so the user saw a blank response with no indication that anything went wrong. Mirror the interactive CLI's existing pattern (cli.py:9832): when final_response is empty AND (failed|partial) is set AND error is populated, surface 'Error: <detail>' as the visible text. Leaves the None-with-no-error path and the '(empty)' sentinel path untouched — an empty successful turn still renders empty, and existing sentinel handlers keep owning their lane. Reported by @counterposition in PR NousResearch#20873; taking a minimal fix rather than the broader structured-failure refactor proposed there.
…s empty (NousResearch#21245) When the provider rejects a request (e.g. invalid model slug like '--provider nous --model kimi-k2.6' where the valid slug is 'moonshotai/kimi-k2.6'), run_conversation() returns {failed: True, error: <detail>, final_response: None}. The TUI gateway and one-shot CLI mode both dropped the error on the floor and emitted an empty turn, so the user saw a blank response with no indication that anything went wrong. Mirror the interactive CLI's existing pattern (cli.py:9832): when final_response is empty AND (failed|partial) is set AND error is populated, surface 'Error: <detail>' as the visible text. Leaves the None-with-no-error path and the '(empty)' sentinel path untouched — an empty successful turn still renders empty, and existing sentinel handlers keep owning their lane. Reported by @counterposition in PR NousResearch#20873; taking a minimal fix rather than the broader structured-failure refactor proposed there.
Why this PR is worth reviewer attention
Today, a provider/runtime failure can look like Hermes simply produced no answer. The concrete failure mode that motivated this: an invalid model slug caused
AIAgent.run_conversation()to returnfinal_response: None, but the TUI rendered that as a blank assistant turn. The useful error was only discoverable by digging through logs/request dumps.This PR makes those failures visible and actionable across the main user surfaces. It gives Hermes a structured “the agent failed/incompleted before answering” result path, renders that as runtime/system UI instead of assistant prose, and redacts sensitive text before display.
The value is simple: when the model/provider rejects a request, users should see the reason immediately instead of debugging a blank response.
What changed
agent.run_result_displayhelpers to normalizerun_conversation()result envelopes into:complete,partial,error, orinterruptedrun_agent.pyto structured failure/partial envelopes withfinal_response: None,display_error,error_kind, status code, and provider/model/base URL metadata.Behavior before / after
Before:
final_response: None.After:
display_error+ structured metadata.Review focus
The main design choice is intentional: this does not stuff runtime failures into
final_response. The model did not answer. Instead,final_responsestaysNonefor no-output failures, and UI boundaries usedisplay_error/status metadata to show the failure.Most relevant files:
agent/run_result_display.pyrun_agent.pytui_gateway/server.pyui-tui/src/app/turnController.tscli.pyVerification
Latest local verification after addressing the adversarial review finding about
_make_failure_result(...)metadata argument types:uv run ty check run_agent.py --output-format concisestill reports many baselinetyissues in this file, but no remaining diagnostics mention_make_failure_resultor the reported call sites after the helper now accepts object-like provider/model/base URL values and centralizes coercion/redaction.HERMES_TEST_NO_SYNC=1 scripts/run_tests.sh tests/agent/test_run_result_display.py tests/run_agent/test_run_agent.py::TestExecuteToolCalls::test_failure_envelope_redacts_sensitive_error_text tests/run_agent/test_run_agent.py::TestExecuteToolCalls::test_non_retryable_client_error_returns_structured_failure_envelope -q→ 15 passedpython3 -m py_compile agent/run_result_display.py run_agent.py && git diff --check→ passedEarlier broader verification on this branch:
HERMES_TEST_NO_SYNC=1 scripts/run_tests.sh tests/agent/test_run_result_display.py tests/run_agent/test_run_agent.py tests/cli/test_cli_init.py tests/cli/test_cli_background_tui_refresh.py tests/test_tui_gateway_server.py::test_prompt_submit_failed_empty_result_emits_error_text tests/test_tui_gateway_server.py::test_prompt_submit_partial_empty_result_emits_partial_text -q→ 368 passednpm --prefix ui-tui test -- src/__tests__/createGatewayEventHandler.test.ts→ 42 passednpm --prefix ui-tui run type-check→ passednpm --prefix ui-tui run build→ passedCI note
The red
ruff + ty diffcheck appears to be a workflow/base issue, not a blocking code-lint failure in this PR:Post / update PR comment, which exits with GitHub API 403:Resource not accessible by integrationwhen trying to write the lint summary comment from this forked PR.ruffas clean on both HEAD and base.tydiagnostics are surfaced as warnings and “this check never fails the build.”So reviewers should treat the red Lint check as a CI/comment-permission problem unless the workflow is changed or re-run in a context with comment-write permissions.