Skip to content

TUI: surface failure signals in collapsed run_command cards and clarify full-output access #1496

@llr-selfgit

Description

@llr-selfgit

Problem

While testing reasonix code on a few small local debugging tasks, I noticed a TUI affordance issue around completed run_command cards.

The green ✓ run_command marker seems to mean that the Reasonix tool invocation completed and returned to the loop. That meaning makes sense. However, in practice it is easy to read the green check as “the underlying shell command or test passed”, especially when the command output contains a failure but the visible collapsed card only shows trailing logs.

The final diagnosis was usually correct. The issue is mostly with what the user can quickly see in the TUI while the task is running.

A related point: collapsed cards show use /tool to read full, but in my manual session the /tool path was not obvious from that state. It may be that I used it incorrectly, but the full-output affordance was not clear while reading the card.

Observed behavior

Case 1: Python unittest failure with noisy trailing logs

The test failure was:

AssertionError: 300 != 800 : third retry should wait 800ms

The visible command card showed:

✓ run_command python3 test_retry_policy.py
⋮ 43 earlier lines (use /tool to read full)
cleanup: checked worker shard 27/30
cleanup: checked worker shard 28/30
cleanup: checked worker shard 29/30
cleanup: checked worker shard 30/30

The final summary identified the correct failure, but the visible card looked like a completed/successful command and only showed cleanup tail lines. The assertion line was collapsed above the visible excerpt.

Case 2: Node assertion failure

The test failure was:

applyCoupon(12000, "VIP25")
expected: 9000
actual:   10200

The final summary correctly identified the issue in src/cart.js, but the command card showed mostly the tail of the assertion output:

✓ run_command node test.mjs
⋮ 19 earlier lines (use /tool to read full)
diff: 'simple'
}
Node.js v22.22.0

Again, the diagnosis was correct, but the card did not make the failure signal easy to see.

Expected

  • Keep ✓ run_command if it means “tool invocation completed”, but make command failure harder to confuse with command success.
  • If the subprocess exits non-zero, show that near the card header or first visible output line.
  • If output is collapsed, prefer high-signal failure lines over only the final tail.
  • The use /tool to read full hint should lead to an obvious full-output path from the state where it appears.

Possible improvements

These are suggestions, not a required design.

  1. Clarify tool completion vs command result.

For example:

✓ run_command completed · exit 1

or:

✓ tool completed
✗ command exited 1
  1. Use failure-aware collapsed excerpts.

When output is collapsed, prefer showing:

  • exit status,
  • first assertion/error line,
  • nearby mismatch line,
  • then the final tail.

Example:

✗ run_command node test.mjs · exit 1
AssertionError: VIP25 should reduce a 12000 cent cart to 9000 cents
actual: 10200
expected: 9000
⋮ 19 lines collapsed
  1. Clarify full-output access.

Possible options:

  • make sure /tool works from the state where the hint appears;
  • update the hint if the intended interaction is different;
  • allow selecting/expanding the collapsed card with Enter;
  • mention the correct full-output path in /keys.

Reproduction

Environment:

Reasonix version: 0.48.0
Node version: v22.22.0
OS: macOS
Shell: zsh
Terminal app: Terminal.app
Model: deepseek-v4-pro

Python repro:

  1. Create a small unittest project where one assertion fails early, followed by many cleanup/log lines.
  2. Run:
reasonix code .
  1. Ask:
Help me check why this Python unittest project is failing. Please run the tests first, analyze the output, and do not edit files.
  1. Observe that the final diagnosis can be correct, but the visible command card may show only trailing cleanup lines while the assertion is collapsed.

Node repro:

  1. Create a small Node project with a failing assertion.
  2. Run:
reasonix code .
  1. Ask:
Help me check why this Node project test is failing. Please run the project tests first, analyze the reason, and do not edit files.
  1. Observe that the final diagnosis can be correct, but the visible command card may collapse the actual mismatch and show only the tail of the assertion output.

Full-output repro:

  1. Produce a collapsed command card that says use /tool to read full.
  2. Try to use /tool from the TUI to inspect the full output.
  3. Observe whether the full-output path is available and obvious from that state.

Related note

I also noticed that normal terminal text selection is not obvious when mouse handling is enabled. This may already be covered by existing copy/mouse discussions and recent fixes, so I am not treating it as the main issue here. The main request is about failure visibility in collapsed run_command cards and the /tool full-output affordance.

How this could be tested

  • Add a rendering/snapshot test for a command result with:
    • non-zero exit code,
    • an assertion failure near the top,
    • noisy trailing cleanup lines.
  • Verify the collapsed card includes the failure line, not only the tail.
  • Add a Node assertion fixture where useful lines are above the final Node.js v... tail.
  • Manual smoke test:
    • run reasonix code .;
    • trigger Python unittest and Node assertion failures;
    • confirm the TUI card surfaces the failure line without needing to inspect the full transcript.
  • Manual full-output test:
    • trigger a collapsed tool card;
    • confirm the advertised /tool or equivalent full-output path works from that state.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions