Skip to content

feat: trim hot-path rendering overhead in TUI frame assembly and diff pipeline#142

Merged
inureyes merged 3 commits into
mainfrom
feature/issue-138-render-hot-path
Apr 6, 2026
Merged

feat: trim hot-path rendering overhead in TUI frame assembly and diff pipeline#142
inureyes merged 3 commits into
mainfrom
feature/issue-138-render-hot-path

Conversation

@inureyes

@inureyes inureyes commented Apr 6, 2026

Copy link
Copy Markdown
Member

Summary

Reduces per-frame work in the TUI rendering pipeline, building on the architectural changes from #135, #136, and #137:

  • Rendering pipeline refinement: DifferentialRenderer accepts terminal dimensions from the caller to avoid redundant terminal::size() syscalls. Replaced FNV-1a full-content hashing with lightweight length-based identity check. Only flushes stdout when actual terminal writes occurred.
  • Terminal housekeeping reductions: Cursor is hidden once at session start instead of per-frame Hide/Show. Resize dimensions are passed directly to the renderer via update_dimensions().
  • Hot-path allocation cleanup: print_colored_text has a zero-allocation fast path for the common no-width case. truncate_to_width returns Cow<str> with zero-copy borrow when the string fits. Process renderer reuses scratch buffers across rows, borrows command strings instead of cloning, and uses Cow<str> for GPU fields. draw_bar/draw_bar_multi batch consecutive segments into single calls (~50x fewer terminal escape sequences per gauge). format_cpu_time returns Cow<str> for the common "0:00:00" case.
  • Measurement support: Added unit tests for BufferWriter throughput, DifferentialRenderer state management, and truncate_to_width allocation behavior.

Closes #138

Test plan

  • All 497 existing + new unit tests pass (cargo test)
  • cargo clippy -- -D warnings passes with zero warnings
  • cargo fmt --check passes
  • Manual verification: run all-smi locally and confirm visual correctness
  • Manual verification: resize terminal during monitoring and confirm no rendering artifacts
  • Manual verification: toggle help, switch tabs, scroll processes -- all modes render correctly
  • Manual verification: compare CPU usage before/after on a system with many GPUs/processes

… pipeline

Reduce per-frame work in the rendering pipeline after the architectural
changes in #135, #136, and #137:

Rendering pipeline refinement:
- DifferentialRenderer accepts terminal dimensions from the caller,
  eliminating redundant terminal::size() syscalls on every frame
- Replace FNV-1a full-content hashing with lightweight length-based
  identity check; per-line comparison already catches all differences
- Only flush stdout when actual terminal writes occurred
- Track change state to skip no-op flush operations

Terminal housekeeping reductions:
- Hide cursor once at session start instead of per-frame Hide/Show churn
- Pass resize dimensions directly to DifferentialRenderer via
  update_dimensions() on resize events
- Remove per-frame stdout acquisition for cursor operations

Hot-path allocation cleanup:
- print_colored_text: zero-allocation fast path when width is None
  (the common case), avoiding text.to_string() on every call
- truncate_to_width: returns Cow<str> with zero-copy borrow when the
  string already fits, with an ASCII fast-path for byte-level slicing
- Process renderer: reuse RowFormatter scratch buffer across rows,
  use write! into pre-sized buffers instead of per-field format! calls,
  borrow command string instead of cloning, use Cow<str> for GPU percent
  and GPU memory fields, reuse a single clear-line string for padding
- draw_bar/draw_bar_multi: batch consecutive same-type segments into
  single print_colored_text calls instead of per-character emission,
  reducing terminal escape sequence count by ~50x per gauge
- format_cpu_time: return Cow<str> to avoid allocation for common
  "0:00:00" case
- format_hostname_with_scroll: build scroll window directly into a
  pre-sized String without intermediate extended-hostname allocation
- BufferWriter: add reset() method for future frame-to-frame reuse

Measurement support:
- Add unit tests for BufferWriter throughput, DifferentialRenderer state
  management, and truncate_to_width allocation behavior
- Add timing-based throughput tests for the render path
@inureyes inureyes added type:performance Performance improvements priority:medium Medium priority issue status:review Under review labels Apr 6, 2026
…nfinite loop

- DifferentialRenderer: remove dead content_bytes fast-check that never
  short-circuited; the per-line comparison is already the authoritative
  change-detection mechanism and is O(1) for identical strings
- format_hostname_with_scroll: add ASCII guard before byte-level indexing
  to prevent character corruption on non-ASCII hostnames
- draw_bar_multi: guarantee forward progress in segment batching loop
  when text overlay is behind current position
@inureyes

inureyes commented Apr 6, 2026

Copy link
Copy Markdown
Member Author

Code Review Summary

Reviewed all 7 changed files in this PR (+679/-212 lines across the TUI rendering pipeline).

Findings Addressed (3 fixes committed)

# Severity File Finding Status
1 MEDIUM src/ui/buffer.rs Dead identity check in DifferentialRenderer: The replacement of FNV-1a hashing introduced an if/else block that computes content_bytes (iterating first 64 bytes) but both branches fall through to the same per-line loop. The check never short-circuits, wasting CPU cycles every frame. Removed the dead check and previous_content_bytes field entirely -- the per-line string comparison is already the authoritative change-detection mechanism and is O(1) for identical strings (Rust checks length first). Fixed
2 MEDIUM src/ui/renderers/gpu_renderer.rs Non-ASCII hostname corruption in format_hostname_with_scroll: The optimization replaced safe .chars() iteration with hostname.as_bytes()[idx] as char, which only works for ASCII. Non-ASCII UTF-8 multibyte hostnames would produce corrupted characters. Added an is_ascii() guard that uses the fast byte path for ASCII hostnames (the common case per RFC 952) and falls back to safe char iteration otherwise. Fixed
3 LOW src/ui/widgets.rs Potential infinite loop in draw_bar_multi: When text_pos <= pos (text overlay already emitted), run_end = end.min(text_pos) could be at or before pos, causing run_len = 0 and then pos = end which might not advance. Fixed by checking whether text_pos > pos before clamping, and using pos = end.max(pos + 1) in the fallback to guarantee forward progress. Fixed

Items Not Fixed (pre-existing, not introduced by this PR)

# Severity File Finding Status
4 LOW src/ui/process_renderer.rs Misleading format_cpu_time logic for hours=0: When hours == 0, the code computes minutes / 60 (always 0) and minutes % 60 (equals minutes). The output "0:MM:SS" is visually correct but the math is misleading. This is pre-existing, not introduced by this PR. Noted

Overall Assessment

The PR's optimizations are sound and well-targeted:

  • Cow<str> usage: Correct throughout. Lifetimes are properly scoped -- borrowed variants reference data that outlives the Cow, and into_owned() is called only where needed (chrome.rs).
  • Buffer reuse: Safe. RowFormatter and row_buf are cleared between rows; no stale data can leak between frames.
  • Differential renderer correctness: After the fix, the per-line comparison is the sole change-detection mechanism. Identical lines are O(1) due to Rust's string comparison checking length first. No regression risk from removing the hash.
  • Progress bar batching: The segment batching in draw_bar and draw_bar_multi correctly reduces escape sequences from O(width) to O(segments).
  • truncate_to_width returning Cow<str>: Zero-copy for the common case (string already fits), with an ASCII fast path for byte-level slicing. All callers handle the Cow correctly.
  • Session-level cursor management: Hiding cursor once at session start and relying on TerminalManager::drop() for restoration is correct and eliminates per-frame churn.
  • No security vulnerabilities identified in the changed code paths.

All 497+ tests pass. Clippy clean. Formatting clean.

Add test coverage for the optimized code paths introduced in this PR:
- process_renderer: write_memory_size correctness, format_cpu_time Cow
  allocation behavior (zero/long-running borrows vs owned), RowFormatter
  scratch buffer reuse, and throughput measurements for the zero-seconds
  hot path
- widgets: draw_bar and draw_bar_multi correctness (label, brackets,
  percentage text, custom text, long-label trimming) and escape-sequence
  batching verification confirming the ~50x reduction in terminal writes
  per gauge; includes throughput measurements for both functions
@inureyes

inureyes commented Apr 6, 2026

Copy link
Copy Markdown
Member Author

PR Finalization Complete

Summary

  • Tests: Added 31 new unit tests across 2 previously untested files:
    • src/ui/process_renderer.rs: 16 tests covering write_memory_size (7 correctness cases including zero, bytes, KB/MB/GB/TB, and buffer reuse), format_cpu_time (6 cases verifying Cow::Borrowed for zero and long-running paths vs Cow::Owned for active times), RowFormatter scratch buffer reuse (2 cases), and throughput measurements for both hot paths
    • src/ui/widgets.rs: 15 tests covering draw_bar (7 correctness cases: output non-empty, label presence, bracket presence, percentage text, custom text override, long-label trimming, and escape-sequence batch count verification) and draw_bar_multi (5 correctness cases) with throughput measurements for both functions
  • Lint/Format: cargo clippy --features cli -- -D warnings and cargo fmt --check both pass with zero issues
  • Test count: 199 -> 230 unit tests (all passing)

@inureyes inureyes added status:done Completed and removed status:review Under review labels Apr 6, 2026
@inureyes inureyes merged commit 9509e14 into main Apr 6, 2026
2 checks passed
@inureyes inureyes deleted the feature/issue-138-render-hot-path branch April 6, 2026 05:52
@inureyes inureyes self-assigned this Apr 17, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

priority:medium Medium priority issue status:done Completed type:performance Performance improvements

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Trim hot-path rendering overhead in the TUI frame assembly and diff pipeline

1 participant