Skip to content

Feature: CLI Status Bar & Token/Cost Tracking — Persistent Context Window Visibility #683

@teknium1

Description

@teknium1

Overview

Add a persistent status bar to the Hermes CLI that displays the current model, token usage, context window fullness, session duration, and estimated cost. This is the single highest-impact, lowest-effort CLI improvement available.

Every modern AI coding CLI (Aider, Claude Code, Toad) shows token/cost information. Hermes shows nothing — users have no visibility into how full their context window is, what model they're using, or how much a session costs until they hit a wall (context overflow, unexpected bill).

Split from #504 for atomicity. The companion feature (rich rendering + navigation) is tracked in #684.


Research Findings

What Competitors Show

Aider: Real-time token/cost display per interaction. Shows tokens used per file in context.

Claude Code: Token counter in the prompt bar. Context window percentage visible at all times. /usage command for detailed breakdown.

Toad CLI: Model indicator, streaming status, connection health in the status line.

The Pattern

A thin status bar (1-2 lines) that updates after each interaction. Always visible. Color-coded thresholds for context fullness.


Current State in Hermes Agent

No status bar exists. The current prompt_toolkit layout (cli.py ~line 3289) is:

HSplit([
    Window(height=0),        # spacer
    sudo_widget,             # conditional sudo password
    approval_widget,         # conditional approval
    clarify_widget,          # conditional clarify
    spacer,                  # flexible space
    input_rule_top,          # bronze ─ rule
    image_bar,               # clipboard image badges
    input_area,              # TextArea
    input_rule_bot,          # bronze ─ rule
    CompletionsMenu,         # autocomplete
])

Token data IS available — the OpenAI API response includes a usage field with prompt_tokens, completion_tokens, and total_tokens. This data just isn't surfaced to the user.

Model info IS available — stored in session state, displayed in the banner at startup, but not persistently visible.


Implementation Plan

Skill vs. Tool Classification

This is a core codebase change to cli.py. It requires modifications to the prompt_toolkit layout and a data pipeline from API responses to the status bar display.

Specification

Status bar layout (1 line, above the input area):

 ⚕ claude-sonnet-4-20250514 │ Tokens: 12,450 / 200K │ [██████░░░░] 62% │ Cost: $0.23 │ 14m

Components:

  • Model name: current model (truncated if needed)
  • Token count: prompt + completion tokens / max context
  • Context bar: visual [██████░░░░] with color coding:
    • Green: < 50%
    • Yellow: 50-80%
    • Red: > 80%
    • Blinking red: > 95% (imminent overflow)
  • Cost estimate: cumulative session cost based on per-token pricing
  • Duration: session elapsed time

Data pipeline:

  1. After each API call, extract usage from response
  2. Update a shared state dict (_status_state)
  3. Trigger app.invalidate() to repaint the status bar
  4. Cost calculation: prompt_tokens * input_price + completion_tokens * output_price

Model pricing (configurable in config.yaml):

pricing:
  claude-sonnet-4-20250514:
    input: 3.0    # per million tokens
    output: 15.0
  gpt-4o:
    input: 2.5
    output: 10.0
  # fallback for unknown models
  default:
    input: 1.0
    output: 3.0

/usage command: Detailed breakdown:

Session Usage Report
━━━━━━━━━━━━━━━━━━━
Model:       claude-sonnet-4-20250514
Duration:    14m 32s
Turns:       7

Tokens:
  Prompt:     10,230 (input)
  Completion:  2,220 (output)
  Total:      12,450
  Context:    62% of 200,000

Cost:
  Input:      $0.031
  Output:     $0.033
  Total:      $0.064

Deliverables

  • Status bar Window in prompt_toolkit layout (above input_rule_top)
  • FormattedTextControl with dynamic content from _status_state
  • Token extraction from API response usage field
  • Context window max lookup per model (from model_metadata.py)
  • Color-coded context fullness bar
  • Cost estimation with configurable per-model pricing
  • Session timer (start time → elapsed)
  • /usage command with detailed breakdown
  • Compact mode handling (abbreviate or hide status bar in narrow terminals)
  • Tests for token tracking, cost calculation, display formatting

Pros & Cons

Pros

  • Highest impact-to-effort ratio of any CLI improvement
  • Prevents surprise context overflow (users see it coming)
  • Cost visibility saves money (users learn which prompts are expensive)
  • Small, self-contained change (one new Window in the HSplit)
  • No refactoring required — purely additive
  • Can ship in 1-2 days

Cons / Risks

  • Token counts from server are delayed (reported after completion, not during streaming)
  • Cost estimation requires maintaining a pricing table (could go stale)
  • Status bar takes 1 line of vertical space (mitigated by compact mode)
  • Some terminals may render the color bar poorly (test mosh, tmux, SSH)

Open Questions

  • Should the status bar be above the input (like vim's statusline) or between output and input?
  • Should cost tracking be opt-in (some users might find it anxiety-inducing)?
  • Should we use tiktoken for real-time token estimation DURING streaming, or only report after completion?
  • Should token counts include tool call tokens (they can be substantial)?

References

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions