Overview
Add a persistent status bar to the Hermes CLI that displays the current model, token usage, context window fullness, session duration, and estimated cost. This is the single highest-impact, lowest-effort CLI improvement available.
Comparable AI coding CLIs (Aider, Claude Code, Toad) all show token and cost information. Hermes shows nothing: users have no visibility into how full their context window is, which model they're using, or how much a session costs until they hit a wall (context overflow or an unexpected bill).
Split from #504 for atomicity. The companion feature (rich rendering + navigation) is tracked in #684.
Research Findings
What Competitors Show
Aider: Real-time token/cost display per interaction. Shows tokens used per file in context.
Claude Code: Token counter in the prompt bar. Context window percentage visible at all times. /usage command for detailed breakdown.
Toad CLI: Model indicator, streaming status, connection health in the status line.
The Pattern
A thin status bar (1-2 lines) that updates after each interaction. Always visible. Color-coded thresholds for context fullness.
Current State in Hermes Agent
No status bar exists. The current prompt_toolkit layout (cli.py ~line 3289) is:
    HSplit([
        Window(height=0),   # spacer
        sudo_widget,        # conditional sudo password
        approval_widget,    # conditional approval
        clarify_widget,     # conditional clarify
        spacer,             # flexible space
        input_rule_top,     # bronze ─ rule
        image_bar,          # clipboard image badges
        input_area,         # TextArea
        input_rule_bot,     # bronze ─ rule
        CompletionsMenu,    # autocomplete
    ])
Token data IS available — the OpenAI API response includes a usage field with prompt_tokens, completion_tokens, and total_tokens. This data just isn't surfaced to the user.
Model info IS available — stored in session state, displayed in the banner at startup, but not persistently visible.
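A minimal sketch of reading those counters, assuming the response is either a raw JSON dict or an SDK-style object exposing a usage attribute. The helper name extract_usage is hypothetical and does not exist in cli.py.

```python
from types import SimpleNamespace

USAGE_KEYS = ("prompt_tokens", "completion_tokens", "total_tokens")


def extract_usage(response):
    """Return the usage counters as a plain dict, or None if the response has none."""
    # Accept both raw JSON (dict) and attribute-style SDK objects.
    if isinstance(response, dict):
        usage = response.get("usage")
    else:
        usage = getattr(response, "usage", None)
    if usage is None:
        return None

    def read(key):
        value = usage.get(key) if isinstance(usage, dict) else getattr(usage, key, None)
        return int(value or 0)  # treat missing or null counters as zero

    return {key: read(key) for key in USAGE_KEYS}


# Two stand-in responses carrying the same numbers in different shapes.
raw = {"usage": {"prompt_tokens": 10230, "completion_tokens": 2220, "total_tokens": 12450}}
obj = SimpleNamespace(
    usage=SimpleNamespace(prompt_tokens=10230, completion_tokens=2220, total_tokens=12450)
)
```

Returning None when usage is absent lets the status bar keep its previous values instead of resetting to zero on a response that carries no counters.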
Implementation Plan
Skill vs. Tool Classification
This is a core codebase change to cli.py. It requires modifications to the prompt_toolkit layout and a data pipeline from API responses to the status bar display.
Specification
Status bar layout (1 line, above the input area):
⚕ claude-sonnet-4-20250514 │ Tokens: 12,450 / 200K │ [█░░░░░░░░░] 6% │ Cost: $0.06 │ 14m
Components:
- Model name: current model (truncated if needed)
- Token count: prompt + completion tokens / max context
- Context bar: visual [██████░░░░] with color coding:
  - Green: < 50%
  - Yellow: 50-80%
  - Red: > 80%, up to 95%
  - Blinking red: > 95% (imminent overflow)
- Cost estimate: cumulative session cost based on per-token pricing
- Duration: session elapsed time
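The components above could be rendered roughly as follows. This is a sketch, not the actual implementation: the function names, the 28-character truncation limit, and the style strings are assumptions. The return value is a list of (style, text) pairs, the shape prompt_toolkit accepts as formatted text, so a callable like this could feed a FormattedTextControl.

```python
def context_style(pct):
    """Pick a style string for a context-fullness percentage (0-100)."""
    if pct > 95:
        return "fg:ansired blink"  # imminent overflow
    if pct > 80:
        return "fg:ansired"
    if pct >= 50:
        return "fg:ansiyellow"
    return "fg:ansigreen"


def context_bar(pct, width=10):
    """Render a fixed-width bar such as [██████░░░░]."""
    filled = max(0, min(width, round(pct / 100 * width)))
    return "[" + "█" * filled + "░" * (width - filled) + "]"


def short_limit(n):
    """Abbreviate round context limits: 200000 -> '200K'."""
    return f"{n // 1000}K" if n >= 1000 and n % 1000 == 0 else f"{n:,}"


def render_status(model, used, limit, cost_usd, elapsed_s, max_name=28):
    """Return (style, text) fragments for the one-line status bar."""
    pct = used * 100 / limit if limit else 0.0
    # Truncate long model names so the bar stays on one line.
    name = model if len(model) <= max_name else model[: max_name - 1] + "…"
    minutes = int(elapsed_s // 60)
    return [
        ("", f"⚕ {name} │ Tokens: {used:,} / {short_limit(limit)} │ "),
        (context_style(pct), context_bar(pct)),
        ("", f" {pct:.0f}% │ Cost: ${cost_usd:.2f} │ {minutes}m"),
    ]


fragments = render_status("claude-sonnet-4-20250514", 124_000, 200_000, 0.23, 14 * 60)
line = "".join(text for _style, text in fragments)
```

Only the bar fragment carries a color, so the rest of the line keeps the terminal's default style and stays readable where color support is poor.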
Data pipeline:
- After each API call, extract usage from the response
- Update a shared state dict (_status_state)
- Trigger app.invalidate() to repaint the status bar
- Cost calculation: (prompt_tokens * input_price + completion_tokens * output_price) / 1,000,000, with prices expressed per million tokens
Model pricing (configurable in config.yaml):
    pricing:
      claude-sonnet-4-20250514:
        input: 3.0    # per million tokens
        output: 15.0
      gpt-4o:
        input: 2.5
        output: 10.0
      # fallback for unknown models
      default:
        input: 1.0
        output: 3.0
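A sketch of how the lookup with fallback might work once the pricing section is loaded. The PRICING dict below stands in for the parsed config.yaml section, and price_for is a hypothetical helper name.

```python
# Stand-in for the parsed `pricing:` section of config.yaml (USD per million tokens).
PRICING = {
    "claude-sonnet-4-20250514": {"input": 3.0, "output": 15.0},
    "gpt-4o": {"input": 2.5, "output": 10.0},
    "default": {"input": 1.0, "output": 3.0},
}


def price_for(model, pricing=PRICING):
    """Return (input_price, output_price, is_fallback) for a model.

    is_fallback is True when the model has no entry of its own and the
    `default` rates were used, so the UI can mark the cost as approximate.
    Returns None when neither the model nor a default entry exists.
    """
    entry = pricing.get(model)
    is_fallback = entry is None
    if is_fallback:
        entry = pricing.get("default")
    if entry is None:
        return None
    return entry["input"], entry["output"], is_fallback
```

Surfacing the fallback flag is one way to address the stale pricing table risk: an unknown model still gets an estimate, but the display can flag it rather than present it as exact.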
/usage command: Detailed breakdown:
    Session Usage Report
    ━━━━━━━━━━━━━━━━━━━
    Model: claude-sonnet-4-20250514
    Duration: 14m 32s
    Turns: 7
    Tokens:
      Prompt: 10,230 (input)
      Completion: 2,220 (output)
      Total: 12,450
      Context: 6% of 200,000
    Cost:
      Input: $0.031
      Output: $0.033
      Total: $0.064
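A report like the one above could be produced by a small formatter. This is a sketch under assumptions: the function name and its parameters are hypothetical, and prices are taken as USD per million tokens.

```python
def format_usage_report(model, elapsed_s, turns, prompt, completion,
                        limit, input_price, output_price):
    """Build the multi-line /usage report from session totals."""
    total = prompt + completion
    minutes, seconds = divmod(int(elapsed_s), 60)
    in_cost = prompt * input_price / 1_000_000
    out_cost = completion * output_price / 1_000_000
    pct = total * 100 / limit if limit else 0.0
    title = "Session Usage Report"
    return "\n".join([
        title,
        "━" * len(title),
        f"Model: {model}",
        f"Duration: {minutes}m {seconds:02d}s",
        f"Turns: {turns}",
        "Tokens:",
        f"  Prompt: {prompt:,} (input)",
        f"  Completion: {completion:,} (output)",
        f"  Total: {total:,}",
        f"  Context: {pct:.0f}% of {limit:,}",
        "Cost:",
        f"  Input: ${in_cost:.3f}",
        f"  Output: ${out_cost:.3f}",
        f"  Total: ${in_cost + out_cost:.3f}",
    ])


report = format_usage_report(
    "claude-sonnet-4-20250514", 14 * 60 + 32, 7, 10230, 2220, 200_000, 3.0, 15.0
)
print(report)
```

The report uses three decimal places for cost because short sessions often cost less than a cent, which a two-decimal display would round to $0.00.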
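Deliverables
- Status bar Window in the prompt_toolkit layout (above input_rule_top)
- FormattedTextControl with dynamic content from _status_state
- Extraction of the usage field from API responses
- /usage command with detailed breakdown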
Pros & Cons
Pros
- Highest impact-to-effort ratio of any CLI improvement
- Prevents surprise context overflow (users see it coming)
- Cost visibility saves money (users learn which prompts are expensive)
- Small, self-contained change (one new Window in the HSplit)
- No refactoring required — purely additive
- Can ship in 1-2 days
Cons / Risks
- Token counts from server are delayed (reported after completion, not during streaming)
- Cost estimation requires maintaining a pricing table (could go stale)
- Status bar takes 1 line of vertical space (mitigated by compact mode)
- Some terminals may render the color bar poorly (test mosh, tmux, SSH)
Open Questions
- Should the status bar be above the input (like vim's statusline) or between output and input?
- Should cost tracking be opt-in (some users might find it anxiety-inducing)?
- Should we use tiktoken for real-time token estimation DURING streaming, or only report after completion?
- Should token counts include tool call tokens (they can be substantial)?
References