Feature: Session Usage Visibility — Persistent Token Totals and Context Window Percentage in the CLI

## Overview

Add persistent session-usage visibility to the Hermes CLI so users can always see how many tokens have been used in the current session and how full the active model context window is.

This is one of the biggest UX gaps in the current CLI. Hermes already receives token usage data in API responses and already knows model context limits via `agent/model_metadata.py`, but neither is surfaced in a way that helps users manage long-running sessions.

OpenCode and similar coding CLIs do a good job of showing cumulative usage, while Codex-style interfaces make context-window fullness legible at a glance. Hermes should expose both.

---

## Problem

Today, Hermes users cannot easily answer basic session-management questions while working:

- How many tokens has this session used so far?
- How close am I to filling the current context window?
- Is a long conversation likely to compact soon or overflow unexpectedly?

That leads to avoidable surprises:

- context pressure appears "suddenly"
- users cannot tell whether a task is getting expensive
- long sessions feel opaque compared with modern coding CLIs

There is already a broader open issue around a full CLI status bar and token/cost tracking (#683), but this narrower issue is specifically about surfacing session token totals plus context-window percentage in a simple, always-visible UX.

---

## Proposed Design

### Core behavior

Expose two pieces of session state in the CLI:

- cumulative tokens used in the current session
- current context-window utilization as a percentage of the active model's max context

Suggested display shape:

- prompt/status line widget above the input area, or
- another always-visible compact status element in the CLI layout

Example:

`claude-sonnet │ 18.4k tokens used │ 41% context`
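
A minimal formatting sketch for that line, assuming the token total and percentage have already been computed. The helper names `format_token_count` and `format_status_line` are hypothetical and do not exist in Hermes today:

```python
def format_token_count(n: int) -> str:
    """Abbreviate a token count, e.g. 950 -> '950', 18400 -> '18.4k'."""
    if n < 1_000:
        return str(n)
    # Switch to "M" slightly before 1,000,000 so rounding never yields "1000.0k".
    if n < 999_950:
        return f"{n / 1_000:.1f}k"
    return f"{n / 1_000_000:.1f}M"


def format_status_line(model: str, total_tokens: int, context_percent: int) -> str:
    """Build the compact status text in the shape shown in the example above."""
    tokens = format_token_count(total_tokens)
    return f"{model} │ {tokens} tokens used │ {context_percent}% context"


print(format_status_line("claude-sonnet", 18_400, 41))
```

Keeping the formatting in pure functions like these would make the "tests for accounting and formatting" item straightforward, since no terminal is needed to exercise them.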

### Data sources

- token usage from model/API response `usage` fields
- max context from `agent/model_metadata.py`
- session accumulator stored in CLI/session state
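
One way these three sources could fit together is sketched below. `SessionUsage` and `context_percent` are hypothetical names, and the `prompt_tokens` / `completion_tokens` keys are an assumption about the shape of the `usage` payload; some providers use different field names. The max context is passed in as a plain integer because the lookup API of `agent/model_metadata.py` is not described here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SessionUsage:
    """Hypothetical per-session accumulator; not an existing Hermes class."""

    prompt_tokens: int = 0
    completion_tokens: int = 0

    def add(self, usage: Optional[dict]) -> None:
        """Fold one response's usage payload into the running totals."""
        if not usage:
            return  # tolerate responses that carry no usage data
        # Key names are assumed; adapt them per provider.
        self.prompt_tokens += int(usage.get("prompt_tokens") or 0)
        self.completion_tokens += int(usage.get("completion_tokens") or 0)

    @property
    def total_tokens(self) -> int:
        return self.prompt_tokens + self.completion_tokens


def context_percent(context_tokens: int, max_context: Optional[int]) -> Optional[int]:
    """Whole-number utilization, or None when the model limit is unknown."""
    if not max_context or max_context <= 0:
        return None
    return round(100 * context_tokens / max_context)


session = SessionUsage()
session.add({"prompt_tokens": 12_000, "completion_tokens": 400})
session.add({"prompt_tokens": 5_500, "completion_tokens": 500})
print(session.total_tokens, context_percent(82_000, 200_000))
```

Returning `None` for an unknown limit lets the display omit the percentage rather than show a misleading value.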

### UX notes

- keep it lightweight and always visible
- show raw totals and percentage, not just a bar
- degrade gracefully in narrow terminals
- avoid requiring a separate slash command for the primary signal
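
Graceful degradation in narrow terminals could be handled by trying progressively shorter variants of the status text. This is only a sketch: `fit_status` and `render_status` are hypothetical helpers, and the abbreviated wordings are placeholders rather than a proposed final design.

```python
import shutil
from typing import List, Optional


def fit_status(candidates: List[str], width: int) -> str:
    """Return the first (most detailed) candidate that fits in `width` columns."""
    for text in candidates:
        # len() is adequate here because every character used is single-width.
        if len(text) <= width:
            return text
    return ""  # nothing fits: hide the widget rather than wrap


def render_status(model: str, tokens_label: str, percent: int,
                  width: Optional[int] = None) -> str:
    """Pick the richest status variant for the current terminal width."""
    if width is None:
        width = shutil.get_terminal_size().columns
    candidates = [
        f"{model} │ {tokens_label} tokens used │ {percent}% context",
        f"{tokens_label} tokens │ {percent}% ctx",
        f"{tokens_label} │ {percent}%",
    ]
    return fit_status(candidates, width)


for cols in (80, 30, 15):
    print(cols, repr(render_status("claude-sonnet", "18.4k", 41, width=cols)))
```

The shortest variant still carries both the raw total and the percentage, in line with the note above about not reducing the signal to a bar.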

### Possible extension points

- optional detailed `/usage` readout
- color thresholds for context pressure
- split prompt vs completion token breakdown
- optional cost display later or under the broader #683 issue
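
As an illustration of the color-threshold idea, the sketch below maps a context percentage to a pressure level and wraps the text in standard ANSI SGR color codes. The 70% and 90% cut-offs and the function names are assumptions chosen for the example, not values proposed by this issue.

```python
# Standard ANSI SGR foreground colors: 32 = green, 33 = yellow, 31 = red.
ANSI_FOR_LEVEL = {"ok": "\033[32m", "warn": "\033[33m", "critical": "\033[31m"}
ANSI_RESET = "\033[0m"


def pressure_level(percent: int, warn_at: int = 70, critical_at: int = 90) -> str:
    """Classify context utilization; thresholds are illustrative defaults."""
    if percent >= critical_at:
        return "critical"
    if percent >= warn_at:
        return "warn"
    return "ok"


def colorize(text: str, percent: int) -> str:
    """Wrap `text` in the color matching the current pressure level."""
    return f"{ANSI_FOR_LEVEL[pressure_level(percent)]}{text}{ANSI_RESET}"


print(colorize("41% context", 41))
print(colorize("93% context", 93))
```

If the thresholds became configurable later, `warn_at` and `critical_at` would be the natural values to expose.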

---

## Initial Scope

MVP:

- accumulate token usage across a session
- resolve current model max context
- compute context usage percentage
- render both values persistently in the CLI, refreshing them after each turn
- add tests for accounting and formatting

Possible follow-up work:

- live/estimated token updates during streaming
- per-turn token history
- pricing/cost estimation
- warnings at configurable thresholds

---

## Open Questions

- Should the displayed token total be cumulative session usage, current prompt-context size, or both?
- Should tool-call tokens and internal reasoning tokens be included whenever providers expose them?
- Should the feature be on by default, or configurable for minimal-mode users?
- Should this land as a small slice of #683, or remain separately scoped so it can ship independently?
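
To make the first question concrete, the sketch below contrasts the two candidate numbers over a short session. It assumes each request resends the conversation history, so the reported prompt tokens grow every turn; the per-turn figures are invented for illustration, and `simulate` is a hypothetical helper.

```python
from typing import List, Tuple


def simulate(turns: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """For each (prompt_tokens, completion_tokens) turn, return
    (cumulative session tokens, current context size)."""
    cumulative = 0
    rows = []
    for prompt, completion in turns:
        cumulative += prompt + completion      # everything processed so far
        current_context = prompt + completion  # what occupies the window now
        rows.append((cumulative, current_context))
    return rows


# Each prompt includes the earlier turns, so it grows over time.
turns = [(1_000, 200), (1_500, 300), (2_100, 250)]
for cumulative, current in simulate(turns):
    print(f"cumulative={cumulative:>5}  context={current:>5}")
```

Under these assumptions the cumulative total reaches 5,350 tokens after three turns while the window holds only 2,350, so a percentage derived from the cumulative figure would overstate context pressure. That suggests the two values answer different questions (spend versus remaining room) and may both be worth tracking.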

---

## References

- OpenCode-style cumulative usage visibility
- Codex-style context percentage visibility
- Existing Hermes code paths already expose the needed ingredients:
  - API `usage` fields
  - `agent/model_metadata.py`
  - CLI prompt_toolkit layout in `cli.py`


