Feature: Session Usage Visibility — Persistent Token Totals and Context Window Percentage in the CLI #1091

@kshitijk4poor

Overview

Add persistent session-usage visibility to the Hermes CLI so users can always see how many tokens the current session has used and how full the active model's context window is.

This is one of the biggest UX gaps in the current CLI. Hermes already receives token usage data in API responses and already knows model context limits via agent/model_metadata.py, but neither is surfaced in a way that helps users manage long-running sessions.

OpenCode and similar coding CLIs do a good job of showing cumulative usage, while Codex-style interfaces make context-window fullness legible at a glance. Hermes should expose both.


Problem

Today, Hermes users cannot easily answer basic session-management questions while working:

  • How many tokens has this session used so far?
  • How close am I to filling the current context window?
  • Is a long conversation likely to compact soon or overflow unexpectedly?

That leads to avoidable surprises:

  • context pressure appears "suddenly"
  • users cannot tell whether a task is getting expensive
  • long sessions feel opaque compared with modern coding CLIs

There is already a broader open issue around a full CLI status bar and token/cost tracking (#683), but this narrower issue is specifically about surfacing session token totals plus context-window percentage in a simple, always-visible UX.


Proposed Design

Core behavior

Expose two pieces of session state in the CLI:

  • cumulative tokens used in the current session
  • current context-window utilization as a percentage of the active model's max context

Suggested display shape:

  • prompt/status line widget above the input area, or
  • another always-visible compact status element in the CLI layout

Example:

claude-sonnet │ 18.4k tokens used │ 41% context

Data sources

  • token usage from model/API response usage fields
  • max context from agent/model_metadata.py
  • session accumulator stored in CLI/session state
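A minimal sketch of how these three sources could combine, assuming the API response exposes OpenAI-style `prompt_tokens` and `completion_tokens` fields. `SessionUsage` is a hypothetical name, not an existing Hermes class, and `max_context` is passed in directly because the lookup interface of agent/model_metadata.py is not shown here. It also assumes context fullness is measured from the most recent prompt size, which is one of the open questions below.

```python
from dataclasses import dataclass


@dataclass
class SessionUsage:
    """Hypothetical per-session accumulator; not an existing Hermes class."""

    max_context: int              # assumed to come from agent/model_metadata.py
    total_tokens: int = 0         # cumulative prompt + completion tokens
    last_prompt_tokens: int = 0   # prompt size of the most recent request

    def record(self, usage: dict) -> None:
        """Fold one API response's usage block into the session totals."""
        # Field names are an assumption (OpenAI-style); missing or None counts as 0.
        prompt = int(usage.get("prompt_tokens") or 0)
        completion = int(usage.get("completion_tokens") or 0)
        self.total_tokens += prompt + completion
        self.last_prompt_tokens = prompt

    @property
    def context_percent(self) -> float:
        """Share of the context window filled by the latest prompt, 0 to 100."""
        if self.max_context <= 0:
            return 0.0  # unknown model limit: report 0 rather than divide by zero
        return min(100.0, 100.0 * self.last_prompt_tokens / self.max_context)


usage = SessionUsage(max_context=200_000)
usage.record({"prompt_tokens": 1_200, "completion_tokens": 300})
usage.record({"prompt_tokens": 82_000, "completion_tokens": 500})
print(usage.total_tokens, usage.context_percent)
```

Keeping the cumulative total and the latest prompt size as separate fields lets the status line show both numbers without deciding in advance which one is "the" token count.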

UX notes

  • keep it lightweight and always visible
  • show raw totals and percentage, not just a bar
  • degrade gracefully in narrow terminals
  • avoid requiring a separate slash command for the primary signal
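One way to degrade gracefully in narrow terminals is to try progressively shorter renderings until one fits. The sketch below is illustrative only: `humanize_tokens` and `format_status` are hypothetical helpers, and the shorter layouts are assumptions rather than a proposed final design. It uses `len()` as the display width, which holds for the characters shown here but not for wide glyphs.

```python
def humanize_tokens(count: int) -> str:
    """Render a token count compactly, e.g. 18400 -> '18.4k'."""
    if count < 1_000:
        return str(count)
    if count < 1_000_000:
        return f"{count / 1_000:.1f}k"
    return f"{count / 1_000_000:.1f}M"


def format_status(model: str, total_tokens: int, context_pct: float, width: int) -> str:
    """Return the longest status text that fits in `width` columns."""
    tokens = humanize_tokens(total_tokens)
    pct = f"{round(context_pct)}%"
    # Ordered from most to least detailed; the first one that fits wins.
    candidates = [
        f"{model} │ {tokens} tokens used │ {pct} context",
        f"{tokens} tokens │ {pct} ctx",
        f"{tokens} │ {pct}",
        pct,
    ]
    for text in candidates:
        if len(text) <= width:
            return text
    return ""  # too narrow for anything useful


for cols in (80, 30, 12, 5):
    print(repr(format_status("claude-sonnet", 18_400, 41.0, cols)))
```

Every tier keeps the percentage, and all but the narrowest keep the raw total, in line with the note above about showing raw totals and percentage rather than only a bar.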

Possible extension points


Initial Scope

MVP:

  • accumulate token usage across a session
  • resolve current model max context
  • compute context usage percentage
  • render both values persistently in the CLI, refreshed after each turn
  • add tests for accounting and formatting

Possible follow-up work:

  • live/estimated token updates during streaming
  • per-turn token history
  • pricing/cost estimation
  • warnings at configurable thresholds
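For the threshold-warning follow-up, a rough sketch of the check that could run after each turn. `newly_crossed` is a hypothetical helper and the 75% and 90% defaults are placeholder values, not proposed settings. It reports only thresholds crossed on this turn, so a warning would fire once rather than on every subsequent turn.

```python
def newly_crossed(previous_pct: float, current_pct: float,
                  thresholds=(75.0, 90.0)) -> list:
    """Return the thresholds passed when moving from previous_pct to current_pct."""
    # A threshold counts as crossed only if we were below it before
    # and are at or above it now; a drop in usage yields an empty list.
    return [t for t in sorted(thresholds) if previous_pct < t <= current_pct]


print(newly_crossed(70.0, 80.0))
print(newly_crossed(80.0, 85.0))
print(newly_crossed(95.0, 40.0))
```

Because the comparison uses the previous turn's percentage, a session whose usage falls back below a threshold (for example after compaction) would warn again the next time it climbs past it.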

Open Questions

  • Should the displayed token total be cumulative session usage, current prompt-context size, or both?
  • Should tool-call tokens and internal reasoning tokens be included whenever providers expose them?
  • Should the feature be on by default, or configurable for minimal-mode users?
  • Should this land as a small slice of #683 (CLI Status Bar & Token/Cost Tracking), or remain separately scoped so it can ship independently?

References

  • OpenCode-style cumulative usage visibility
  • Codex-style context percentage visibility
  • Existing Hermes code paths already expose the needed ingredients:
    • API usage fields
    • agent/model_metadata.py
    • CLI prompt_toolkit layout in cli.py
