Expose real prompt context usage and compaction metadata in API run events #15618

@hanzckernel

Feature request

Please expose the current prompt-context token usage through the API/SSE run lifecycle, so downstream clients can display an accurate context meter without estimating from visible transcript text or cumulative billing usage.

Motivation

Downstream UIs currently have only imperfect signals:

  1. Cumulative billing usage (input_tokens + output_tokens) is not the same as current context usage.

    • It keeps growing across turns.
    • It includes output/reasoning-side usage.
    • It does not shrink when a session is compacted.
  2. Client-side estimation from the visible transcript is also unreliable.

    • It cannot see the full prompt payload: system prompt, memory, skills, tool schemas, hidden provider formatting, etc.
    • It cannot reliably account for tool arguments/results or provider tokenizer differences.
    • After automatic compaction, a UI may keep estimating from stale visible transcript data and show the session as over-limit even though Hermes has compacted it.

Hermes Agent already appears to maintain a more accurate value in its context compressor (last_prompt_tokens) and uses it for compression decisions and status displays. Exposing that value would let thin clients show an honest context state without duplicating token accounting.

Proposed API shape

Add optional context fields to the run.completed event, preferably inside usage for backward compatibility:

{
  "event": "run.completed",
  "run_id": "...",
  "output": "...",
  "usage": {
    "input_tokens": 12345,
    "output_tokens": 678,
    "total_tokens": 13023,

    "context_tokens": 45678,
    "context_length": 200000,
    "compression_count": 1,
    "context_source": "provider_prompt_tokens"
  },
  "session_id": "current-effective-session-id",
  "previous_session_id": "optional-previous-session-id",
  "compressed": true
}
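A minimal Python sketch of how the server side could attach these fields. It assumes a hypothetical `ContextState` object standing in for what the context compressor tracks; only `last_prompt_tokens` is named in this issue, while `ContextState`, `from_provider` and `build_run_completed` are illustrative placeholders, not Hermes' actual API. The context keys are added only when known, so clients that read only `input_tokens`/`output_tokens` are unaffected.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ContextState:
    """Hypothetical stand-in for the context compressor's per-session state."""
    last_prompt_tokens: Optional[int]  # provider-reported prompt tokens, if any
    context_length: Optional[int]      # model context window resolved for this run
    compression_count: int = 0         # compactions so far in this run/session
    from_provider: bool = True         # False if last_prompt_tokens is only an estimate


def build_run_completed(
    run_id: str,
    output: str,
    input_tokens: int,
    output_tokens: int,
    ctx: ContextState,
    session_id: str,
    previous_session_id: Optional[str] = None,
    compressed: bool = False,
) -> dict[str, Any]:
    # Billing/session usage, unchanged.
    usage: dict[str, Any] = {
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "total_tokens": input_tokens + output_tokens,
    }
    # New context keys are purely additive; clients that don't know them ignore them.
    if ctx.last_prompt_tokens is not None:
        usage["context_tokens"] = ctx.last_prompt_tokens
        usage["context_source"] = (
            "provider_prompt_tokens" if ctx.from_provider else "rough_estimate"
        )
    else:
        usage["context_source"] = "unknown"
    if ctx.context_length is not None:
        usage["context_length"] = ctx.context_length
    usage["compression_count"] = ctx.compression_count

    event: dict[str, Any] = {
        "event": "run.completed",
        "run_id": run_id,
        "output": output,
        "usage": usage,
        # Effective session id, i.e. the continuation session after any split.
        "session_id": session_id,
    }
    if previous_session_id is not None:
        event["previous_session_id"] = previous_session_id
    if compressed:
        event["compressed"] = True
    return event
```

Called with the example values (12,345 input and 678 output tokens, 45,678 context tokens in a 200,000-token window, one compaction), this produces the same fields as the example payload; with no provider data it falls back to `context_source: "unknown"` and omits the numeric context keys.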

Field notes:

  • context_tokens: current/effective prompt tokens loaded for the active session, preferably the provider-reported prompt token count used by Hermes' context compressor.
  • context_length: model context length Hermes resolved for this run.
  • compression_count: number of compactions in this run/session if available.
  • context_source: how context_tokens was obtained, e.g. provider_prompt_tokens, rough_estimate, or unknown.
  • session_id: effective session id after any automatic compaction/session split.
  • previous_session_id / compressed: optional metadata so web clients can reload or switch to the continuation session immediately after compaction.
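On the client side, a thin UI could read every one of these fields as optional, so the same code works against servers that do not send them yet. A sketch in Python; `ContextInfo` and `parse_run_completed` are hypothetical names, and the event is assumed to be an already-decoded JSON dict.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ContextInfo:
    context_tokens: Optional[int]   # None if the server did not report it
    context_length: Optional[int]
    source: str                     # e.g. "provider_prompt_tokens", "rough_estimate", "unknown"
    compressed: bool
    session_id: str                 # effective session id to use for the next turn
    switched_session: bool          # True if the server moved to a continuation session

    @property
    def fraction_used(self) -> Optional[float]:
        # Only meaningful when both numbers are present and the window is non-zero.
        if self.context_tokens is None or not self.context_length:
            return None
        return self.context_tokens / self.context_length


def parse_run_completed(event: dict[str, Any], current_session_id: str) -> ContextInfo:
    usage = event.get("usage") or {}
    # Fall back to the session we already have if the server omits session_id.
    session_id = event.get("session_id") or current_session_id
    return ContextInfo(
        context_tokens=usage.get("context_tokens"),
        context_length=usage.get("context_length"),
        source=usage.get("context_source", "unknown"),
        compressed=bool(event.get("compressed", False)),
        session_id=session_id,
        switched_session=session_id != current_session_id,
    )
```

When `switched_session` is true, the client can reload or switch to `session_id` immediately instead of waiting for the next user turn.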

Acceptance criteria

  • run.completed exposes current prompt-context usage separately from cumulative billing usage.
  • The new fields are optional, so the change stays backward-compatible and providers that do not return usage can simply omit them.
  • After automatic compaction/session split, API clients can discover the effective continuation session id and updated context usage without waiting for the next user turn.
  • Documentation clarifies the difference between:
    • billing/session usage (input_tokens, output_tokens, cost accounting)
    • current prompt context usage (context_tokens / last_prompt_tokens)
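The distinction the documentation should draw fits in a few lines: billing usage is summed across runs, while prompt-context usage is replaced by each new report. A hypothetical client-side tracker in Python (the class and field names are illustrative, not part of any existing Hermes API; the token counts in the usage note are made up):

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class UsageTracker:
    billed_input: int = 0
    billed_output: int = 0
    context_tokens: Optional[int] = None

    def on_run_completed(self, event: dict[str, Any]) -> None:
        usage = event.get("usage") or {}
        # Billing/session usage: add every run's tokens; it only ever grows.
        self.billed_input += usage.get("input_tokens", 0)
        self.billed_output += usage.get("output_tokens", 0)
        # Current prompt context: overwrite with the latest report. If the field
        # is missing, the value becomes unknown (None) instead of staying stale.
        self.context_tokens = usage.get("context_tokens")
```

For example, a first run reporting 40,000 input and 40,000 context tokens, followed by a run that compacts the session and reports 160,000 input but only 25,000 context tokens, leaves summed input at 200,000 (a full 200k window) while the current prompt is 25,000 tokens.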

Alternatives considered

Client-side text/token estimation

Rejected. It cannot see hidden prompt components such as system prompt, memory, skills, tool schemas, provider formatting, or exact tokenizer behavior. It also becomes stale after Hermes compacts a session.

Using cumulative input_tokens + output_tokens

Rejected. This is billing/session usage, not current context usage. It grows monotonically and does not represent remaining context after compaction.

Downstream use case

Hermes Web UI wants to display a compact context meter. It should consume Hermes Agent's reported prompt-context usage rather than estimating locally. This would prevent misleading “remaining context” displays and make the UI compaction-aware.
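For the Hermes Web UI case, the meter label could be derived directly from the reported fields: mark anything that is not provider-reported as approximate, and hide the meter when nothing is reported. A Python sketch; `format_context_meter` and the label format are illustrative choices, not an existing UI component.

```python
from typing import Optional


def format_context_meter(
    context_tokens: Optional[int],
    context_length: Optional[int],
    context_source: str = "unknown",
) -> Optional[str]:
    """Return a short label such as '45.7k / 200k (23%)', or None to hide the meter."""
    if context_tokens is None or not context_length:
        return None  # nothing reported: show no meter rather than a guess

    def short(n: int) -> str:
        # 45678 -> '45.7k', 200000 -> '200k', 950 -> '950'
        if n >= 1000:
            return f"{n / 1000:.1f}".rstrip("0").rstrip(".") + "k"
        return str(n)

    percent = round(100 * context_tokens / context_length)
    label = f"{short(context_tokens)} / {short(context_length)} ({percent}%)"
    if context_source != "provider_prompt_tokens":
        label = "~" + label  # flag estimates and unknown sources as approximate
    return label
```

With the example payload's values (45,678 of 200,000, provider-reported) this yields `45.7k / 200k (23%)`; the same numbers from a rough estimate yield `~45.7k / 200k (23%)`.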

Metadata

Assignees

No one assigned

Labels

  • P3: Low, cosmetic, nice to have
  • comp/agent: Core agent loop, run_agent.py, prompt builder
  • comp/gateway: Gateway runner, session dispatch, delivery
  • type/feature: New feature or request
