Feature request
Please expose the current prompt-context token usage through the API/SSE run lifecycle, so downstream clients can display an accurate context meter without estimating from visible transcript text or cumulative billing usage.
Motivation
Downstream UIs currently only have imperfect signals:
- Cumulative billing usage (input_tokens + output_tokens) is not the same as current context usage.
  - It keeps growing across turns.
  - It includes output/reasoning-side usage.
  - It does not shrink when a session is compacted.
- Client-side transcript estimation is also wrong.
  - It cannot see the full prompt payload: system prompt, memory, skills, tool schemas, hidden provider formatting, etc.
  - It cannot reliably account for tool arguments/results or provider tokenizer differences.
  - After automatic compaction, a UI may keep estimating from stale visible transcript data and show the session as over-limit even though Hermes has compacted it.
Hermes Agent already appears to maintain the more accurate internal value through the context compressor (last_prompt_tokens) and uses it for compression decisions/status displays. Exposing that value would let thin clients show honest context state without duplicating token accounting.
Proposed API shape
Add optional context fields to the run.completed event, preferably inside usage for backward compatibility:
{
  "event": "run.completed",
  "run_id": "...",
  "output": "...",
  "usage": {
    "input_tokens": 12345,
    "output_tokens": 678,
    "total_tokens": 13023,
    "context_tokens": 45678,
    "context_length": 200000,
    "compression_count": 1,
    "context_source": "provider_prompt_tokens"
  },
  "session_id": "current-effective-session-id",
  "previous_session_id": "optional-previous-session-id",
  "compressed": true
}
Field notes:
- context_tokens: current/effective prompt tokens loaded for the active session, preferably the provider-reported prompt token count used by Hermes' context compressor.
- context_length: model context length Hermes resolved for this run.
- compression_count: number of compactions in this run/session, if available.
- context_source: e.g. provider_prompt_tokens, rough_estimate, or unknown.
- session_id: effective session id after any automatic compaction/session split.
- previous_session_id / compressed: optional metadata so web clients can reload or switch to the continuation session immediately after compaction.
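As a rough illustration of how a thin client might consume these fields, the sketch below parses a run.completed payload and derives a meter value. The field names follow the proposal above. The `ContextMeter` class and the `context_meter_from_event` helper are hypothetical names chosen for this example and do not reflect any existing Hermes API.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class ContextMeter:
    """Hypothetical client-side view of the proposed context fields."""
    context_tokens: int
    context_length: int
    source: str

    @property
    def fraction_used(self) -> float:
        # Clamp to 1.0 so a slightly over-limit report does not overflow the meter.
        return min(self.context_tokens / self.context_length, 1.0)


def context_meter_from_event(raw: str) -> Optional[ContextMeter]:
    """Return a meter if the event carries the optional context fields, else None."""
    event = json.loads(raw)
    if event.get("event") != "run.completed":
        return None
    usage = event.get("usage") or {}
    tokens = usage.get("context_tokens")
    length = usage.get("context_length")
    # The fields are optional in the proposal, so missing values are not an error.
    if not isinstance(tokens, int) or not isinstance(length, int) or length <= 0:
        return None
    return ContextMeter(tokens, length, usage.get("context_source", "unknown"))
```

Returning None when the fields are absent lets a client hide the meter, or fall back to an estimate labelled as such, for providers that do not report usage.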
Acceptance criteria
- run.completed exposes current prompt-context usage separately from cumulative billing usage.
- Values are optional/backward-compatible for providers that do not return usage.
- After automatic compaction/session split, API clients can discover the effective continuation session id and updated context usage without waiting for the next user turn.
- Documentation clarifies the difference between:
  - billing/session usage (input_tokens, output_tokens, cost accounting)
  - current prompt context usage (context_tokens / last_prompt_tokens)
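The session-switch criterion could be handled on the client roughly as follows. This is a minimal sketch that assumes the proposed session_id, previous_session_id, and compressed fields arrive on run.completed. `resolve_session` is a hypothetical helper name, and the reload policy shown is one possible choice, not prescribed behaviour.

```python
from typing import Any, Dict, Tuple


def resolve_session(current_session_id: str, event: Dict[str, Any]) -> Tuple[str, bool]:
    """Return (effective_session_id, should_reload) for a run.completed event.

    Assumes the optional fields from the proposal; when they are absent the
    client simply stays on its current session.
    """
    effective = event.get("session_id") or current_session_id
    switched = effective != current_session_id
    compressed = bool(event.get("compressed", False))
    # Reload if the session id changed or the server reports a compaction,
    # since the visible transcript may no longer match the loaded context.
    return effective, switched or compressed
```

For example, a client on session "a" that receives {"session_id": "b", "previous_session_id": "a", "compressed": true} would switch to "b" and reload, while an event without these fields leaves it on "a".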
Alternatives considered
Client-side text/token estimation
Rejected. It cannot see hidden prompt components such as system prompt, memory, skills, tool schemas, provider formatting, or exact tokenizer behavior. It also becomes stale after Hermes compacts a session.
Using cumulative input_tokens + output_tokens
Rejected. This is billing/session usage, not current context usage. It grows monotonically and does not represent remaining context after compaction.
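To make the gap between the two numbers concrete, here is a small sketch with made-up per-turn figures (not measurements from Hermes). Each tuple is a hypothetical (prompt_tokens, output_tokens) pair for one turn, and the last turn follows an assumed compaction.

```python
from typing import List, Tuple

# Illustrative numbers only: (prompt_tokens, output_tokens) per turn.
turns: List[Tuple[int, int]] = [
    (10_000, 500),
    (11_000, 700),
    (12_500, 600),
    (4_000, 400),  # prompt shrinks after a hypothetical compaction
]


def cumulative_billing(history: List[Tuple[int, int]]) -> int:
    """Sum of input and output tokens over all turns; never decreases."""
    return sum(prompt + output for prompt, output in history)


def current_context(history: List[Tuple[int, int]]) -> int:
    """Prompt tokens of the most recent turn, i.e. what is loaded right now."""
    return history[-1][0] if history else 0


print(cumulative_billing(turns))  # 39700
print(current_context(turns))     # 4000
```

With these figures a meter driven by cumulative usage would show roughly ten times the tokens actually loaded after compaction.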
Downstream use case
Hermes Web UI wants to display a compact context meter. It should consume Hermes Agent's reported prompt-context usage rather than estimating locally. This would prevent misleading “remaining context” displays and make the UI compaction-aware.