Expose real prompt context usage and compaction metadata in API run events

## Feature request

Please expose the current prompt-context token usage through the API/SSE run lifecycle, so downstream clients can display an accurate context meter without estimating from visible transcript text or cumulative billing usage.

## Motivation

Downstream UIs currently have only two signals to work with, and both are imperfect:

1. Cumulative billing usage (`input_tokens + output_tokens`) is not the same as current context usage.
   - It keeps growing across turns.
   - It includes output/reasoning-side usage.
   - It does not shrink when a session is compacted.

2. Client-side transcript estimation is also inaccurate.
   - It cannot see the full prompt payload: system prompt, memory, skills, tool schemas, hidden provider formatting, etc.
   - It cannot reliably account for tool arguments/results or provider tokenizer differences.
   - After automatic compaction, a UI may keep estimating from stale visible transcript data and show the session as over-limit even though Hermes has compacted it.

Hermes Agent already appears to maintain the more accurate internal value through the context compressor (`last_prompt_tokens`) and uses it for compression decisions/status displays. Exposing that value would let thin clients show honest context state without duplicating token accounting.
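To make the gap concrete, here is a small sketch with made-up numbers. The per-turn token counts and the `simulate` helper are hypothetical and are not taken from Hermes; the sketch also simplifies by treating the prompt size of the latest turn as the current context.

```python
from typing import List, Tuple


def simulate(turns: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Return (cumulative_billing, context_tokens) after each turn.

    Each input tuple is (prompt_tokens, output_tokens) for one turn.
    """
    rows = []
    cumulative_billing = 0
    for prompt_tokens, output_tokens in turns:
        # Billing keeps adding every prompt and every output.
        cumulative_billing += prompt_tokens + output_tokens
        # Context is only what the latest prompt actually contained.
        rows.append((cumulative_billing, prompt_tokens))
    return rows


# Hypothetical session: the last turn happens after a compaction,
# so its prompt is much smaller than the one before it.
turns = [
    (10_000, 500),
    (10_700, 600),
    (11_500, 400),
    (3_000, 300),
]

for billing, context in simulate(turns):
    print(f"billing={billing:>6}  context={context:>6}")
```

In this example the cumulative figure ends at 37,000 tokens while the prompt actually loaded is 3,000, so a meter driven by billing usage would overstate context usage by more than a factor of ten.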

## Proposed API shape

Add optional context fields to the `run.completed` event, preferably inside `usage` for backward compatibility, with session/compaction metadata at the top level:

```json
{
  "event": "run.completed",
  "run_id": "...",
  "output": "...",
  "usage": {
    "input_tokens": 12345,
    "output_tokens": 678,
    "total_tokens": 13023,

    "context_tokens": 45678,
    "context_length": 200000,
    "compression_count": 1,
    "context_source": "provider_prompt_tokens"
  },
  "session_id": "current-effective-session-id",
  "previous_session_id": "optional-previous-session-id",
  "compressed": true
}
```

Field notes:

- `context_tokens`: current/effective prompt tokens loaded for the active session, preferably the provider-reported prompt token count used by Hermes' context compressor.
- `context_length`: model context length Hermes resolved for this run.
- `compression_count`: number of compactions performed in this run/session, if available.
- `context_source`: e.g. `provider_prompt_tokens`, `rough_estimate`, or `unknown`.
- `session_id`: effective session id after any automatic compaction/session split.
- `previous_session_id` / `compressed`: optional metadata so web clients can reload or switch to the continuation session immediately after compaction.
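As a rough sketch of how a thin client might consume these fields, assuming the payload shape proposed above is adopted as-is: the `ContextMeter` class and `meter_from_event` function are hypothetical names for illustration, and every field is treated as optional.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class ContextMeter:
    context_tokens: Optional[int]
    context_length: Optional[int]
    source: str

    @property
    def fraction_used(self) -> Optional[float]:
        # No meter if either value is missing or the length is zero.
        if self.context_tokens is None or not self.context_length:
            return None
        return min(self.context_tokens / self.context_length, 1.0)


def meter_from_event(event: dict) -> ContextMeter:
    """Build a meter from a `run.completed` payload (proposed shape)."""
    usage = event.get("usage") or {}
    return ContextMeter(
        context_tokens=usage.get("context_tokens"),
        context_length=usage.get("context_length"),
        source=usage.get("context_source", "unknown"),
    )


# Example payload using the values from the proposal above.
payload = json.loads("""
{
  "event": "run.completed",
  "usage": {
    "input_tokens": 12345,
    "output_tokens": 678,
    "total_tokens": 13023,
    "context_tokens": 45678,
    "context_length": 200000,
    "compression_count": 1,
    "context_source": "provider_prompt_tokens"
  }
}
""")

meter = meter_from_event(payload)
if meter.fraction_used is not None:
    print(f"{meter.fraction_used:.1%} of context used ({meter.source})")
```

Because the meter returns `None` when fields are absent, a UI can hide the indicator or fall back to a labeled estimate instead of showing a misleading number.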

## Acceptance criteria

- `run.completed` exposes current prompt-context usage separately from cumulative billing usage.
- The new fields are optional and backward-compatible: existing clients are unaffected, and the fields may be omitted when a provider does not return usage.
- After automatic compaction/session split, API clients can discover the effective continuation session id and updated context usage without waiting for the next user turn.
- Documentation clarifies the difference between:
  - billing/session usage (`input_tokens`, `output_tokens`, cost accounting)
  - current prompt context usage (`context_tokens` / `last_prompt_tokens`)
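The session-split criterion could be exercised on the client side roughly as follows. This is a sketch under the assumption that `session_id` and `compressed` arrive at the top level of `run.completed` as proposed; `effective_session` is a hypothetical helper, not an existing Hermes function.

```python
from typing import Tuple


def effective_session(current_session_id: str, event: dict) -> Tuple[str, bool]:
    """Return (session id to use next, whether the client should switch).

    Falls back to the current session when the event carries no
    session metadata, so older servers keep working.
    """
    new_id = event.get("session_id") or current_session_id
    switched = bool(event.get("compressed")) and new_id != current_session_id
    return new_id, switched


# Hypothetical event emitted after an automatic compaction.
event = {
    "event": "run.completed",
    "session_id": "session-b",
    "previous_session_id": "session-a",
    "compressed": True,
}

session_id, switched = effective_session("session-a", event)
if switched:
    print(f"compacted: continuing in {session_id}")
```

Keeping the fallback to the current id means the same client code works whether or not the server sends the new metadata.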

## Alternatives considered

### Client-side text/token estimation

Rejected. It cannot see hidden prompt components such as system prompt, memory, skills, tool schemas, provider formatting, or exact tokenizer behavior. It also becomes stale after Hermes compacts a session.

### Using cumulative `input_tokens + output_tokens`

Rejected. This is billing/session usage, not current context usage. It grows monotonically and does not represent remaining context after compaction.

## Downstream use case

Hermes Web UI wants to display a compact context meter. It should consume Hermes Agent's reported prompt-context usage rather than estimating locally. This would prevent misleading “remaining context” displays and make the UI compaction-aware.

