feat(providers): parse llama.cpp timings object for cache + performance metrics

## Summary

The OpenAI-compatible provider (`OpenAiCompatibleChatClient`) currently parses only the top-level `usage` object from llama.cpp responses. llama.cpp also returns a sibling `timings` object with detailed performance and cache data, which we ignore today.

## What llama.cpp actually returns

```json
{
  "usage": {
    "prompt_tokens": 11,
    "completion_tokens": 5,
    "total_tokens": 16
  },
  "timings": {
    "cache_n": 0,
    "prompt_n": 11,
    "prompt_ms": 139.655,
    "prompt_per_token_ms": 12.696,
    "prompt_per_second": 78.766,
    "predicted_n": 5,
    "predicted_ms": 160.048,
    "predicted_per_token_ms": 32.010,
    "predicted_per_second": 31.241
  }
}
```
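The per-token and per-second fields in this sample appear to be derived from the counts and millisecond totals. The Python check below is illustrative only: the project code is not Python, the helper name `derived_rates` is hypothetical, and the formulas are inferred from the sample values rather than taken from llama.cpp's source.

```python
import math

# Values copied from the sample `timings` object above.
timings = {
    "prompt_n": 11,
    "prompt_ms": 139.655,
    "prompt_per_token_ms": 12.696,
    "prompt_per_second": 78.766,
    "predicted_n": 5,
    "predicted_ms": 160.048,
    "predicted_per_token_ms": 32.010,
    "predicted_per_second": 31.241,
}


def derived_rates(n: int, total_ms: float) -> tuple:
    """Return (ms per token, tokens per second) for n tokens over total_ms."""
    return total_ms / n, n * 1000.0 / total_ms


def sample_is_consistent(t: dict, tol: float = 1e-3) -> bool:
    """Check that the reported rates match the inferred formulas within tol."""
    for phase in ("prompt", "predicted"):
        per_token, per_second = derived_rates(t[f"{phase}_n"], t[f"{phase}_ms"])
        if not math.isclose(per_token, t[f"{phase}_per_token_ms"], abs_tol=tol):
            return False
        if not math.isclose(per_second, t[f"{phase}_per_second"], abs_tol=tol):
            return False
    return True


print(sample_is_consistent(timings))
```

If this relationship holds in general, the counts and millisecond totals would be enough to store, since the rates could be recomputed downstream.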

### Key fields we're missing

| Field | What it tells us |
|-------|-----------------|
| `cache_n` | **Cached prompt tokens**: the number of prompt tokens served from the KV cache. This is the key metric for validating session-sticky routing (#610) |
| `prompt_n` | Non-cached prompt tokens processed |
| `prompt_ms` | Prompt processing time (effective TTFT at the server) |
| `prompt_per_second` | Prefill throughput (tok/s) |
| `predicted_n` | Output tokens generated |
| `predicted_ms` | Generation time |
| `predicted_per_second` | Output generation throughput (tok/s); a clean metric that avoids confusion over reasoning tokens |
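
Taken together, `cache_n` and `prompt_n` suggest a simple cache hit ratio per turn. The sketch below assumes, based only on the descriptions in the table, that `cache_n + prompt_n` equals the full prompt size; that should be confirmed against the llama.cpp version in use. The function name `cache_hit_ratio` is hypothetical, and Python is used for illustration only.

```python
from typing import Optional


def cache_hit_ratio(timings: dict) -> Optional[float]:
    """Fraction of prompt tokens served from the KV cache, or None if unknown.

    Assumes cache_n (cached) + prompt_n (newly processed) is the whole prompt.
    """
    cache_n = timings.get("cache_n")
    prompt_n = timings.get("prompt_n")
    if cache_n is None or prompt_n is None:
        return None  # server did not report both fields
    total = cache_n + prompt_n
    if total == 0:
        return None  # empty prompt: the ratio is undefined
    return cache_n / total


# First turn from the sample above: nothing is cached yet.
print(cache_hit_ratio({"cache_n": 0, "prompt_n": 11}))
```

On a later turn routed to the same server, a ratio well above zero would be the expected sign that session-sticky routing is working.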

## What changes

### 1. Parse `timings` in `OpenAiCompatibleChatClient.ParseUsage()`

Extend `ParseUsage()` to read the `timings` object when present:
- Map `cache_n` → `UsageDetails.CachedInputTokenCount`
- Store timing fields in `UsageDetails.AdditionalCounts` (or a new extension property) so they flow through the existing pipeline
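
The mapping above could look roughly like the following. This is a language-neutral sketch written in Python, not the actual C# change: the `UsageDetails` dataclass is a hypothetical stand-in for the real type, the `timings.` key prefix is an assumption, and the stand-in stores floats in `additional_counts` purely for simplicity.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class UsageDetails:
    """Hypothetical stand-in for the real UsageDetails type."""
    input_token_count: Optional[int] = None
    output_token_count: Optional[int] = None
    total_token_count: Optional[int] = None
    cached_input_token_count: Optional[int] = None
    additional_counts: Dict[str, float] = field(default_factory=dict)


# Timing fields copied into additional_counts (cache_n is mapped separately).
TIMING_FIELDS = (
    "prompt_n", "prompt_ms", "prompt_per_second",
    "predicted_n", "predicted_ms", "predicted_per_second",
)


def _is_number(value: Any) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def parse_usage(response: Dict[str, Any]) -> UsageDetails:
    usage = response.get("usage") or {}
    details = UsageDetails(
        input_token_count=usage.get("prompt_tokens"),
        output_token_count=usage.get("completion_tokens"),
        total_token_count=usage.get("total_tokens"),
    )

    timings = response.get("timings")
    if not isinstance(timings, dict):
        return details  # non-llama.cpp server: derived fields stay unset

    cache_n = timings.get("cache_n")
    if isinstance(cache_n, int) and not isinstance(cache_n, bool):
        details.cached_input_token_count = cache_n

    for name in TIMING_FIELDS:
        value = timings.get(name)
        if _is_number(value):
            details.additional_counts[f"timings.{name}"] = float(value)
    return details
```

Each field is read independently, so a server that reports only part of the `timings` object still contributes whatever it has.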

### 2. Surface in `UsageOutput`

The `UsageOutput` record already has `CachedInputTokens` — it just never gets populated for the OpenAI-compatible provider. Populating `CachedInputTokenCount` in `UsageDetails` will automatically flow through `LlmSessionActor.EmitUsageOutput()`.

For the timing fields (`prompt_ms`, `predicted_per_second`, etc.), decide whether to:
- Add dedicated properties to `UsageOutput` (clean, typed)
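- Use `AdditionalCounts` on `UsageDetails` (extensible, no protocol change; note that if this is the Microsoft.Extensions.AI type, `AdditionalCounts` stores `long` values, so fractional timings would need rounding or scaling)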

### 3. Add timing metrics to headless `--json` envelope

The `chat -p --json` output (#611) currently includes `usage.inputTokens`, `usage.outputTokens`, and `usage.totalTokens`. Extend it to include:
- `cachedInputTokens` — cache hit count
- `promptMs` — server-side prefill time
- `predictedPerSecond` — output tok/s
- `ttftMs` — client-side time to first text delta (measured in `HeadlessChannel`)
- `totalMs` — client-side prompt-to-turn-completed wall time
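
One possible shape for the extended envelope is sketched below in Python, for illustration only. Two assumptions are not settled by this issue: that the new fields sit under `usage` next to the existing token counts, and that unavailable values are emitted as `null` rather than omitted. The function name `build_usage_envelope` is hypothetical.

```python
import json
from typing import Optional


def build_usage_envelope(
    input_tokens: int,
    output_tokens: int,
    total_tokens: int,
    *,
    cached_input_tokens: Optional[int] = None,
    prompt_ms: Optional[float] = None,
    predicted_per_second: Optional[float] = None,
    ttft_ms: Optional[float] = None,
    total_ms: Optional[float] = None,
) -> dict:
    """Build the `usage` part of the headless JSON envelope.

    Server-side fields (cachedInputTokens, promptMs, predictedPerSecond) are
    None when the server sent no `timings`; ttftMs and totalMs are measured
    on the client and are independent of the server.
    """
    return {
        "usage": {
            "inputTokens": input_tokens,
            "outputTokens": output_tokens,
            "totalTokens": total_tokens,
            "cachedInputTokens": cached_input_tokens,
            "promptMs": prompt_ms,
            "predictedPerSecond": predicted_per_second,
            "ttftMs": ttft_ms,
            "totalMs": total_ms,
        }
    }


# Example with llama.cpp timings present; client-side values are made up.
envelope = build_usage_envelope(
    11, 5, 16,
    cached_input_tokens=0,
    prompt_ms=139.655,
    predicted_per_second=31.241,
    ttft_ms=152.0,
    total_ms=320.0,
)
print(json.dumps(envelope, indent=2))
```

Emitting explicit `null` values keeps the envelope shape stable for consumers such as eval scripts; omitting absent keys would be an equally valid choice.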

### 4. Graceful degradation

The `timings` object is llama.cpp-specific — other OpenAI-compatible servers (vLLM, TGI, Ollama) may not include it. Parsing must be optional: if `timings` is absent, all derived fields stay null.

## Motivation

- **Validate session-sticky routing (#610):** `cache_n` on turn 2+ should be > 0 when the same GPU handles consecutive turns for a session.
- **KV cache benchmarking (#611):** Multi-turn evals will be able to compare `cache_n` and `prompt_ms` across turns.
- **Performance regression detection:** `predicted_per_second` gives clean output throughput without reasoning token noise.
- **TTFT tracking (#608):** Server-side `prompt_ms` + client-side TTFT gives end-to-end latency breakdown.

## Acceptance criteria

- [ ] `ParseUsage()` reads `timings.cache_n` → `CachedInputTokenCount` when present
- [ ] `ParseUsage()` reads timing fields into `AdditionalCounts` (or dedicated properties)
- [ ] `chat -p --json` output includes `cachedInputTokens` and timing metrics when available
- [ ] Existing behavior unchanged when `timings` is absent (graceful degradation)
- [ ] Unit tests for `ParseUsage` with and without `timings` object
- [ ] `quick-multi-turn-test.sh` updated to assert `cachedInputTokens` > 0 on turn 2 (when using llama.cpp)
