This repository was archived by the owner on May 26, 2026. It is now read-only.

feat(kora): KR-CHEAP-COST-TELEMETRY — per-route counters (R3-4 #10) #161

Merged
rafe-walker merged 1 commit into feature/phase2-upgrades from feat/kora-KR-CHEAP-COST-TELEMETRY
May 24, 2026

Conversation

@rafe-walker

Owner

Summary

Per Council R3 Lock R3-4 item #10. Eyes for the cost-economy discipline: every `record_inference` call gets a route tag; counters roll up per-route across 3 windows (process_lifetime, rolling_24h, monthly); cockpit + future tuning decisions read from this telemetry at $0 LLM cost.

Bucket spec: `17_cc_bucket_prompts/KR-CHEAP-COST-TELEMETRY_per_route_counters.md`

Routes wired vs deferred (per spec §4 STOP-ASK guidance)

| Route | Call site | Disposition |
|---|---|---|
| `slack_dm` | `slack_dm_handler.py:676` (Kora reply-bill) | wired this PR |
| `email_inbound` | (handler doesn't yet write a bill; engine path doesn't go through `holder.record_inference`) | ⏸️ literal reserved; wire when email handler gets a cost-ladder write |
| `email_outbound_compose` | no consumer yet | ⏸️ reserved |
| `mcp_tool` | no consumer yet | ⏸️ reserved |
| `alert_investigation` | no consumer (Lock R3-8 (d) work) | ⏸️ reserved |
| `probe_investigation` | no consumer (Lock R3-8 (b) work) | ⏸️ reserved |
| `tool_loop_iteration` | engine doesn't yet surface iteration tag | ⏸️ reserved |
| `scheduled_task` | no consumer yet | ⏸️ reserved |
| (agent main turn) | `conversation_loop.py:1603` | leaves default `unknown`; agent-side, not in Kora taxonomy |
| (auxiliary client side-tasks) | `auxiliary_client.py:5341/5360` | leaves default `unknown`; agent-side |

Every reserved route accepts the literal today; consumer wiring is a follow-on bucket. Per spec §4 "don't fail the bucket on missing call sites."

No `source=` harmonization needed

Spec §4 STOP-ASK #1: `record_inference` has no `source` kwarg today. The `IncomingMessage.source` field lives at a different layer (engine input). Adding `route` is purely additive, with zero existing-param conflict.

Surface

| Layer | LOC | Contents |
|---|---|---|
| `kora_cli/telemetry/cost_telemetry.py` (NEW) | 320 | `CostRouteTelemetry` + counters + windows + threading.RLock + singleton |
| `kora_cli/telemetry/__init__.py` (NEW) | 45 | public re-exports |
| `kora_cli/listeners/cost_telemetry_listener.py` (NEW) | 270 | persist task + window-reset tasks + path resolution + watch-and-act boundary checks |
| `kora_cli/listeners/__init__.py` | +6 | wire-in last |
| `kora_cli/snapshot/state_snapshot.py` | | SCHEMA_VERSION 1→2 + `_collect_cost_telemetry` + section in compute_snapshot |
| `kora_cli/web_server.py` | +28 | `GET /api/cost_telemetry` endpoint |
| `kora_cli/handlers/slack_dm_handler.py` | +5 | tag `route="slack_dm"` on Kora reply-bill |
| `agent/cost_state_holder.py` | +30 | accept `route` + `escalated_to_opus` kwargs; telemetry hook |
| `agent/cost_ladder_wire.py` | +6 | forward kwargs |
| Tests | 56 new | 27 telemetry + 20 listener + 9 endpoint+snapshot-v2 |

Counter shape per route

```json
{
"calls_count": 0,
"input_tokens_total": 0,
"output_tokens_total": 0,
"cache_read_tokens_total": 0,
"cache_creation_tokens_total": 0,
"cost_estimate_usd_total": 0.0,
"escalation_count": 0,
"model_breakdown": {}
}
```

Stable shape from process boot — every known route pre-populated in every window so consumers don't branch on absence.
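
A minimal sketch of the pre-population described above, assuming the route and window literals listed elsewhere in this PR. The names `ROUTES`, `WINDOWS`, `EMPTY_COUNTER`, and `empty_windows` are hypothetical and are not taken from `cost_telemetry.py`.

```python
import copy

# Route and window literals as listed in this PR; all helper names are hypothetical.
ROUTES = (
    "slack_dm", "email_inbound", "email_outbound_compose", "mcp_tool",
    "alert_investigation", "probe_investigation", "tool_loop_iteration",
    "scheduled_task", "unknown",
)
WINDOWS = ("process_lifetime", "rolling_24h", "monthly")

# Zeroed counter matching the JSON shape shown in the PR description.
EMPTY_COUNTER = {
    "calls_count": 0,
    "input_tokens_total": 0,
    "output_tokens_total": 0,
    "cache_read_tokens_total": 0,
    "cache_creation_tokens_total": 0,
    "cost_estimate_usd_total": 0.0,
    "escalation_count": 0,
    "model_breakdown": {},
}


def empty_windows():
    """Return window -> route -> zeroed counter, with an independent copy per cell."""
    return {
        window: {route: copy.deepcopy(EMPTY_COUNTER) for route in ROUTES}
        for window in WINDOWS
    }
```

The deep copy matters because `model_breakdown` is a mutable dict; sharing one instance across routes would leak per-model counts between them.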

3 windows

| Window | Reset trigger |
|---|---|
| `process_lifetime` | process restart only |
| `rolling_24h` | UTC midnight (watch-and-act periodic task) |
| `monthly` | UTC month rollover (watch-and-act periodic task) |

`process_lifetime` excluded from on-disk snapshot file to bound file size; `/api/cost_telemetry` endpoint returns ALL THREE.

Snapshot v2

`SCHEMA_VERSION` bumped 1 → 2. `compute_snapshot()` adds a `cost_telemetry` section exposing the two operator-facing windows (`rolling_24h` + `monthly`). End-to-end test (`test_api_snapshot_endpoint_includes_cost_telemetry`) confirms `/api/snapshot` returns the v2 shape.

Concurrency

`threading.RLock` on `CostRouteTelemetry` protects counter mutations + snapshot reads. Spec §4 STOP-ASK #3 says "propose lock strategy if non-trivial" — this one IS trivial (single coarse RLock around all writes/reads). Test `test_concurrent_record_call_no_lost_counts` proves 1000 calls across 10 threads sum exactly.
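
The coarse-lock strategy can be illustrated with a small stand-in class. This is a sketch under assumptions, not the code in `cost_telemetry.py`: `LockedCounter` and `hammer` are hypothetical names, and the thread and call counts simply mirror the 10 threads and 1000 total calls quoted for the test.

```python
import threading


class LockedCounter:
    """Hypothetical stand-in: one coarse RLock guards every write and read."""

    def __init__(self):
        self._lock = threading.RLock()
        self._calls = {}

    def record_call(self, route):
        # The read-modify-write happens under the lock, so no increment is lost.
        with self._lock:
            self._calls[route] = self._calls.get(route, 0) + 1

    def snapshot(self):
        # Readers take the same lock and receive a copy, never the live dict.
        with self._lock:
            return dict(self._calls)


def hammer(counter, threads=10, calls_per_thread=100):
    """Drive the counter from several threads and return the final snapshot."""

    def worker():
        for _ in range(calls_per_thread):
            counter.record_call("slack_dm")

    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for thread in pool:
        thread.start()
    for thread in pool:
        thread.join()
    return counter.snapshot()
```

An `RLock` rather than a plain `Lock` lets a method that already holds the lock call another locked method on the same object without deadlocking.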

Read-only contract

This module is a READ-side observer of cost-ladder accounting. It does NOT mutate billing logic, the cost-ladder rung, or the $200/mo budget enforcement. If telemetry is disabled (singleton swap or import failure), the `holder.record_inference` seam fails soft: billing accumulation is unaffected and only the per-route counters are skipped.

Test plan

  • 56 new tests pass (27 telemetry + 20 listener + 9 endpoint/v2)
  • 141/141 focused regression (telemetry + listener + snapshot + endpoint + cost_state_holder + cost_ladder_wire)
  • Ruff clean
  • Backwards-compat: existing `record_inference` callers without `route=` keep working (default `"unknown"`)

Cascade

Recommended follow-on bucket dispatch for filling the deferred routes:

  1. KR-EMAIL-COST-BILL — wire `holder.record_inference` into `email_inbound_handler._send_auto_reply` symmetric to slack_dm; tag `route="email_inbound"` for the inbound reply path
  2. KR-MCP-TOOL-COST-TAG — tag MCP-driven reasoning paths with `route="mcp_tool"`
  3. KR-REASONING-ITERATION-TAG — engine surfaces iteration index; iteration 2+ tagged `route="tool_loop_iteration"`
  4. KR-PLUGIN-AUDIT-COST-TAG — alert/probe investigation reasoning paths tagged once the wakeup machinery lands

🤖 Generated with Claude Code

Per Council R3 Lock R3-4 item #10. Data layer for all future tuning
decisions — escalation-rate, classifier, route-shape. Without
per-route counters we're flying blind on whether cheap-substrate
work is saving what we expect.

# New module: kora_cli/telemetry/

  * ``cost_telemetry.py`` — ``CostRouteTelemetry`` singleton with
    threading.RLock-protected counters across 3 windows
    (process_lifetime / rolling_24h / monthly). Per-route counters:
    calls_count, input/output/cache_read/cache_creation_tokens_total,
    cost_estimate_usd_total, escalation_count, model_breakdown.
  * ``__init__.py`` — re-exports + canonical route + window literals.

Route taxonomy (v1, spec §2):
  slack_dm, email_inbound, email_outbound_compose, mcp_tool,
  alert_investigation, probe_investigation, tool_loop_iteration,
  scheduled_task, unknown.

Fail-soft: unknown route strings bucket to "unknown" rather than
raising. ``record_call`` wraps the inner write in try/except so
hot-path inference handlers never see a telemetry failure.
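
The fail-soft behaviour can be sketched as follows. This is an
illustration under assumptions, not the actual ``record_call``: the
function names, the two-field counter, and the logger name are
hypothetical; only the route literals come from the taxonomy above.

```python
import logging

log = logging.getLogger("cost_telemetry_sketch")  # hypothetical logger name

KNOWN_ROUTES = frozenset({
    "slack_dm", "email_inbound", "email_outbound_compose", "mcp_tool",
    "alert_investigation", "probe_investigation", "tool_loop_iteration",
    "scheduled_task", "unknown",
})


def normalize_route(route):
    """Bucket anything outside the taxonomy to "unknown" instead of raising."""
    if isinstance(route, str) and route in KNOWN_ROUTES:
        return route
    return "unknown"


def record_call_fail_soft(counters, route, cost_usd):
    """Update counters; swallow any error so the caller never sees it."""
    try:
        cost = float(cost_usd)  # validate before mutating anything
        bucket = counters.setdefault(
            normalize_route(route),
            {"calls_count": 0, "cost_estimate_usd_total": 0.0},
        )
        bucket["calls_count"] += 1
        bucket["cost_estimate_usd_total"] += cost
    except Exception:
        log.debug("telemetry write failed; ignoring", exc_info=True)
```

Converting the cost before touching the bucket keeps a failed write from
leaving a half-updated counter behind.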

# Wire route through record_inference

``CostStateHolder.record_inference`` gains optional kwargs:
  - ``route: str = "unknown"`` — canonical taxonomy literal
  - ``escalated_to_opus: bool = False`` — Lock R3-3 tunable signal

Both are additive + backwards-compatible. After successful pricing
estimation, telemetry's ``record_call`` is invoked alongside the
existing billing accumulation. Telemetry write is READ-side (does
NOT affect billing) and fail-soft on import / record errors.

``cost_ladder_wire.record_inference_from_response`` forwards both
kwargs verbatim.
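
A rough sketch of the additive-kwarg pattern, assuming a much simpler
holder than the real one. ``HolderSketch``, its ``cost_usd`` parameter,
and the injected ``telemetry`` callable are hypothetical; only the
``route`` and ``escalated_to_opus`` kwargs and their defaults are taken
from this commit.

```python
class HolderSketch:
    """Hypothetical stand-in for a cost holder with an observer hook."""

    def __init__(self, telemetry=None):
        self.billed_usd = 0.0
        self._telemetry = telemetry  # any callable, or None when disabled

    def record_inference(self, cost_usd, *, route="unknown",
                         escalated_to_opus=False):
        # Billing accumulation runs first and never depends on telemetry.
        self.billed_usd += cost_usd
        # Observer hook: a failure here must not reach the caller.
        try:
            if self._telemetry is not None:
                self._telemetry(
                    route=route,
                    cost_usd=cost_usd,
                    escalated_to_opus=escalated_to_opus,
                )
        except Exception:
            pass


def broken_telemetry(**_kwargs):
    """Simulates a telemetry backend that always fails."""
    raise RuntimeError("telemetry unavailable")
```

Because both kwargs are keyword-only with defaults, a caller written
before this change (``holder.record_inference(0.5)``) keeps working and
is counted under ``"unknown"``.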

No source= harmonization needed (existing record_inference had no
source kwarg; the IncomingMessage.source field is at a different
layer — the engine input).

# Route wiring this PR (wired vs deferred)

| Route | Call site | Wired? |
|---|---|---|
| slack_dm | slack_dm_handler.py:676 (Kora reply-bill) | ✅ wired |
| email_inbound | (handler doesn't yet write a bill) | ⏸️ reserved |
| email_outbound_compose | (no consumer) | ⏸️ reserved |
| mcp_tool | (no consumer) | ⏸️ reserved |
| alert_investigation | (consumer pending KR-PLUGIN-AUDIT) | ⏸️ reserved |
| probe_investigation | (consumer pending probe-audit) | ⏸️ reserved |
| tool_loop_iteration | (engine doesn't yet surface iteration tag) | ⏸️ reserved |
| scheduled_task | (no consumer) | ⏸️ reserved |
| (agent main loop) | conversation_loop.py:1603 | leaves default "unknown" — agent-side, not Kora-side; not in spec taxonomy |
| (auxiliary client) | auxiliary_client.py:5341/5360 | leaves default "unknown" — agent-side |

Every reserved route accepts the literal today; consumer wiring is
a follow-on bucket per the spec §4 STOP-ASK guidance ("don't fail
the bucket on missing call sites").

# Periodic-task wiring (cost_telemetry_listener)

3 tasks registered with the heartbeat scheduler:
  * ``cost_telemetry.persist`` — 5min cadence; atomic-writes
    counter snapshot to ``${KORA_HOME}/cache/cost_telemetry.json``
  * ``cost_telemetry.rolling_24h_reset`` — 1h watch-and-act;
    fires reset on UTC date crossover
  * ``cost_telemetry.monthly_reset`` — 1h watch-and-act; fires
    reset on UTC month rollover
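
One common way to get the atomic write used by the persist task is to
write a temporary file in the same directory and rename it over the
target. The sketch below assumes that technique; the commit does not say
how the listener implements it, and ``atomic_write_json`` is a
hypothetical name.

```python
import json
import os
import tempfile


def atomic_write_json(path, payload):
    """Write JSON so readers see either the old file or the new one."""
    directory = os.path.dirname(os.path.abspath(path))
    os.makedirs(directory, exist_ok=True)
    # The temp file must live in the target directory so the final
    # rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(payload, handle, sort_keys=True)
            handle.flush()
            os.fsync(handle.fileno())
        os.replace(tmp_path, path)  # atomic replacement of the target
    except BaseException:
        # Clean up the partial temp file, then re-raise.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

A reader that opens the snapshot mid-persist never observes a truncated
file, which matters when the cockpit polls on its own schedule.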

First-tick stamping pattern: the reset checks stamp "today's
date" on first call after boot without firing a reset (counters
at zero anyway), so resets only fire on subsequent boundary
crossings. Avoids the trickiness of exact-midnight asyncio
scheduling.
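
The first-tick stamping pattern can be sketched as a small watcher that
compares a boundary key between ticks. This is a simplified illustration,
not the listener's code: ``BoundaryWatcher``, ``utc_day``, and
``utc_month`` are hypothetical names.

```python
from datetime import datetime, timezone


class BoundaryWatcher:
    """Reports True once per boundary crossing, never on the first tick."""

    def __init__(self, key_fn):
        self._key_fn = key_fn
        self._stamp = None

    def should_reset(self, now):
        # `now` is expected to be timezone-aware; it is normalized to UTC.
        key = self._key_fn(now.astimezone(timezone.utc))
        if self._stamp is None:
            self._stamp = key  # first tick after boot: stamp only
            return False
        if key != self._stamp:
            self._stamp = key  # boundary crossed since the last tick
            return True
        return False


def utc_day(now):
    return now.date().isoformat()


def utc_month(now):
    return now.strftime("%Y-%m")
```

With an hourly tick, the reset fires on the first tick after the
boundary, so it can lag the true midnight by up to one cadence interval.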

# Web endpoint

``GET /api/cost_telemetry`` returns the in-memory counter
snapshot across all 3 windows (no disk roundtrip; no LLM cost).

# Snapshot v2 (KR-CHEAP-PRE-WARMED-SNAPSHOT extension)

Bumped ``SCHEMA_VERSION`` 1 → 2. ``compute_snapshot()`` adds a
``cost_telemetry`` section exposing the two operator-facing
windows (``rolling_24h`` + ``monthly``); ``process_lifetime``
intentionally excluded from the on-disk snapshot to keep file
size bounded — operator hits ``/api/cost_telemetry`` for the
full window set. Section degrades to empty dicts on telemetry
unavailability (fail-soft).
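
The degradation rule might look roughly like this. The sketch assumes a
``read_windows`` callable that returns a window-to-counters dict and a
top-level ``schema_version`` key; both are assumptions, as are the
function names, and none of it is copied from ``state_snapshot.py``.

```python
SCHEMA_VERSION = 2
OPERATOR_WINDOWS = ("rolling_24h", "monthly")


def collect_cost_telemetry(read_windows):
    """Return only the operator-facing windows; empty dicts on any failure."""
    try:
        windows = read_windows()
        # process_lifetime is dropped here to keep the on-disk file bounded.
        return {name: windows.get(name, {}) for name in OPERATOR_WINDOWS}
    except Exception:
        return {name: {} for name in OPERATOR_WINDOWS}


def compute_snapshot_sketch(read_windows):
    """Assemble a v2-style snapshot; other sections are omitted in this sketch."""
    return {
        "schema_version": SCHEMA_VERSION,  # key name is an assumption
        "cost_telemetry": collect_cost_telemetry(read_windows),
    }
```

Returning the same two keys in the failure case keeps the section shape
stable for consumers even when telemetry is unavailable.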

# Concurrency

threading.RLock on CostRouteTelemetry protects counter mutations
and snapshot reads. This is the same shape as cron/jobs.py and is
suitable when the asyncio loop, cron, and reasoning paths may all
race. Test ``test_concurrent_record_call_no_lost_counts`` confirms
that 1000 calls across 10 threads yield the exact total (no lost
increments).

# Read-only contract preserved

This module is a READ-side observer of cost-ladder accounting. It
does NOT change billing accumulation logic. Telemetry can be
disabled (singleton swap) without affecting the cost-ladder's
$200/mo budget enforcement.

# Tests

141/141 pass:
  * 27 cost_telemetry (counter shape + route taxonomy + windows +
    concurrency + singleton + JSON-serializability + fail-soft)
  * 20 listener (registration + cadence resolution + persist
    cycle + window-reset checks + read-back + lifecycle log)
  * 9 web endpoint + snapshot v2 (endpoint shape + version bump
    + cost_telemetry section + degradation + end-to-end)
  * 51 snapshot tests pass (1 updated for schema v2 + new top-
    level key)
  * 16 existing cost_state_holder + cost_ladder_wire tests pass
    (backwards-compat preserved across new optional kwargs)

Ruff clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@rafe-walker rafe-walker merged commit 93f0548 into feature/phase2-upgrades May 24, 2026
@rafe-walker rafe-walker deleted the feat/kora-KR-CHEAP-COST-TELEMETRY branch May 24, 2026 01:55
rafe-walker added a commit that referenced this pull request May 24, 2026
…or (#166)

Closes the unified-operator-interface loop. Tails audit JSONL for probe.wake_requested events (PR #163 emits); per (probe, issue_category) inline debounce; invokes engine.respond() with structured probe context (issue + recent observations + envelope status); DMs operator via existing client.post_dm path.

Activates route='probe_investigation' telemetry literal (PR #161 reserved). Engine reads message.source to derive route through existing record_inference site — no telemetry-side changes needed.

Env vars added: KORA_PROBE_DEBOUNCE_SECONDS=600 (10 min default; 0 disables), KORA_PROBE_DEBOUNCE_BYPASS_CRITICAL=false (fail-closed; opt-in even for critical), KORA_PROBE_WAKE_POLL_SEC=30 (listener tail cadence). KORA_SLACK_JOSHUA_USER_ID reused from PR #149.

All 4 STOP-ASK conditions resolved inline:
- MessageSource Literal extended (1-line) with 'probe_investigation' + _derive_caller_session_id returns 'probe:{probe}:{category}' for future panel xref
- Listener-coordinator wire uniform across 9 listeners (register_daemon_listener pattern)
- Operator channel canonicalized at KORA_SLACK_JOSHUA_USER_ID (PR #149 precedent)
- Tail-position stamping at first-tick (don't replay history at boot) — inverse of AlertNotifier's set-diff semantic; documented

Wake-to-DM latency ~30s worst case (poll cadence), tunable to 5s. 42 new tests + 634/634 cross-bucket regression + ruff clean.