This repository was archived by the owner on May 26, 2026. It is now read-only.
feat(kora): KR-CHEAP-COST-TELEMETRY — per-route counters (R3-4 #10) #161
Merged
rafe-walker merged 1 commit on May 24, 2026
Per Council R3 Lock R3-4 item #10. Data layer for all future tuning decisions: escalation-rate, classifier, route-shape. Without per-route counters we're flying blind on whether cheap-substrate work is saving what we expect.

# New module: kora_cli/telemetry/

* ``cost_telemetry.py``: ``CostRouteTelemetry`` singleton with threading.RLock-protected counters across 3 windows (process_lifetime / rolling_24h / monthly). Per-route counters: calls_count, input/output/cache_read/cache_creation_tokens_total, cost_estimate_usd_total, escalation_count, model_breakdown.
* ``__init__.py``: re-exports + canonical route + window literals.

Route taxonomy (v1, spec §2): slack_dm, email_inbound, email_outbound_compose, mcp_tool, alert_investigation, probe_investigation, tool_loop_iteration, scheduled_task, unknown.

Fail-soft: unknown route strings bucket to "unknown" rather than raising. ``record_call`` wraps the inner write in try/except so hot-path inference handlers never see a telemetry failure.

# Wire route through record_inference

``CostStateHolder.record_inference`` gains optional kwargs:

- ``route: str = "unknown"``: canonical taxonomy literal
- ``escalated_to_opus: bool = False``: Lock R3-3 tunable signal

Both are additive + backwards-compatible. After successful pricing estimation, telemetry's ``record_call`` is invoked alongside the existing billing accumulation. Telemetry write is READ-side (does NOT affect billing) and fail-soft on import / record errors.

``cost_ladder_wire.record_inference_from_response`` forwards both kwargs verbatim.

No source= harmonization needed (existing record_inference had no source kwarg; the IncomingMessage.source field is at a different layer, the engine input).

# Route wiring this PR (wired vs deferred)

| Route | Call site | Wired? |
|---|---|---|
| slack_dm | slack_dm_handler.py:676 (Kora reply-bill) | ✅ wired |
| email_inbound | (handler doesn't yet write a bill) | ⏸️ reserved |
| email_outbound_compose | (no consumer) | ⏸️ reserved |
| mcp_tool | (no consumer) | ⏸️ reserved |
| alert_investigation | (consumer pending KR-PLUGIN-AUDIT) | ⏸️ reserved |
| probe_investigation | (consumer pending probe-audit) | ⏸️ reserved |
| tool_loop_iteration | (engine doesn't yet surface iteration tag) | ⏸️ reserved |
| scheduled_task | (no consumer) | ⏸️ reserved |
| (agent main loop) | conversation_loop.py:1603 | leaves default "unknown" (agent-side, not Kora-side; not in spec taxonomy) |
| (auxiliary client) | auxiliary_client.py:5341/5360 | leaves default "unknown" (agent-side) |

Every reserved route accepts the literal today; consumer wiring is a follow-on bucket per the spec §4 STOP-ASK guidance ("don't fail the bucket on missing call sites").

# Periodic-task wiring (cost_telemetry_listener)

3 tasks registered with the heartbeat scheduler:

* ``cost_telemetry.persist``: 5min cadence; atomic-writes counter snapshot to ``${KORA_HOME}/cache/cost_telemetry.json``
* ``cost_telemetry.rolling_24h_reset``: 1h watch-and-act; fires reset on UTC date crossover
* ``cost_telemetry.monthly_reset``: 1h watch-and-act; fires reset on UTC month rollover

First-tick stamping pattern: the reset checks stamp "today's date" on first call after boot without firing a reset (counters at zero anyway), so resets only fire on subsequent boundary crossings. Avoids the trickiness of exact-midnight asyncio scheduling.

# Web endpoint

``GET /api/cost_telemetry`` returns the in-memory counter snapshot across all 3 windows (no disk roundtrip; no LLM cost).

# Snapshot v2 (KR-CHEAP-PRE-WARMED-SNAPSHOT extension)

Bumped ``SCHEMA_VERSION`` 1 → 2. ``compute_snapshot()`` adds a ``cost_telemetry`` section exposing the two operator-facing windows (``rolling_24h`` + ``monthly``); ``process_lifetime`` intentionally excluded from the on-disk snapshot to keep file size bounded. Operator hits ``/api/cost_telemetry`` for the full window set. Section degrades to empty dicts on telemetry unavailability (fail-soft).

# Concurrency

threading.RLock on CostRouteTelemetry protects counter mutations + snapshot reads. Same shape as cron/jobs.py + suitable for asyncio loop + cron + reasoning all potentially racing. Test ``test_concurrent_record_call_no_lost_counts`` proves 1000 calls across 10 threads → exact total (no lost increments).

# Read-only contract preserved

This module is a READ-side observer of cost-ladder accounting. It does NOT change billing accumulation logic. Telemetry can be disabled (singleton swap) without affecting the cost-ladder's $200/mo budget enforcement.

# Tests

141/141 pass:

* 27 cost_telemetry (counter shape + route taxonomy + windows + concurrency + singleton + JSON-serializability + fail-soft)
* 20 listener (registration + cadence resolution + persist cycle + window-reset checks + read-back + lifecycle log)
* 9 web endpoint + snapshot v2 (endpoint shape + version bump + cost_telemetry section + degradation + end-to-end)
* 51 snapshot tests pass (1 updated for schema v2 + new top-level key)
* 16 existing cost_state_holder + cost_ladder_wire tests pass (backwards-compat preserved across new optional kwargs)

Ruff clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
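The first-tick stamping pattern from the commit message can be sketched roughly as follows. This is a minimal illustration, not the PR's listener code: `DailyResetCheck`, `tick`, and the `reset` callback are hypothetical names, and the real tasks run under the heartbeat scheduler rather than being called by hand.

```python
from datetime import datetime, timezone


class DailyResetCheck:
    """Hypothetical watch-and-act check: fire `reset` on UTC date crossover."""

    def __init__(self, reset):
        self._reset = reset   # callable that clears the rolling window
        self._stamp = None    # last UTC date observed; None until first tick

    def tick(self, now):
        """Return True if a reset fired on this tick."""
        today = now.astimezone(timezone.utc).date()
        if self._stamp is None:
            # First tick after boot: stamp the date, do not reset.
            self._stamp = today
            return False
        if today != self._stamp:
            # Boundary crossed since the last tick: reset and re-stamp.
            self._stamp = today
            self._reset()
            return True
        return False


fired = []
check = DailyResetCheck(lambda: fired.append("reset"))
first = check.tick(datetime(2026, 5, 24, 23, 30, tzinfo=timezone.utc))
same_day = check.tick(datetime(2026, 5, 24, 23, 59, tzinfo=timezone.utc))
next_day = check.tick(datetime(2026, 5, 25, 0, 30, tzinfo=timezone.utc))
```

Because the check only compares dates, an hourly cadence is enough: the reset lands within one tick of the boundary and nothing has to be scheduled at exact midnight.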
rafe-walker added a commit that referenced this pull request on May 24, 2026
…or (#166)

Closes the unified-operator-interface loop. Tails audit JSONL for probe.wake_requested events (PR #163 emits); per (probe, issue_category) inline debounce; invokes engine.respond() with structured probe context (issue + recent observations + envelope status); DMs operator via existing client.post_dm path.

Activates route='probe_investigation' telemetry literal (PR #161 reserved). Engine reads message.source to derive route through existing record_inference site, so no telemetry-side changes are needed.

Env vars added:
- KORA_PROBE_DEBOUNCE_SECONDS=600 (10 min default; 0 disables)
- KORA_PROBE_DEBOUNCE_BYPASS_CRITICAL=false (fail-closed; opt-in even for critical)
- KORA_PROBE_WAKE_POLL_SEC=30 (listener tail cadence)

KORA_SLACK_JOSHUA_USER_ID reused from PR #149.

All 4 STOP-ASK conditions resolved inline:
- MessageSource Literal extended (1-line) with 'probe_investigation' + _derive_caller_session_id returns 'probe:{probe}:{category}' for future panel xref
- Listener-coordinator wire uniform across 9 listeners (register_daemon_listener pattern)
- Operator channel canonicalized at KORA_SLACK_JOSHUA_USER_ID (PR #149 precedent)
- Tail-position stamping at first-tick (don't replay history at boot); inverse of AlertNotifier's set-diff semantic; documented

Wake-to-DM latency ~30s worst case (poll cadence), tunable to 5s. 42 new tests + 634/634 cross-bucket regression + ruff clean.
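A per-(probe, issue_category) debounce of the kind this commit describes could look roughly like the sketch below. It is an assumption-laden illustration: `WakeDebouncer` and `should_fire` are hypothetical names, timestamps are plain seconds passed in by the caller, and the critical-bypass flag is left out.

```python
class WakeDebouncer:
    """Hypothetical debounce keyed on (probe, issue_category)."""

    def __init__(self, window_seconds=600.0):
        # 600 mirrors the KORA_PROBE_DEBOUNCE_SECONDS default; 0 disables.
        self._window = window_seconds
        self._last_fired = {}  # (probe, category) -> last accepted timestamp

    def should_fire(self, probe, category, now):
        """Return True if a wake for this key should go through at `now`."""
        if self._window <= 0:
            return True  # debounce disabled
        key = (probe, category)
        last = self._last_fired.get(key)
        if last is not None and now - last < self._window:
            return False  # still inside the quiet window for this key
        self._last_fired[key] = now
        return True


debouncer = WakeDebouncer(window_seconds=600.0)
decisions = [
    debouncer.should_fire("disk", "full", now=0.0),       # first wake for the key
    debouncer.should_fire("disk", "full", now=599.0),     # suppressed
    debouncer.should_fire("disk", "latency", now=599.0),  # different category
    debouncer.should_fire("disk", "full", now=600.0),     # window elapsed
]
```

Keying on the pair rather than on the probe alone means a new issue category on a noisy probe still reaches the operator.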
Summary
Per Council R3 Lock R3-4 item #10. Eyes for the cost-economy discipline: every `record_inference` call gets a route tag; counters roll up per-route across 3 windows (process_lifetime, rolling_24h, monthly); cockpit + future tuning decisions read from this telemetry at $0 LLM cost.
Bucket spec: `17_cc_bucket_prompts/KR-CHEAP-COST-TELEMETRY_per_route_counters.md`
Routes wired vs deferred (per spec §4 STOP-ASK guidance)
Every reserved route accepts the literal today; consumer wiring is a follow-on bucket. Per spec §4 "don't fail the bucket on missing call sites."
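As a rough sketch of how reserved literals can be accepted while unrecognized strings fall back to `"unknown"` instead of raising: the route names below come from the spec taxonomy quoted in this PR, but `normalize_route` is a hypothetical helper, not the module's actual function.

```python
# Route taxonomy v1 as listed in this PR.
KNOWN_ROUTES = frozenset({
    "slack_dm",
    "email_inbound",
    "email_outbound_compose",
    "mcp_tool",
    "alert_investigation",
    "probe_investigation",
    "tool_loop_iteration",
    "scheduled_task",
    "unknown",
})


def normalize_route(route):
    """Hypothetical helper: map any input to a canonical route literal."""
    # The isinstance check runs first so non-string (even unhashable)
    # inputs fall through to "unknown" instead of raising.
    if isinstance(route, str) and route in KNOWN_ROUTES:
        return route
    return "unknown"


wired = normalize_route("slack_dm")
reserved = normalize_route("probe_investigation")
typo = normalize_route("slack-dm")
missing = normalize_route(None)
```

A reserved route with no consumer yet simply keeps zeroed counters until a call site starts passing its literal.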
No `source=` harmonization needed
Spec §4 STOP-ASK #1: `record_inference` had no `source` kwarg today. The `IncomingMessage.source` field is at a different layer (engine input). Adding `route` is purely additive — zero existing-param conflict.
Surface
Counter shape per route
```json
{
"calls_count": 0,
"input_tokens_total": 0,
"output_tokens_total": 0,
"cache_read_tokens_total": 0,
"cache_creation_tokens_total": 0,
"cost_estimate_usd_total": 0.0,
"escalation_count": 0,
"model_breakdown": {}
}
```
Stable shape from process boot — every known route pre-populated in every window so consumers don't branch on absence.
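One way to get that stable shape is to build the full window × route table up front. The sketch below assumes the counter fields, routes, and windows listed in this PR; `new_counter_table` is a hypothetical name, and the real module may organize its state differently.

```python
import copy

ROUTES = (
    "slack_dm", "email_inbound", "email_outbound_compose", "mcp_tool",
    "alert_investigation", "probe_investigation", "tool_loop_iteration",
    "scheduled_task", "unknown",
)
WINDOWS = ("process_lifetime", "rolling_24h", "monthly")

EMPTY_COUNTER = {
    "calls_count": 0,
    "input_tokens_total": 0,
    "output_tokens_total": 0,
    "cache_read_tokens_total": 0,
    "cache_creation_tokens_total": 0,
    "cost_estimate_usd_total": 0.0,
    "escalation_count": 0,
    "model_breakdown": {},
}


def new_counter_table():
    """Hypothetical constructor: every route present in every window."""
    # deepcopy so each route gets its own nested model_breakdown dict.
    return {
        window: {route: copy.deepcopy(EMPTY_COUNTER) for route in ROUTES}
        for window in WINDOWS
    }


table = new_counter_table()
```

A consumer can then read `table["monthly"]["mcp_tool"]["calls_count"]` directly, even for a route that has never been called.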
3 windows
`process_lifetime` excluded from on-disk snapshot file to bound file size; `/api/cost_telemetry` endpoint returns ALL THREE.
Snapshot v2
`SCHEMA_VERSION` bumped 1 → 2. `compute_snapshot()` adds a `cost_telemetry` section exposing the two operator-facing windows (`rolling_24h` + `monthly`). End-to-end test (`test_api_snapshot_endpoint_includes_cost_telemetry`) confirms `/api/snapshot` returns the v2 shape.
Concurrency
`threading.RLock` on `CostRouteTelemetry` protects counter mutations + snapshot reads. Spec §4 STOP-ASK #3 says "propose lock strategy if non-trivial" — this one IS trivial (single coarse RLock around all writes/reads). Test `test_concurrent_record_call_no_lost_counts` proves 1000 calls across 10 threads sum exactly.
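A minimal sketch of the single-coarse-lock approach, together with a check in the spirit of `test_concurrent_record_call_no_lost_counts`. `RouteCounter` is a simplified, hypothetical stand-in for `CostRouteTelemetry` that tracks only call counts, and the 10 × 100 split is one reading of "1000 calls across 10 threads".

```python
import threading


class RouteCounter:
    """Simplified stand-in: one RLock around every write and read."""

    def __init__(self):
        self._lock = threading.RLock()
        self._calls = {}

    def record_call(self, route):
        with self._lock:
            # Read-modify-write happens entirely under the lock.
            self._calls[route] = self._calls.get(route, 0) + 1

    def snapshot(self):
        with self._lock:
            return dict(self._calls)  # copy, so callers never see live state


def hammer(counter, threads=10, calls_per_thread=100):
    """Drive record_call from several threads and return the final snapshot."""
    def worker():
        for _ in range(calls_per_thread):
            counter.record_call("slack_dm")

    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for thread in pool:
        thread.start()
    for thread in pool:
        thread.join()
    return counter.snapshot()


result = hammer(RouteCounter())
```

An `RLock` (rather than a plain `Lock`) lets a method that already holds the lock call another locked method on the same object without deadlocking.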
Read-only contract
This module is a READ-side observer of cost-ladder accounting. It does NOT mutate billing logic, the cost-ladder rung, or the $200/mo budget enforcement. Disabling telemetry (singleton swap or import failure) is fail-soft per-route at the `holder.record_inference` seam.
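The fail-soft seam can be illustrated with the sketch below. It is not the real `CostStateHolder`: `BillingHolder`, `BrokenTelemetry`, and the `cost_usd` parameter are hypothetical, and only the `route` and `escalated_to_opus` kwargs mirror names from this PR.

```python
import logging

logger = logging.getLogger("cost_telemetry_sketch")


class BrokenTelemetry:
    """Hypothetical telemetry double whose writes always fail."""

    def record_call(self, **kwargs):
        raise RuntimeError("telemetry unavailable")


class BillingHolder:
    """Hypothetical holder: billing accumulates even if telemetry raises."""

    def __init__(self, telemetry):
        self.total_usd = 0.0
        self._telemetry = telemetry

    def record_inference(self, cost_usd, route="unknown", escalated_to_opus=False):
        # Billing accumulation happens first and never depends on telemetry.
        self.total_usd += cost_usd
        try:
            self._telemetry.record_call(
                route=route, cost_usd=cost_usd, escalated=escalated_to_opus
            )
        except Exception:
            # Observer failure is logged and swallowed on the hot path.
            logger.debug("telemetry write failed; continuing", exc_info=True)
        return self.total_usd


holder = BillingHolder(BrokenTelemetry())
total = holder.record_inference(0.25, route="slack_dm")
```

Existing callers that pass neither kwarg keep working unchanged, since both default to the values the PR specifies.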
Test plan
Cascade
Recommended follow-on bucket dispatch for filling the deferred routes:
🤖 Generated with Claude Code