This repository was archived by the owner on May 26, 2026. It is now read-only.

feat(kora): KR-REASONING-PANEL — Kora reasoning activity lens (stub) #132

Merged
rafe-walker merged 1 commit into feature/phase2-upgrades from feat/kora-KR-REASONING-PANEL on May 23, 2026

@rafe-walker

Owner

Summary

Operator-facing view of Kora's recent ReasoningEngine calls — model used, tokens spent, cost-ladder rung at call time, errors. Pairs with CC#3's KR-FEAT-AI-RESPONSE-LOOP ST2 (in flight). Once ST2 extends slack_dm_log.jsonl outbound entries with the reasoning fields, a small follow-on bucket flips this stub to read from there.

Shape is pinned by the new test suite so the FE shipping in this PR keeps rendering during the cut-over.

What's in here

  • Backend stub: GET /api/reasoning/recent returns 4 representative calls that deliberately span ok @ normal on opus, ok @ warn_75 on sonnet (cost-downshift in action), halted @ hard_stop_100 (budget-locked refusal), and an sdk_timeout failure. Together they surface the full cost-ladder behaviour and error taxonomy at first look.
  • ReasoningPanel.tsx (new): title + stub banner + 4-column stats strip (total calls / token spend input → output / model distribution mini-bar / status distribution) + filter pills (all / ok / failed / halted) + newest-first timeline. Per row: status icon, model badge (tier-tinted — opus blue / sonnet purple / haiku gray / null=halted red), cost-rung pill color-mapped per spec (normal gray, warn_75 yellow, downshift_90 orange, hard_stop_100 red), duration with visual bar (warning ≥ 2s), input → output token counts, status badge (non-default only — happy-path rows stay clean), error_code chip, response excerpt (truncated to 80 chars collapsed; expandable to the 200-char API cap).
  • Dashboard card #12 (Brain icon). Layout keeps lg:grid-cols-3; row 2 grows to 3+3+3+3, the clean 4-row × 3-col closure of the layout. Headline goes destructive when halted > 0 in 24h (Kora was budget-locked; investigate the cost rung), warning when failed > 0, and foreground otherwise.
  • Route /reasoning + nav entry between /agent-activity and /slack-dm.
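
The card-tone precedence and the two excerpt lengths described above can be sketched as follows. This is only an illustration: the real logic lives in the TSX components, and the function names (headline_tone, excerpt) and constants are assumptions, shown in Python to match the backend test language.

```python
EXCERPT_COLLAPSED = 80  # collapsed row excerpt length, per the PR description
API_CAP = 200           # API-edge cap on response_text_truncated_200


def headline_tone(by_status_24h: dict[str, int]) -> str:
    """Pick the dashboard headline tone; halted outranks failed."""
    if by_status_24h.get("halted", 0) > 0:
        return "destructive"  # Kora was budget-locked in the last 24h
    if by_status_24h.get("failed", 0) > 0:
        return "warning"
    return "foreground"


def excerpt(text: str, expanded: bool = False) -> str:
    """Clamp to the API cap first, then to the collapsed length if needed."""
    capped = text[:API_CAP]
    return capped if expanded else capped[:EXCERPT_COLLAPSED]
```

With the 4-call stub (one halted, one failed), headline_tone returns "destructive", which is the tone the manual smoke step checks for.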

4-layer security contract

Extending the established pattern with reasoning-specific guards:

  1. response_text_truncated_200 is rendered as PLAIN TEXT via React's default child escaping. Real responses may contain anything Kora generates (HTML / markdown / script fragments). A source-pin test greps the FE for dangerouslySetInnerHTML and fails if it appears.
  2. NO Anthropic key shapes (sk-ant- prefix + base64-like body) anywhere in payload. Walk-payload regex catches a future error-projection bug or log-entry edit that leaks credential material into the operator's view.
  3. NO PII (email regex / Slack user-ID regex) leaked from the inbound user's message into response_text_truncated_200 — Kora's generated text is operator-visible; user message content lives in SLACK-DM-PANEL, not here. Belt+braces walk-payload sweep covers any field, not just response_text.
  4. TS interface declares all fields with documented contracts; no raw_prompt / auth_token / response_html companion fields exist on ReasoningCall.
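
The walk-payload guards in items 2 and 3 amount to a recursive sweep over every string in the response. A minimal sketch, assuming regexes and helper names (iter_strings, find_leaks) that are not taken from tests/kora_cli/test_web_server_reasoning.py:

```python
import re
from typing import Any, Iterator

# Assumed patterns; the real suite may be stricter or looser.
ANTHROPIC_KEY_RE = re.compile(r"sk-ant-[A-Za-z0-9_\-]{20,}")
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+\-]+@[A-Za-z0-9.\-]+\.[A-Za-z]{2,}")
SLACK_USER_ID_RE = re.compile(r"\b[UW][A-Z0-9]{8,}\b")

PATTERNS = {
    "anthropic_key": ANTHROPIC_KEY_RE,
    "email": EMAIL_RE,
    "slack_user_id": SLACK_USER_ID_RE,
}


def iter_strings(node: Any) -> Iterator[str]:
    """Yield every string in a JSON-like payload, including dict keys."""
    if isinstance(node, str):
        yield node
    elif isinstance(node, dict):
        for key, value in node.items():
            yield from iter_strings(key)
            yield from iter_strings(value)
    elif isinstance(node, (list, tuple)):
        for item in node:
            yield from iter_strings(item)


def find_leaks(payload: Any) -> list[tuple[str, str]]:
    """Return (kind, matched_text) for each credential or PII shape found."""
    leaks = []
    for text in iter_strings(payload):
        for kind, pattern in PATTERNS.items():
            leaks.extend((kind, m.group(0)) for m in pattern.finditer(text))
    return leaks
```

A test would then assert find_leaks(response_json) == [] for the whole payload, so a leak in any field trips the guard, not just one in response_text_truncated_200.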

K-DG drift caught + handled

Per memory feedback_k_dg_substrate_field_names_in_specs:

  • Bucket §3(a) example payload used uppercase Enum NAMES (NORMAL / WARN_75 / DOWNSHIFT_90 / HARD_STOP_100), the same convention CC#3 cited in PR #126's K-DG drift note.
  • BUT the canonical wire format per agent/cost_state_holder.py:114-117 is the lowercase Enum VALUES (NORMAL = "normal" etc), and engine.py:47-49's CostLadderRungName literal enforces lowercase.
  • Real CC#3 data will emit lowercase via .value. Stub uses lowercase to match — so stub and real data agree at flip time, no FE pill-color map breakage at the cutover. Backend test pins lowercase and flags this as the intentional spec divergence.
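
The name-versus-value drift is easy to reproduce in isolation. A minimal sketch, assuming an Enum shaped like the one in agent/cost_state_holder.py (the class name CostLadderRung and the helper names are hypothetical; only the member names, the lowercase values, and the CostLadderRungName literal name come from this PR):

```python
from enum import Enum
from typing import Literal, get_args


class CostLadderRung(Enum):  # hypothetical class name
    NORMAL = "normal"
    WARN_75 = "warn_75"
    DOWNSHIFT_90 = "downshift_90"
    HARD_STOP_100 = "hard_stop_100"


# Assumed definition of the literal that enforces lowercase on the wire.
CostLadderRungName = Literal["normal", "warn_75", "downshift_90", "hard_stop_100"]


def to_wire(rung: CostLadderRung) -> str:
    """Serialize with .value (lowercase), not .name (uppercase)."""
    return rung.value


def is_valid_wire_rung(value: str) -> bool:
    """True only for the lowercase values the literal allows."""
    return value in get_args(CostLadderRungName)
```

Here to_wire(CostLadderRung.WARN_75) gives "warn_75", while the spec's "WARN_75" fails is_valid_wire_rung, which is the divergence the backend test pins.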

Test plan

  • tests/kora_cli/test_web_server_reasoning.py: 16 tests covering shape, 4-call stub pin, status+rung+error diversity, per-entry schema with full ReasoningEngine error taxonomy validation, cost-rung lowercase pin (K-DG catch), all 4 security guards (Anthropic key walk-payload + PII walk-payload both per-field and whole-blob), FE source-pins (no dangerouslySetInnerHTML, response_text as JSX child), 200-char API-edge cap, by_status_24h reconciliation, tokens_total shape, cron-regression sanity.
  • Full admin-panel regression: 287/287 across 24 suites (with --extra dev only; slowapi is now a runtime dependency per PR #128).
  • pnpm tsc -b clean.
  • pnpm build clean.
  • Manual smoke: load /reasoning, exercise filters, expand rows, verify destructive card tone when halted in stub.

Refs

🤖 Generated with Claude Code

Operator-facing view of Kora's recent ReasoningEngine calls. Pairs
with CC#3's KR-FEAT-AI-RESPONSE-LOOP ST2 (in flight) — once ST2
extends slack_dm_log.jsonl outbound entries with the model_used /
input_tokens / output_tokens / reasoning_duration_ms /
reasoning_error fields, a small follow-on bucket flips this stub
to read from there. Shape is pinned by the new test suite.

Single-PR scope:
  * GET /api/reasoning/recent stub — 4 representative calls
    deliberately spanning ok @ normal on opus, ok @ warn_75 on
    sonnet (cost-downshift), halted at hard_stop_100 (budget-
    locked refusal), and sdk_timeout failure so the operator's
    first look surfaces the full cost-ladder behaviour + error
    taxonomy. stub:true keeps the FE banner visible.
  * ReasoningPanel.tsx — title + stub banner + 4-column stats
    strip (total calls / token spend / model distribution /
    status distribution) + filter pills (all / ok / failed /
    halted) + newest-first timeline. Per row: status icon,
    model badge (tier-tinted: opus blue / sonnet purple / haiku
    gray / null=halted red), cost-rung pill (color-mapped per
    spec: normal gray / warn_75 yellow / downshift_90 orange /
    hard_stop_100 red), duration with visual bar (warning ≥ 2s),
    input→output token counts, status badge (non-default only),
    error_code chip, response excerpt (truncated to 80 chars
    collapsed; expandable to the 200-char API cap).
  * Dashboard card #12 with Brain icon. Layout keeps lg:grid-
    cols-3 — row 2 grows to 3+3+3+3 (clean 4-row × 3-col closure
    of the layout that's been shepherded since DASHBOARD-V2).
    Headline goes destructive when halted > 0 in 24h (Kora was
    budget-locked; investigate cost rung); warning when failed > 0;
    foreground/success otherwise.
  * Route /reasoning + nav entry between /agent-activity and
    /slack-dm (logical grouping with the agent-facing surfaces).

4-layer security contract (extending the established pattern with
reasoning-specific guards):
  1. response_text_truncated_200 rendered as PLAIN TEXT via
     React's default child escaping. Real responses may contain
     anything Kora generates (HTML / markdown / script fragments).
     FE pins via dangerouslySetInnerHTML grep.
  2. NO Anthropic key shapes (sk-ant- prefix + base64-like body)
     anywhere in payload. Walk-payload regex catches a future
     error-projection bug or log-entry edit that leaks credential
     material into the operator's view.
  3. NO PII (email regex / Slack user-ID regex) leaked from the
     inbound user's message into response_text_truncated_200 —
     Kora's generated text is operator-visible; user message
     content lives in SLACK-DM-PANEL, not here. Belt+braces
     walk-payload sweep covers any field, not just response_text.
  4. TS interface declares all fields with documented contracts;
     no raw_prompt / auth_token / response_html companion fields
     exist on ReasoningCall.

K-DG drift catch (per memory `feedback_k_dg_substrate_field_names`):
  * Bucket §3(a) example payload used uppercase Enum NAMES
    (NORMAL / WARN_75 / DOWNSHIFT_90 / HARD_STOP_100) — same
    convention CC#3 cited in PR #126's K-DG drift note.
  * BUT the canonical wire format per
    agent/cost_state_holder.py:114-117 is the lowercase Enum
    VALUES (NORMAL = "normal" etc), and engine.py:47-49's
    CostLadderRungName literal enforces lowercase.
  * Real CC#3 data will emit lowercase via .value. Stub uses
    LOWERCASE to match — so stub and real data agree at flip
    time. Backend test pins lowercase + flags this as the
    intentional spec divergence.

Tests:
  * tests/kora_cli/test_web_server_reasoning.py — 16 tests:
    shape, 4-call stub pin, status+rung+error diversity, per-
    entry schema with full ReasoningEngine error taxonomy
    validation, cost-rung lowercase pin (K-DG catch), all 4
    security guards (Anthropic key walk-payload + PII walk-
    payload both per-field and whole-blob), FE source-pins
    (no dangerouslySetInnerHTML, response_text as JSX child),
    200-char API-edge cap, by_status_24h reconciliation,
    tokens_total shape, cron-regression sanity.
  * Full admin-panel regression: 287/287 across 24 suites
    (with --extra dev only — slowapi now runtime per PR #128).
  * tsc -b + vite build both clean.

Refs:
  * rafe-walker/kora-docs 17_cc_bucket_prompts/KR-REASONING-PANEL_kora_thinking_lens_stub.md
  * PR #126 — KR-FEAT-AI-RESPONSE-LOOP ST1 (ReasoningEngine +
    error taxonomy + CostRung literal source)
  * PR #128 — slowapi runtime promotion (enables --extra dev only)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@rafe-walker rafe-walker merged commit 3ef6b87 into feature/phase2-upgrades May 23, 2026
@rafe-walker rafe-walker deleted the feat/kora-KR-REASONING-PANEL branch May 23, 2026 00:25
rafe-walker added a commit that referenced this pull request May 23, 2026
…#130)

Agent-facing tool surface = 8 read + 5 mutating = 13 tools. Other agents have full operational surface on Kora.

2 new client listeners (slack_client_listener + purelymail_client_listener) + 2 new MCP tools (kora__send_slack_dm + kora__send_email) + Slack handler accessor refactor + JSONL caller_actor_kind + PurelymailClient send_email log + listener package wire-in.

§5 rulings: Q1 listener compat (handler accessor first, lazy fallback) / Q2 single JSONL with optional caller_actor_kind / Q3 D-prefix OR Joshua user-ID; U... C... rejected.

Security: tokens absent from error envelopes after diverse-failure-mode injection.

Listener fail-soft contract: capabilities not gates.

36 new tests + 341 cross-bucket regression. CC#1 rebased onto #131 + #132 — combined reasoning-meta JSONL fields with caller_actor_kind audit field; preserved both prose docstrings + entry-building branches. Used explicit-SHA --force-with-lease per feedback_pm_merge_then_delete_race memory.
rafe-walker added a commit that referenced this pull request May 23, 2026
… JSONL (#141)

3 panels flip at once: AGENT-ACTIVITY + REASONING + WEBHOOK-EVENTS now read from kora_audit_log.jsonl.

- NEW kora_cli/audit/jsonl_reader.py (shared helper) + 3 endpoint flips + 4 test files.

2 K-DG drifts caught + handled:
- §2 Flip 1 spec mismatch with actual mcp_tools.py:714-724 writer — handled by setting duration_ms=0, status=ok, using details.result for result_summary.
- §2 Flip 2 nullable cost_rung — handled by using lowercase unknown enum member to preserve PR #132 K-DG pin.

Reasoning grouping: collapses N tool calls sharing caller_session_id into 1 ReasoningCall with tools_used: [...]. Aggregate counts use individual rows (not groups) so headline reflects volume.
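
The grouping rule can be sketched as below. This is an illustration under assumptions: the row key tool_name, the output key total_calls, and the function name are hypothetical, and every row is assumed to carry a caller_session_id.

```python
from typing import Any


def group_reasoning_calls(rows: list[dict[str, Any]]) -> dict[str, Any]:
    """Collapse rows sharing a caller_session_id into one call each."""
    groups: dict[str, dict[str, Any]] = {}
    for row in rows:
        session_id = row["caller_session_id"]
        group = groups.setdefault(
            session_id, {"caller_session_id": session_id, "tools_used": []}
        )
        group["tools_used"].append(row["tool_name"])  # hypothetical row key
    return {
        "calls": list(groups.values()),  # one entry per session, first-seen order
        "total_calls": len(rows),        # counts rows, not groups
    }
```

Counting rows rather than groups is what keeps the headline proportional to actual tool-call volume.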

Webhook security: source_ip octet-masked at projection edge (audit writer passes RAW peer_ip; endpoint enforces mask). details sub-set to {reason, header_present} — never full audit details. IPv6/dash → defensive fallback.
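
A possible shape for the projection-edge mask, using only the standard library. Which octets are masked and what the fallback string looks like are not stated in the commit, so masking the last octet and returning "-" are assumptions, as is the function name.

```python
import ipaddress
from typing import Any

FALLBACK = "-"  # assumed placeholder for anything that is not clean IPv4


def mask_source_ip(raw: Any) -> str:
    """Mask the final octet of an IPv4 peer address; never echo raw input."""
    if not isinstance(raw, str):
        return FALLBACK
    try:
        addr = ipaddress.ip_address(raw.strip())
    except ValueError:
        return FALLBACK  # covers "-", empty strings, and garbage
    if addr.version != 4:
        return FALLBACK  # IPv6 takes the defensive fallback
    octets = str(addr).split(".")
    return ".".join(octets[:3] + ["xxx"])
```

Masking in the endpoint rather than in the audit writer keeps the raw peer_ip in the log while the operator view only ever sees the masked form.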

Fixture-isolation from #137 applied across all 3 test files + reader tests: monkeypatch get_kora_home in all 3 module namespaces.

357/357 admin-panel tests pass across 27 suites.

Follow-on buckets cited: KR-REASONING-PANEL-MODEL-XREF (model/tokens cross-ref from slack_dm log) + KR-MCP-RUNTIME-SURFACE follow-on (extends mcp.tool_called audit with duration_ms + failure-path status).
rafe-walker added a commit that referenced this pull request May 23, 2026
…ia xref (#143)

Cross-references audit JSONL with slack_dm_log outbound entries to populate previously-null model_used / tokens / response_text fields on reasoning panel rows.

- NEW kora_cli/audit/reasoning_xref.py (cross-ref helper with parsing/matching/cost-rung-derivation/text-truncation) + /api/reasoning/recent endpoint update + 27 new xref tests.

K-DG drift caught up-front: spec said verify if outbound JSONL writes caller_session_id; grep found it does NOT. Documented in module header + commit body.

Correlation algorithm: caller_session_id → (channel_id, event_ts) → match outbound with same channel_id where thread_ts == event_ts, fallback to closest sent_at within ±60s.
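
The matching step can be sketched as follows. It assumes (channel_id, event_ts) has already been derived from caller_session_id, and that event_ts and sent_at are epoch-second strings; the timestamp format and the function name are assumptions, not taken from reasoning_xref.py.

```python
from typing import Any, Optional

WINDOW_SECONDS = 60.0  # the ±60s fallback window from the commit message


def match_outbound(
    channel_id: str, event_ts: str, outbound: list[dict[str, Any]]
) -> Optional[dict[str, Any]]:
    """Prefer an exact thread match; else the closest send within the window."""
    same_channel = [e for e in outbound if e.get("channel_id") == channel_id]

    # Pass 1: exact thread correlation.
    for entry in same_channel:
        if entry.get("thread_ts") == event_ts:
            return entry

    # Pass 2: nearest sent_at within the window, same channel only.
    target = float(event_ts)
    candidates = [
        e for e in same_channel
        if e.get("sent_at") is not None
        and abs(float(e["sent_at"]) - target) <= WINDOW_SECONDS
    ]
    if not candidates:
        return None  # caller renders the row with null fields
    return min(candidates, key=lambda e: abs(float(e["sent_at"]) - target))
```

Returning None instead of raising lines up with the graceful-degradation rule: an unmatched group keeps the pre-xref shape.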

Cost-rung derivation: substring-match on model name (opus/sonnet/haiku) so future model revs keep mapping correctly; cost_ladder_halted reasoning_error supersedes; preserves lowercase Enum.value pin from #132/#141.
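
A sketch of the derivation order. The commit does not say which rung each model tier maps to, so the TIER_TO_RUNG table below is purely illustrative, as are the function name and the "unknown" default (borrowed from the #141 nullable-rung handling).

```python
from typing import Optional

# Illustrative mapping only; the real tier-to-rung table is not in the commit.
TIER_TO_RUNG = (
    ("opus", "normal"),
    ("sonnet", "warn_75"),
    ("haiku", "downshift_90"),
)


def derive_cost_rung(model_used: Optional[str], reasoning_error: Optional[str]) -> str:
    """Return a lowercase rung value; a halted error outranks the model name."""
    if reasoning_error == "cost_ladder_halted":
        return "hard_stop_100"
    name = (model_used or "").lower()
    for tier, rung in TIER_TO_RUNG:
        if tier in name:  # substring match, so new model revisions still map
            return rung
    return "unknown"
```

Because the match is on a substring, a model string such as "some-sonnet-rev-9" (a made-up name) still resolves without a table update.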

Graceful degradation: when xref fails per-group, row renders with null fields — identical shape to #141 pre-xref output so FE handles both with no conditional logic.

Security carve-out: response_text_truncated_200 is intentionally Joshua-content (carved out from PII sweep, same pattern as #141 message_id and slack_dm panel text).

384/384 admin-panel tests pass across 28 suites.