This repository was archived by the owner on May 26, 2026. It is now read-only.

feat(kora): KR-FEAT-AI-RESPONSE-LOOP ST1 — ReasoningEngine + Anthropic impl + context loader#126

Merged
rafe-walker merged 1 commit into feature/phase2-upgrades from feat/kora-KR-FEAT-AI-RESPONSE-LOOP-ST1
May 22, 2026

@rafe-walker

Owner

Summary

First ST of Phase 2's biggest user-facing milestone. Lays the clean reasoning surface (Protocol + value classes + Anthropic impl + JSONL context loader + operator-editable system prompt) that ST2 will wire into the Slack DM handler to replace the echo.

NOT yet plugged into any handler — that's ST2.

Bucket spec: `kora_docs/17_cc_bucket_prompts/KR-FEAT-AI-RESPONSE-LOOP_kora_thinks.md`.

Base: `feature/phase2-upgrades` — NOT main.

New package: `kora_cli/reasoning/`

  • `engine.py` (~180 lines) — `ReasoningEngine` Protocol + `IncomingMessage` / `ConversationContext` / `ConversationTurn` / `ResponseResult` value classes. Protocol over ABC for test-seam composition.
  • `anthropic_engine.py` (~350 lines) — `AnthropicReasoningEngine` implementing the protocol against `anthropic==0.86.0`. Two-mode credential cascade (see K-DG drift below). Cost-ladder-aware model selection. Operational-state gating. 60s per-call timeout. NO retry on 5xx per PM Q3. Credential-sanitized error mapping.
  • `context_loader.py` (~200 lines) — `load_slack_dm_context()` reads `${HERMES_HOME}/slack_dm_log.jsonl`, filters to (channel, thread), skips filtered/failed entries, projects to `ConversationTurn` list (oldest→newest, capped at max_turns). Resolves operational + cost state strings from holders. Defensive malformed-line handling.
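
To make the "Protocol over ABC" choice concrete, here is a minimal sketch of what such a surface could look like. The class names, `respond()`, `model_used`, and `current_cost_ladder_rung` come from this PR; every other field name and default is an assumption for illustration, not the contents of `engine.py`.

```python
from dataclasses import dataclass
from typing import Literal, Optional, Protocol, runtime_checkable

CostRungValue = Literal["normal", "warn_75", "downshift_90", "hard_stop_100", "unknown"]


@dataclass(frozen=True)
class ConversationTurn:
    role: Literal["user", "assistant"]  # assumed field name
    text: str                           # assumed field name


@dataclass(frozen=True)
class IncomingMessage:
    channel: str                        # assumed field name
    text: str                           # assumed field name
    thread_ts: Optional[str] = None     # assumed field name


@dataclass(frozen=True)
class ConversationContext:
    turns: tuple[ConversationTurn, ...] = ()
    operational_state: str = "unknown"  # assumed field name
    current_cost_ladder_rung: CostRungValue = "unknown"


@dataclass(frozen=True)
class ResponseResult:
    text: str = ""
    model_used: Optional[str] = None
    input_tokens: int = 0
    output_tokens: int = 0
    duration_ms: int = 0                # assumed field name
    error: Optional[str] = None


@runtime_checkable
class ReasoningEngine(Protocol):
    def respond(self, message: IncomingMessage, context: ConversationContext) -> ResponseResult:
        ...


class EchoEngine:
    """Hypothetical test double: satisfies the protocol without inheriting from it."""

    def respond(self, message: IncomingMessage, context: ConversationContext) -> ResponseResult:
        return ResponseResult(text=message.text, model_used="echo")
```

Because the protocol is structural, a test can pass any object with a matching `respond()` method, which is the test seam the bullet above refers to.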

New doc: `kora_docs/00_canonical_current_state/kora_system_prompt.md`

~150 lines, first-draft Kora identity per PM Q1 YES. Composed from project_kora memory: digital extension framing / brevity by default / no preamble or sycophancy / technical fluency assumption / operational-state respect / cost-ladder awareness / NOT-Claude-the-public-assistant boundary. Operator-editable; fail-CLOSED if missing at daemon boot.
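
The fail-CLOSED behaviour for the system prompt could be sketched roughly as follows. The exception name `SystemPromptUnavailable` and the function name `load_system_prompt` are hypothetical; the PR only states that a missing, empty, or unreadable prompt must stop the daemon at boot.

```python
from pathlib import Path


class SystemPromptUnavailable(RuntimeError):
    """Hypothetical error type raised when the prompt cannot be used."""


def load_system_prompt(path: Path) -> str:
    """Return the prompt text, or raise rather than fall back to a default."""
    try:
        text = path.read_text(encoding="utf-8")
    except (OSError, UnicodeDecodeError) as exc:
        # Missing file, permission problem, or undecodable bytes.
        raise SystemPromptUnavailable(f"cannot read system prompt at {path}") from exc
    if not text.strip():
        raise SystemPromptUnavailable(f"system prompt at {path} is empty")
    return text
```

Raising instead of substituting a built-in default keeps the operator-edited file as the only source of Kora's identity.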

K-DG drift findings (surfaced + acted on)

Drift 1 — credential env name conflict. Bucket spec said `KORA_ANTHROPIC_API_KEY` only. Existing env-mapping (PR #110) + gate-2 anti-secrets + "Max plan via Agent SDK billing" framing all point to `CLAUDE_CODE_OAUTH_TOKEN`. Implementation supports BOTH via cascade:

  • `KORA_ANTHROPIC_API_KEY` (if set) → SDK `api_key=...` (Console billing path)
  • `CLAUDE_CODE_OAUTH_TOKEN` (fallback) → SDK `auth_token=...` (Max plan path)
  • Both unset → `ReasoningEngineNotConfigured` fail-CLOSED

Operator chooses; existing OAuth secret in `kora-runtime-anthropic` works without any new provisioning. PM can tighten to single env path in a follow-up if preferred.
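
A minimal sketch of the cascade described above, assuming it is resolved into keyword arguments for the SDK client. The helper name `resolve_client_kwargs` is hypothetical; the env names, the `api_key` / `auth_token` parameters, the exception name, and the whitespace-only rule come from this PR.

```python
import os
from typing import Mapping


class ReasoningEngineNotConfigured(RuntimeError):
    """Raised when neither credential env var is usable."""


def resolve_client_kwargs(env: Mapping[str, str] = os.environ) -> dict:
    """Pick exactly one credential; the API key wins when both are set."""
    api_key = (env.get("KORA_ANTHROPIC_API_KEY") or "").strip()
    if api_key:
        return {"api_key": api_key}        # Console billing path
    oauth_token = (env.get("CLAUDE_CODE_OAUTH_TOKEN") or "").strip()
    if oauth_token:
        return {"auth_token": oauth_token}  # Max plan path
    # Whitespace-only values count as unset, so they land here too.
    raise ReasoningEngineNotConfigured(
        "set KORA_ANTHROPIC_API_KEY or CLAUDE_CODE_OAUTH_TOKEN"
    )
```

The error message names the variables but never echoes their values, in line with the credential-sanitization requirement.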

Drift 2 — `anthropic` lib was extra-only. Moved to `[web]` extra alongside fastapi/uvicorn/slowapi so daemon deploys don't need a separate `pip install -e ".[anthropic]"` invocation.

Drift 3 — CostRung enum value naming. Bucket spec used `normal / warned / constrained / halted`; actual enum is `NORMAL / WARN_75 / DOWNSHIFT_90 / HARD_STOP_100` per `agent/cost_state_holder.py:114-117`. Engine consumes canonical `.value` strings via `ConversationContext.current_cost_ladder_rung` literal type.

Drift 4 — accessor shapes. `holder.active_rung()` is a method, not a property (`agent/cost_state_holder.py:259`); the accessor is `get_cost_holder()` (`agent/cost_state_holder.py:502`), not `get_holder()`. Verified by grep before implementation.

Cost-ladder model selection

| CostRung | Model |
|---|---|
| `normal` | `claude-opus-4-7` |
| `warn_75` | `claude-sonnet-4-6` |
| `downshift_90` | `claude-haiku-4-5-20251001` |
| `hard_stop_100` | refuse (no API call; `error="cost_ladder_halted"`) |
| `unknown` (defensive) | `claude-opus-4-7` + WARN log |
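
The rung-to-model mapping can be sketched as a lookup with a refuse sentinel. The model ids and rung strings are taken from this PR; the function name `select_model`, the logger name, and the use of `None` to signal a refusal are assumptions.

```python
import logging
from typing import Optional

logger = logging.getLogger("kora.reasoning")  # hypothetical logger name

MODEL_BY_RUNG = {
    "normal": "claude-opus-4-7",
    "warn_75": "claude-sonnet-4-6",
    "downshift_90": "claude-haiku-4-5-20251001",
}
DEFAULT_MODEL = "claude-opus-4-7"


def select_model(rung: str) -> Optional[str]:
    """Return the model id for a rung, or None when the call must be refused."""
    if rung == "hard_stop_100":
        return None  # caller reports error="cost_ladder_halted" without an API call
    model = MODEL_BY_RUNG.get(rung)
    if model is None:
        # Covers "unknown" and any unrecognized string.
        logger.warning("unrecognized cost rung %r; defaulting to %s", rung, DEFAULT_MODEL)
        return DEFAULT_MODEL
    return model
```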

Error code taxonomy

`sdk_auth` / `sdk_rate_limited` / `sdk_5xx` / `sdk_4xx_<code>` / `sdk_timeout` / `sdk_transport` / `sdk_unknown_<class>` / `cost_ladder_halted` / `operational_state_paused` / `response_projection_failed`. Stable across the codebase.

§4 PM-opens — all defaults accepted

  • Q1: YES first-draft system prompt — composed
  • Q2: 10-turn context — `DEFAULT_MAX_TURNS=10` in loader
  • Q3: NO retry on 5xx — single SDK call (asserted by test)
  • Q4: PAUSED/STOPPED gating — implemented + tested

Tests (42 new, 275 total all passing)

`test_anthropic_engine.py` (24 tests):

  • Construction fail-CLOSED on both creds unset; API key only; OAuth only; API key wins; whitespace-only treated as unset
  • System prompt fail-CLOSED on missing/empty/unreadable; env-path override
  • Cost-ladder: NORMAL→opus, WARN_75→sonnet, DOWNSHIFT_90→haiku, HARD_STOP_100→refuse, unknown→opus + WARN
  • Operational-state gating: paused/stopped refuse, no SDK call
  • Happy path: text + tokens + duration in ResponseResult
  • Message history: alternating roles preserved; consecutive same-role concatenated; system prompt passed to SDK
  • SDK error mapping (6 modes: 401/429/500/400/timeout/unknown)
  • NO retry on 5xx or 429 (single SDK call asserted)
  • SECURITY: credential NEVER in error codes / text / model_used / log lines across 6 failure modes
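
The message-history behaviour tested above (alternating roles preserved, consecutive same-role turns concatenated) might look like the sketch below. The function name `to_api_messages`, the `(role, text)` tuple shape, and the blank-line separator are assumptions; the PR does not say how turns are joined.

```python
def to_api_messages(turns: list[tuple[str, str]]) -> list[dict]:
    """Collapse runs of same-role turns so roles strictly alternate."""
    messages: list[dict] = []
    for role, text in turns:
        if messages and messages[-1]["role"] == role:
            # Same speaker twice in a row: append to the previous message.
            messages[-1]["content"] += "\n\n" + text
        else:
            messages.append({"role": role, "content": text})
    return messages
```

For example, two inbound messages followed by one reply collapse to one `user` message and one `assistant` message.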

`test_context_loader.py` (18 tests):

  • Missing/empty file → empty context with "unknown" state strings
  • Channel filtering, thread-ts exact match, None-thread isolation
  • Filtered inbound entries skipped (filtered_non_joshua / dropped_paused / handler_error)
  • Failed outbound entries skipped (`send_status: failed`)
  • Turns ordered oldest→newest
  • max_turns slicing keeps most-recent
  • `DEFAULT_MAX_TURNS == 10` asserted
  • Malformed JSON lines logged + skipped; valid lines still loaded
  • Operational state surfaces from initialized holder
  • Log path env override

§5 ship checklist

  • Base `feature/phase2-upgrades`
  • Title `feat(kora): KR-FEAT-AI-RESPONSE-LOOP STn — <scope>`
  • §4 questions resolved
  • `anthropic` dep present in pyproject (moved into `[web]` extra)
  • API key NEVER logged (asserted across 6 failure modes)
  • Cost-ladder rung respected on every reasoning call
  • PAUSED state respected (tested for both PAUSED + STOPPED)
  • Engine failure → ResponseResult.error (no crash; canned fallback in ST2)
  • K-DG: literal field names pasted from grep; `.current` `@property` vs `active_rung()` method caught; CostRung values verified
  • Tests pass locally (275/275 across full suite)

What's next

ST2 — wire reasoning into the Slack DM handler:

  1. New `kora_cli/listeners/reasoning_engine_listener.py` with `current_reasoning_engine()` accessor mirroring the `current_pool()` pattern
  2. Update `SlackDMHandler` to call `reasoning_engine.respond()` instead of constructing the echo text
  3. Extend outbound JSONL schema with `model_used` / `input_tokens` / `output_tokens` / `reasoning_duration_ms` / `reasoning_error`
  4. Engine failure → canned fallback ("Kora is currently unable to respond; operator notified") — NOT a re-echo, NOT a crash
  5. Cost-ladder write integration: `record_inference()` with CanonicalUsage built from the response
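
Steps 3 and 4 of the ST2 plan could combine along these lines. This is a speculative sketch of work that has not landed: the function name `build_outbound_record` and the `text` key are hypothetical, while the five reasoning fields and the fallback sentence are quoted from the plan above.

```python
from typing import Optional

FALLBACK_TEXT = "Kora is currently unable to respond; operator notified"


def build_outbound_record(
    text: str,
    model_used: Optional[str],
    input_tokens: int,
    output_tokens: int,
    duration_ms: int,
    error: Optional[str],
) -> dict:
    """Build one outbound JSONL entry; any engine error swaps in the canned text."""
    return {
        "text": FALLBACK_TEXT if error else text,  # never a re-echo of the input
        "model_used": model_used,
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "reasoning_duration_ms": duration_ms,
        "reasoning_error": error,
    }
```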

🤖 Generated with Claude Code

…c impl + context loader

First ST of Feature 5's biggest user-facing milestone. Lays the
clean reasoning surface (Protocol + value classes + Anthropic
impl + JSONL context loader + operator-editable system prompt)
that ST2 will wire into the Slack DM handler to replace the echo.

NOT yet plugged into any handler — that's ST2.

## New package

**`kora_cli/reasoning/`**:
- `engine.py` (~180 lines) — `ReasoningEngine` Protocol +
  `IncomingMessage` / `ConversationContext` / `ConversationTurn` /
  `ResponseResult` value classes. Protocol over ABC for test-seam
  composition.
- `anthropic_engine.py` (~350 lines) — `AnthropicReasoningEngine`
  implementing the protocol against `anthropic==0.86.0`. Two-mode
  credential cascade (see K-DG drift below). Cost-ladder-aware
  model selection. Operational-state gating. 60s per-call timeout.
  NO retry on 5xx per PM Q3 default. Credential-sanitized error
  mapping.
- `context_loader.py` (~200 lines) — `load_slack_dm_context()`
  reads `${HERMES_HOME}/slack_dm_log.jsonl`, filters to (channel,
  thread), skips filtered/failed entries, projects to
  `ConversationTurn` list (oldest→newest, capped at max_turns).
  Resolves operational-state + cost-rung strings from holders.
  Defensive parsing (malformed lines logged + skipped).
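
The loader's filter-and-project steps can be sketched as below. The JSONL key names (`channel`, `thread_ts`, `status`, `send_status`, `ts`) and the function name `load_turns` are assumptions, since the log schema is not shown in this PR; the skipped statuses, the ordering, and `DEFAULT_MAX_TURNS` are from the description.

```python
import json
import logging
from pathlib import Path
from typing import Optional

logger = logging.getLogger("kora.context_loader")  # hypothetical logger name
DEFAULT_MAX_TURNS = 10
SKIPPED_INBOUND = {"filtered_non_joshua", "dropped_paused", "handler_error"}


def load_turns(log_path: Path, channel: str, thread_ts: Optional[str],
               max_turns: int = DEFAULT_MAX_TURNS) -> list[dict]:
    """Return matching log entries, oldest to newest, capped at max_turns."""
    if not log_path.exists():
        return []
    entries: list[dict] = []
    with log_path.open(encoding="utf-8") as fh:
        for lineno, line in enumerate(fh, start=1):
            if not line.strip():
                continue
            try:
                entry = json.loads(line)
            except json.JSONDecodeError:
                logger.warning("skipping malformed line %d in %s", lineno, log_path)
                continue
            if not isinstance(entry, dict):
                continue
            # A thread_ts of None only matches entries that are also unthreaded.
            if entry.get("channel") != channel or entry.get("thread_ts") != thread_ts:
                continue
            if entry.get("status") in SKIPPED_INBOUND or entry.get("send_status") == "failed":
                continue
            entries.append(entry)
    entries.sort(key=lambda e: float(e.get("ts", 0)))
    # Guard max_turns <= 0: a bare entries[-0:] would return everything.
    return entries[-max_turns:] if max_turns > 0 else []
```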

**`kora_docs/00_canonical_current_state/kora_system_prompt.md`**
(~150 lines) — first-draft Kora identity prompt per PM Q1 YES.
Authored from project_kora memory: digital extension framing,
brevity-by-default, no preamble/sycophancy, technical fluency
assumption, operational-state respect, cost-ladder awareness,
NOT-Claude-the-public-assistant boundary. Operator-editable;
fail-CLOSED if missing at daemon boot.

**`[web]` extra extended** with `anthropic==0.86.0` so daemon
deploys auto-pick up the SDK. Drift 2 resolution.

## K-DG drift findings (surfaced to PM + acted on)

**Drift 1 — credential env name conflict**: bucket spec said
`KORA_ANTHROPIC_API_KEY`; existing env-mapping (PR #110) +
gate-2 anti-secrets + "Max plan via Agent SDK billing" framing
all point to `CLAUDE_CODE_OAUTH_TOKEN`. **Implementation supports
BOTH via cascade**:
- `KORA_ANTHROPIC_API_KEY` (if set) → SDK `api_key=...` (Console billing)
- `CLAUDE_CODE_OAUTH_TOKEN` (fallback) → SDK `auth_token=...` (Max plan)
- Both unset → `ReasoningEngineNotConfigured` fail-CLOSED

Operator chooses; existing OAuth token works without any new
Doppler secret. PM may tighten to a single env path in a
follow-up if preferred.

**Drift 2 — `anthropic` lib was extra-only**: moved to `[web]`
extra alongside fastapi/uvicorn/slowapi so daemon deploys don't
need a separate `pip install -e ".[anthropic]"` invocation.

**Drift 3 — CostRung enum value naming**: bucket spec used
`normal / warned / constrained / halted`; actual enum is
`NORMAL / WARN_75 / DOWNSHIFT_90 / HARD_STOP_100` (`agent/cost_state_holder.py:114-117`).
Engine consumes the canonical `.value` strings via
`ConversationContext.current_cost_ladder_rung` literal type:
`"normal" / "warn_75" / "downshift_90" / "hard_stop_100" / "unknown"`.

**Drift 4 — `holder.active_rung()` is a method, not a property**;
`cost_state_holder.get_cost_holder()` (not `get_holder`). All
verified via grep at `agent/cost_state_holder.py:259`, `:502`.
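
As an illustration of Drifts 3 and 4 together, the sketch below resolves the rung string by calling `active_rung()` as a method and reading `.value`. `CostRung` here is a local stand-in whose values mirror the strings listed above, and `StubCostHolder` and `resolve_rung_string` are hypothetical names, not the real `agent/cost_state_holder.py` code.

```python
from enum import Enum
from typing import Callable


class CostRung(Enum):
    """Local stand-in; values mirror the canonical strings quoted in this PR."""
    NORMAL = "normal"
    WARN_75 = "warn_75"
    DOWNSHIFT_90 = "downshift_90"
    HARD_STOP_100 = "hard_stop_100"


class StubCostHolder:
    """Hypothetical holder exposing active_rung() as a method, not a property."""

    def __init__(self, rung: CostRung) -> None:
        self._rung = rung

    def active_rung(self) -> CostRung:
        return self._rung


def resolve_rung_string(get_cost_holder: Callable[[], StubCostHolder]) -> str:
    """Return the rung's .value, or "unknown" if the holder is unavailable."""
    try:
        return get_cost_holder().active_rung().value  # note the call parentheses
    except Exception:
        return "unknown"
```

Writing `holder.active_rung.value` (treating it as a property) would raise `AttributeError` on the bound method, which is the kind of mistake the grep check was meant to catch.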

## Cost-ladder model selection (per R4.1 §9.6 rung mapping)

| CostRung | Model used |
|---|---|
| `normal` | `claude-opus-4-7` |
| `warn_75` | `claude-sonnet-4-6` |
| `downshift_90` | `claude-haiku-4-5-20251001` |
| `hard_stop_100` | refuse — no API call; `error="cost_ladder_halted"` |
| `unknown` (defensive) | `claude-opus-4-7` + WARN log |

## Operational-state gating (per PM Q4 YES default)

`PAUSED` or `STOPPED` → refuse, no API call;
`error="operational_state_paused"`. The handler maps this to a
canned acknowledgment so Joshua isn't met with silence.
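
A minimal sketch of the pre-call gate, assuming both refusals are decided before any SDK request is built. The function name `refusal_reason`, the case normalization, and the order of the two checks are assumptions; the error strings come from this PR.

```python
from typing import Optional


def refusal_reason(operational_state: str, cost_rung: str) -> Optional[str]:
    """Return an error code if the engine must refuse, else None."""
    if operational_state.strip().lower() in {"paused", "stopped"}:
        # STOPPED shares the same code as PAUSED, per the description above.
        return "operational_state_paused"
    if cost_rung == "hard_stop_100":
        return "cost_ladder_halted"
    return None
```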

## Error code taxonomy (stable across the codebase)

`sdk_auth` (401/403) / `sdk_rate_limited` (429) / `sdk_5xx` /
`sdk_4xx_<code>` / `sdk_timeout` / `sdk_transport` /
`sdk_unknown_<class>` / `cost_ladder_halted` /
`operational_state_paused` / `response_projection_failed`.
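
The taxonomy above can be sketched as a pure classification function. To stay self-contained it takes an optional HTTP status and uses the built-in `TimeoutError` and `ConnectionError` as stand-ins for the SDK's own exception classes; the function name `classify_sdk_failure` and that substitution are assumptions.

```python
from typing import Optional


def classify_sdk_failure(exc: BaseException, status_code: Optional[int] = None) -> str:
    """Map a failure to a stable error code without reading the exception message."""
    if status_code in (401, 403):
        return "sdk_auth"
    if status_code == 429:
        return "sdk_rate_limited"
    if status_code is not None and 500 <= status_code < 600:
        return "sdk_5xx"
    if status_code is not None and 400 <= status_code < 500:
        return f"sdk_4xx_{status_code}"
    if isinstance(exc, TimeoutError):
        return "sdk_timeout"
    if isinstance(exc, ConnectionError):
        return "sdk_transport"
    # Only the class name is used, so message text cannot leak into the code.
    return f"sdk_unknown_{type(exc).__name__}"
```

Using only the status and class name is one way to keep credentials that appear in exception messages out of error codes.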

## §4 PM-opens — all defaults accepted

- Q1: YES first-draft system prompt — composed (~150 lines)
- Q2: 10-turn context — default in loader (`DEFAULT_MAX_TURNS=10`)
- Q3: NO retry on 5xx — single SDK call exactly (tested)
- Q4: PAUSED/STOPPED gating — implemented + tested

## Tests (42 new, 275 total all passing)

**`test_anthropic_engine.py`** (24 tests):
- Construction: fail-CLOSED on both creds unset; API key only;
  OAuth only; API key wins when both set; whitespace-only treated
  as unset
- System prompt: fail-CLOSED on missing/empty/unreadable;
  env-path override works
- Cost-ladder: NORMAL→opus, WARN_75→sonnet, DOWNSHIFT_90→haiku,
  HARD_STOP_100→refuse (no SDK call), unknown→opus default + WARN
- Operational-state gating: paused/stopped refuse (no SDK call)
- Happy path: successful call returns text + tokens + duration
- Message history: alternating roles preserved; consecutive
  same-role concatenated; system prompt passed to SDK
- SDK error mapping: 401/429/500/400/timeout/unknown — all 6 distinct
- NO retry on 5xx or 429 (single SDK call asserted)
- **SECURITY**: diverse failure-mode sequence (401/429/500/timeout/
  transport/unknown) — credential env value NEVER in error codes,
  text, model_used, or any log line
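
The no-retry and credential-leak assertions above could take roughly this shape. `FailingClient` and `respond_once` are hypothetical stand-ins for the SDK client and the engine, and the secret is a fake value used only for the check; the real tests in `test_anthropic_engine.py` are not shown in this PR.

```python
SECRET = "sk-test-not-a-real-key"  # fake credential, for the test only


class FailingClient:
    """Hypothetical SDK stand-in that counts calls and always raises."""

    def __init__(self, exc: Exception) -> None:
        self.calls = 0
        self._exc = exc

    def create(self, **request) -> str:
        self.calls += 1
        raise self._exc


def respond_once(client: FailingClient, **request) -> dict:
    """Make exactly one attempt; report the exception class, never its message."""
    try:
        return {"text": client.create(**request), "error": None}
    except Exception as exc:
        return {"text": "", "error": f"sdk_unknown_{type(exc).__name__}"}


# The exception message deliberately contains the secret.
client = FailingClient(RuntimeError(f"upstream rejected token {SECRET}"))
result = respond_once(client, model="claude-opus-4-7")
```

A test can then assert that `client.calls` is 1 and that the secret does not appear anywhere in `result`.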

**`test_context_loader.py`** (18 tests):
- Missing/empty file → empty context + "unknown" state strings
- Channel filtering (same/diff channel)
- Thread filtering (exact match; None matches None)
- Filtered inbound entries skipped (filtered_non_joshua /
  dropped_paused / handler_error)
- Failed outbound entries skipped (send_status: failed)
- Turns ordered oldest→newest (sort by timestamp)
- max_turns slices most-recent
- DEFAULT_MAX_TURNS == 10 (asserted)
- Malformed JSON lines logged + skipped, valid lines still loaded
- Operational state surfaces from initialized holder (paused)
- Log path env override

## §5 ship checklist

- [x] Base `feature/phase2-upgrades`
- [x] Title format `feat(kora): KR-FEAT-AI-RESPONSE-LOOP STn — <scope>`
- [x] §4 questions resolved (all defaults accepted)
- [x] anthropic dep moved into `[web]` extra
- [x] API key NEVER logged (asserted across 6 failure modes)
- [x] Cost-ladder rung respected (5 rung values tested, including `unknown`)
- [x] PAUSED state respected (tested for both PAUSED + STOPPED)
- [x] Engine failure paths return ResponseResult.error (no crash)
- [x] K-DG verified: literal field names pasted from grep;
      `.current` (property) vs `active_rung()` (method) caught;
      CostRung values verified against agent/cost_state_holder.py
- [x] Tests pass locally (**275/275** across full suite)

## What's next

**ST2** — wire reasoning into the Slack DM handler:
1. New `kora_cli/listeners/reasoning_engine_listener.py` registering
   via `register_daemon_listener("reasoning_engine", ...)` mirroring
   the `current_pool()` pattern from KR-MCP-CONSUMPTION ST1 with
   a `current_reasoning_engine()` accessor.
2. Update `SlackDMHandler` to call the reasoning engine instead of
   constructing the echo text.
3. Extend outbound JSONL schema with `model_used`, `input_tokens`,
   `output_tokens`, `reasoning_duration_ms`, `reasoning_error`.
4. Engine failure → canned fallback ("Kora is currently unable to
   respond; operator notified") — NOT a re-echo, NOT a crash.
5. Cost-ladder write integration: call `record_inference()` with
   the canonical usage from the response.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@rafe-walker rafe-walker merged commit 0fc9a03 into feature/phase2-upgrades May 22, 2026
@rafe-walker rafe-walker deleted the feat/kora-KR-FEAT-AI-RESPONSE-LOOP-ST1 branch May 22, 2026 23:49