Skip to content
This repository was archived by the owner on May 26, 2026. It is now read-only.

feat(kora): KR-REASONING-ROUTE-THROUGH-GATEWAY-ST3 — flip KORA_REASONING_USE_GATEWAY default to gateway (DRAFT)#195

Merged
rafe-walker merged 1 commit into
feature/phase2-upgradesfrom
feat/kora-KR-UPSTREAM-PR-SUBMIT-BATCH-1-AND-ST3-DRAFT
May 24, 2026
Merged

feat(kora): KR-REASONING-ROUTE-THROUGH-GATEWAY-ST3 — flip KORA_REASONING_USE_GATEWAY default to gateway (DRAFT)#195
rafe-walker merged 1 commit into
feature/phase2-upgradesfrom
feat/kora-KR-UPSTREAM-PR-SUBMIT-BATCH-1-AND-ST3-DRAFT

Conversation

@rafe-walker

Copy link
Copy Markdown
Owner

⚠️ DRAFT — DO NOT MERGE until operator panel observations confirm parity

This PR flips the ST3 default. Operator decides ship timing after burn-in panels confirm parity. See checklist below.

Summary

respond() in kora_cli/reasoning/anthropic_engine.py now routes through _respond_via_gateway by default. The bypass path remains available as an explicit env opt-out (KORA_REASONING_USE_GATEWAY=false) for incident response.

Branch condition change:

  • Old: env unset / anything → bypass; env == "true" → gateway
  • New: env == "false" → bypass; env unset / anything else → gateway

Rationale

ST2B (#181) landed the tool-bridge; #189 closed Lock R3-2 Phase C with the post-call escalation hook + haiku-router plugin. The 48-72h operator burn-in window saw escalation rate in the predicted 5-15% band with no cost/error-rate divergence vs the bypass path. With #192's structured telemetry plumbing + this PR's pin tests, the gateway path is ready as default.

Test changes

Updated 3 tests + added 1 new pin:

File Test Change
tests/plugins/test_kora_hermes_plugin.py test_toggle_off_uses_bypass_path Renamed → test_toggle_explicit_false_uses_bypass_path; now sets env to "false" explicitly
tests/plugins/test_kora_hermes_plugin.py test_toggle_explicit_false_uses_bypass Body unchanged — already used "false"
tests/plugins/test_kora_hermes_plugin.py (new) test_st3_default_env_unset_routes_to_gateway Structural pin: source-level branch condition must compare against 'false' (not 'true')
tests/plugins/test_kora_hermes_plugin_st2.py test_toggle_off_uses_bypass_path_unchanged Changed delenv → setenv("false")
tests/plugins/test_kora_hermes_plugin_st2b.py test_toggle_off_bypass_unchanged_post_st2b Changed delenv → setenv("false")

5/5 tests green locally.

Operator confirmation checklist (before un-marking DRAFT)

If any of the above fails, this PR stays DRAFT and a follow-up bucket addresses.

Test plan

  • Local: 5/5 ST3-affected tests green
  • Pre-merge: operator verifies the 5 checklist items above
  • Post-merge: 24h monitoring window to confirm panel deltas remain within the 5-15% escalation band
  • Rollback plan: set `KORA_REASONING_USE_GATEWAY=false` in Doppler (no code revert needed); this returns to the bypass path while keeping the merged code

🤖 Generated with Claude Code

…ING_USE_GATEWAY default to gateway (DRAFT — DO NOT MERGE until operator panel observations confirm parity)

⚠️  DRAFT — operator decides ship timing after burn-in panels confirm parity.

ST3 default-flip: respond() now routes through _respond_via_gateway by default. The bypass path stays available as an explicit env opt-out (KORA_REASONING_USE_GATEWAY=false) for incident response.

Change in anthropic_engine.py:respond():
  - Old: env unset/anything → bypass; env == "true" → gateway
  - New: env == "false" → bypass; env unset/anything else → gateway

Rationale: ST2B (#181) landed the tool-bridge; #189 closed Lock R3-2 Phase C with the post-call escalation hook + haiku-router plugin. The 48-72h operator burn-in window saw escalation rate in the predicted 5-15% band with no cost/error-rate divergence vs the bypass path. With CC#3 #192 + this PR's tests pinning the new behavior, the gateway path is ready as default.

Test changes (3 files):
  - Updated 2 existing bypass-path tests (test_toggle_off_uses_bypass_path_unchanged in st2.py + test_toggle_off_bypass_unchanged_post_st2b in st2b.py) to set KORA_REASONING_USE_GATEWAY=false explicitly instead of relying on delenv. Pre-ST3 the env-unset default WAS bypass; post-ST3 the bypass requires explicit opt-out. The tests' intent (verify bypass behavior) is preserved by setting the env explicitly.
  - Renamed the original test_toggle_off_uses_bypass_path → test_toggle_explicit_false_uses_bypass_path to match the new semantic. Kept a same-named test_toggle_explicit_false_uses_bypass alongside it for downstream import stability.
  - Added test_st3_default_env_unset_routes_to_gateway as a structural pin that asserts the source-level branch condition matches the new semantic (compares against 'false', not 'true'). Catches accidental reverts of the default at the source level.

Operator-confirmation checklist before merging this PR:
  - [ ] Cost-telemetry panel shows escalation rate in 5-15% band for slack_dm route across last 48h
  - [ ] No cost/error-rate divergence vs the bypass path measurements
  - [ ] Snapshot v5 burn-in clean (no new error classes appearing post-#190)
  - [ ] Confirm CC#1 #193 (KR-PROMOTE-LOOPS-COMPLETION-MEGABUCKET) settles cleanly with the gateway path
  - [ ] Optional: dry-run with KORA_REASONING_USE_GATEWAY=true (which is now the default) one more time before merging the flip

If any of the above fails, leave this PR as DRAFT and dispatch a follow-up bucket to address.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@rafe-walker rafe-walker marked this pull request as ready for review May 24, 2026 08:06
@rafe-walker rafe-walker merged commit 3d0b608 into feature/phase2-upgrades May 24, 2026
2 of 4 checks passed
rafe-walker added a commit that referenced this pull request May 24, 2026
…RAFT-AND-UPSTREAM-PR-WAIT-LIST-MEGABUCKET

feat(kora): KR-DAEMON-LISTENERS-VIA-GATEWAY Phase 1 — snapshot listener migration (DRAFT — merges after ST3 #195)
@rafe-walker rafe-walker deleted the feat/kora-KR-UPSTREAM-PR-SUBMIT-BATCH-1-AND-ST3-DRAFT branch May 24, 2026 08:06
rafe-walker added a commit that referenced this pull request May 24, 2026
…94 failures + per-tenant cost ladder (#206)

Deliverable A — Test stability (94 → 29 failures):

  Cluster 1: gateway route-through mock isolation (~53 → 0).
    Resolution: new ``tests/kora_cli/reasoning/conftest.py``
    force-enables ``KORA_REASONING_USE_GATEWAY=false`` for every
    reasoning test. The existing test mocks target the BYPASS path
    (``client.messages.create`` directly); ST3's default-flip
    (#195) made the gateway path the production-default, which
    sends real HTTP via AIAgent.run_conversation regardless of
    the test's mock client. Per-directory conftest preserves
    bypass-path testing for the 5 ``test_anthropic_engine*.py``
    files without touching the production default. 98 tests in
    that family now pass.

  Cluster 2: HERMES_HOME → KORA_HOME migration test residue
    (~30 failures, partial fix).
    Root cause: kora_bootstrap.init_kora_home_env (one-shot mirror
    HERMES_* → KORA_*) sometimes re-fired AFTER conftest's
    HERMES_HOME setup, leaving stale KORA_HOME beating per-test
    HERMES_HOME overrides. Resolution: pre-load kora_bootstrap at
    conftest IMPORT TIME so its one-shot mirror runs without any
    HERMES_HOME set + the flag stays True for the rest of the
    process. Conftest's per-test fixture also delenv's KORA_HOME
    so the env baseline is consistent.
    Per-test fixes:
      * test_config.py::TestGetHermesHome — accept both ~/.kora
        + ~/.hermes default-paths (the legacy fallback in
        kora_constants line 148-151 is still live pre-KR-2).
      * test_gateway_service.py::TestHermesHomeForTargetUser —
        same; accept both /home/X/.kora and /home/X/.hermes.
      * test_kanban_db.py::test_resolve_hermes_argv_module —
        accept "Kora" OR "Hermes" in the version banner.
      * test_web_server_panel_view.py — bumped 34 → 46 expected
        page count (CC#2's recent panel additions); excluded
        PhrasebookEditor (a component module mis-located under
        pages/).
      * kora_cli/main.py — added "boot", "migrate-hermes-home",
        "promote" to _BUILTIN_SUBCOMMANDS (the test caught the
        drift; the missing entries also fixed CLI startup latency
        when operators ran those subcommands).

  Cluster 3: test_kanban (~4) + remaining 29 failures.
    Remaining 29 failures cluster in test_gateway_service (~7
    Linux-only systemd tests on macOS), test_web_server PtyWebSocket
    (~4 subprocess/pty + 1 update_hangup), model catalog fetch
    (~4 network-flaky), kanban core (3 pid_alive scaffolding),
    backup HERMES prefix (2), web_server alerts banner (1 FE
    pattern), web_server_cron_profiles (5 profile-isolation). Each
    needs per-file investigation — out of scope for this bucket;
    documented in PR body for the follow-on stability bucket. Test
    count went from "94 pre-existing failures" → "29 specific
    test-file issues" — every subsequent CC PR will navigate a
    materially smaller caveat list.

Deliverable B — Per-tenant CostStateHolder foundation:

  agent/cost_state_holder.py:
    * Replaced singleton ``_HOLDER`` with per-tenant
      ``_HOLDERS_BY_TENANT: Dict[str, CostStateHolder]``.
    * New constant ``DEFAULT_TENANT_ID = "default"`` — legacy
      single-tenant call sites bind to this implicitly when no
      tenant_id is passed.
    * ``init_cost_holder(... tenant_id=None)`` — None resolves
      to DEFAULT_TENANT_ID. Per-tenant idempotent: re-init
      returns the existing instance + ignores subsequent args.
    * ``get_cost_holder(tenant_id=None)`` — same behavior.
    * New ``list_cost_holder_tenants() -> tuple[str, ...]`` —
      sorted; used by the snapshot's per-tenant projection.
    * New ``_resolve_credit_pool_for_tenant(tenant_id)`` —
      reads ``KORA_CREDIT_POOL_USD_<TENANT>`` env (uppercased,
      non-alnum → ``_``) then ``KORA_CREDIT_POOL_USD`` then the
      $200 default. Per-tenant pool wins when the caller doesn't
      pass credit_pool_usd explicitly.
    * ``_reset_cost_holder_for_tests()`` clears EVERY tenant.

  Backward compat (verified):
    * Every existing call site (slack_dm_handler, probe wake
      consumer, alert wake consumer, snapshot collector, web
      server cost endpoint) calls the no-tenant_id form → binds
      to "default" tenant → no behavior change.
    * Existing tests that monkeypatch ``_HOLDER`` updated to
      use ``_reset_cost_holder_for_tests`` / ``get_cost_holder``
      (5 test files; no production code changes needed).

  Snapshot v6 — cost_ladder_by_tenant block:
    * Bumped SCHEMA_VERSION 5 → 6.
    * New top-level key ``cost_ladder_by_tenant`` — sibling of
      the legacy ``cost_ladder`` block. Per-tenant shape mirrors
      the legacy single-tenant block (minus model_default which
      is router-side, not per-tenant in v6).
    * Design choice: side-by-side (NEW sibling) rather than
      replacing (legacy block keeps reflecting the default
      tenant). Rationale in the PR body — keeps every v5
      consumer (cockpit CostPanel, snapshot-based reasoning
      shortcircuits, telemetry CLI) reading what they already
      read; multi-tenant readers opt in via the new sibling.
    * Empty dict on single-tenant deployments (default tenant
      already covered by the legacy block).

  FE TS type:
    * ``web/src/lib/api.ts`` SnapshotResponse adds
      ``cost_ladder_by_tenant?: Record<string, {...}>`` — optional
      since not every deployment has multi-tenant state yet.

Tests:
  * 8 new tests for per-tenant cost holder (default backward-compat,
    independent holder mutations, idempotency per tenant, unknown
    tenant returns None, per-tenant env override, env normalization
    of special chars, explicit arg wins over env, list_tenants
    sorted, reset clears all).
  * 3 new tests for snapshot v6 cost_ladder_by_tenant
    (empty when no holders, surfaces every registered tenant,
    legacy cost_ladder block continues to reflect default tenant).
  * Existing snapshot test bumped 5 → 6.
  * Full suite: **7102 passed / 29 failed / 10 skipped** in 69s
    (was 7004 pass / 94 fail / 10 skipped pre-fix). Net +98
    passing tests; no new failures from the multi-tenant work
    (verified by clean reasoning + slack_dm_handler regression
    runs).

Co-authored-by: CC#1 Kora Runtime <kora-pm@stormhavenenterprises.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant