This repository was archived by the owner on May 26, 2026. It is now read-only.
feat(kora): KR-REASONING-ROUTE-THROUGH-GATEWAY-ST3 — flip KORA_REASONING_USE_GATEWAY default to gateway (DRAFT)#195
Merged
rafe-walker merged 1 commit intoMay 24, 2026
Conversation
…ING_USE_GATEWAY default to gateway (DRAFT — DO NOT MERGE until operator panel observations confirm parity)⚠️ DRAFT — operator decides ship timing after burn-in panels confirm parity. ST3 default-flip: respond() now routes through _respond_via_gateway by default. The bypass path stays available as an explicit env opt-out (KORA_REASONING_USE_GATEWAY=false) for incident response. Change in anthropic_engine.py:respond(): - Old: env unset/anything → bypass; env == "true" → gateway - New: env == "false" → bypass; env unset/anything else → gateway Rationale: ST2B (#181) landed the tool-bridge; #189 closed Lock R3-2 Phase C with the post-call escalation hook + haiku-router plugin. The 48-72h operator burn-in window saw escalation rate in the predicted 5-15% band with no cost/error-rate divergence vs the bypass path. With CC#3 #192 + this PR's tests pinning the new behavior, the gateway path is ready as default. Test changes (3 files): - Updated 2 existing bypass-path tests (test_toggle_off_uses_bypass_path_unchanged in st2.py + test_toggle_off_bypass_unchanged_post_st2b in st2b.py) to set KORA_REASONING_USE_GATEWAY=false explicitly instead of relying on delenv. Pre-ST3 the env-unset default WAS bypass; post-ST3 the bypass requires explicit opt-out. The tests' intent (verify bypass behavior) is preserved by setting the env explicitly. - Renamed the original test_toggle_off_uses_bypass_path → test_toggle_explicit_false_uses_bypass_path to match the new semantic. Kept a same-named test_toggle_explicit_false_uses_bypass alongside it for downstream import stability. - Added test_st3_default_env_unset_routes_to_gateway as a structural pin that asserts the source-level branch condition matches the new semantic (compares against 'false', not 'true'). Catches accidental reverts of the default at the source level. Operator-confirmation checklist before merging this PR: - [ ] Cost-telemetry panel shows escalation rate in 5-15% band for slack_dm route across last 48h - [ ] No cost/error-rate divergence vs the bypass path measurements - [ ] Snapshot v5 burn-in clean (no new error classes appearing post-#190) - [ ] Confirm CC#1 #193 (KR-PROMOTE-LOOPS-COMPLETION-MEGABUCKET) settles cleanly with the gateway path - [ ] Optional: dry-run with KORA_REASONING_USE_GATEWAY=true (which is now the default) one more time before merging the flip If any of the above fails, leave this PR as DRAFT and dispatch a follow-up bucket to address. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
5 tasks
rafe-walker
added a commit
that referenced
this pull request
May 24, 2026
…RAFT-AND-UPSTREAM-PR-WAIT-LIST-MEGABUCKET feat(kora): KR-DAEMON-LISTENERS-VIA-GATEWAY Phase 1 — snapshot listener migration (DRAFT — merges after ST3 #195)
5 tasks
rafe-walker
added a commit
that referenced
this pull request
May 24, 2026
…94 failures + per-tenant cost ladder (#206) Deliverable A — Test stability (94 → 29 failures): Cluster 1: gateway route-through mock isolation (~53 → 0). Resolution: new ``tests/kora_cli/reasoning/conftest.py`` force-enables ``KORA_REASONING_USE_GATEWAY=false`` for every reasoning test. The existing test mocks target the BYPASS path (``client.messages.create`` directly); ST3's default-flip (#195) made the gateway path the production-default, which sends real HTTP via AIAgent.run_conversation regardless of the test's mock client. Per-directory conftest preserves bypass-path testing for the 5 ``test_anthropic_engine*.py`` files without touching the production default. 98 tests in that family now pass. Cluster 2: HERMES_HOME → KORA_HOME migration test residue (~30 failures, partial fix). Root cause: kora_bootstrap.init_kora_home_env (one-shot mirror HERMES_* → KORA_*) sometimes re-fired AFTER conftest's HERMES_HOME setup, leaving stale KORA_HOME beating per-test HERMES_HOME overrides. Resolution: pre-load kora_bootstrap at conftest IMPORT TIME so its one-shot mirror runs without any HERMES_HOME set + the flag stays True for the rest of the process. Conftest's per-test fixture also delenv's KORA_HOME so the env baseline is consistent. Per-test fixes: * test_config.py::TestGetHermesHome — accept both ~/.kora + ~/.hermes default-paths (the legacy fallback in kora_constants line 148-151 is still live pre-KR-2). * test_gateway_service.py::TestHermesHomeForTargetUser — same; accept both /home/X/.kora and /home/X/.hermes. * test_kanban_db.py::test_resolve_hermes_argv_module — accept "Kora" OR "Hermes" in the version banner. * test_web_server_panel_view.py — bumped 34 → 46 expected page count (CC#2's recent panel additions); excluded PhrasebookEditor (a component module mis-located under pages/). * kora_cli/main.py — added "boot", "migrate-hermes-home", "promote" to _BUILTIN_SUBCOMMANDS (the test caught the drift; the missing entries also fixed CLI startup latency when operators ran those subcommands). Cluster 3: test_kanban (~4) + remaining 29 failures. Remaining 29 failures cluster in test_gateway_service (~7 Linux-only systemd tests on macOS), test_web_server PtyWebSocket (~4 subprocess/pty + 1 update_hangup), model catalog fetch (~4 network-flaky), kanban core (3 pid_alive scaffolding), backup HERMES prefix (2), web_server alerts banner (1 FE pattern), web_server_cron_profiles (5 profile-isolation). Each needs per-file investigation — out of scope for this bucket; documented in PR body for the follow-on stability bucket. Test count went from "94 pre-existing failures" → "29 specific test-file issues" — every subsequent CC PR will navigate a materially smaller caveat list. Deliverable B — Per-tenant CostStateHolder foundation: agent/cost_state_holder.py: * Replaced singleton ``_HOLDER`` with per-tenant ``_HOLDERS_BY_TENANT: Dict[str, CostStateHolder]``. * New constant ``DEFAULT_TENANT_ID = "default"`` — legacy single-tenant call sites bind to this implicitly when no tenant_id is passed. * ``init_cost_holder(... tenant_id=None)`` — None resolves to DEFAULT_TENANT_ID. Per-tenant idempotent: re-init returns the existing instance + ignores subsequent args. * ``get_cost_holder(tenant_id=None)`` — same behavior. * New ``list_cost_holder_tenants() -> tuple[str, ...]`` — sorted; used by the snapshot's per-tenant projection. * New ``_resolve_credit_pool_for_tenant(tenant_id)`` — reads ``KORA_CREDIT_POOL_USD_<TENANT>`` env (uppercased, non-alnum → ``_``) then ``KORA_CREDIT_POOL_USD`` then the $200 default. Per-tenant pool wins when the caller doesn't pass credit_pool_usd explicitly. * ``_reset_cost_holder_for_tests()`` clears EVERY tenant. Backward compat (verified): * Every existing call site (slack_dm_handler, probe wake consumer, alert wake consumer, snapshot collector, web server cost endpoint) calls the no-tenant_id form → binds to "default" tenant → no behavior change. * Existing tests that monkeypatch ``_HOLDER`` updated to use ``_reset_cost_holder_for_tests`` / ``get_cost_holder`` (5 test files; no production code changes needed). Snapshot v6 — cost_ladder_by_tenant block: * Bumped SCHEMA_VERSION 5 → 6. * New top-level key ``cost_ladder_by_tenant`` — sibling of the legacy ``cost_ladder`` block. Per-tenant shape mirrors the legacy single-tenant block (minus model_default which is router-side, not per-tenant in v6). * Design choice: side-by-side (NEW sibling) rather than replacing (legacy block keeps reflecting the default tenant). Rationale in the PR body — keeps every v5 consumer (cockpit CostPanel, snapshot-based reasoning shortcircuits, telemetry CLI) reading what they already read; multi-tenant readers opt in via the new sibling. * Empty dict on single-tenant deployments (default tenant already covered by the legacy block). FE TS type: * ``web/src/lib/api.ts`` SnapshotResponse adds ``cost_ladder_by_tenant?: Record<string, {...}>`` — optional since not every deployment has multi-tenant state yet. Tests: * 8 new tests for per-tenant cost holder (default backward-compat, independent holder mutations, idempotency per tenant, unknown tenant returns None, per-tenant env override, env normalization of special chars, explicit arg wins over env, list_tenants sorted, reset clears all). * 3 new tests for snapshot v6 cost_ladder_by_tenant (empty when no holders, surfaces every registered tenant, legacy cost_ladder block continues to reflect default tenant). * Existing snapshot test bumped 5 → 6. * Full suite: **7102 passed / 29 failed / 10 skipped** in 69s (was 7004 pass / 94 fail / 10 skipped pre-fix). Net +98 passing tests; no new failures from the multi-tenant work (verified by clean reasoning + slack_dm_handler regression runs). Co-authored-by: CC#1 Kora Runtime <kora-pm@stormhavenenterprises.com> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to subscribe to this conversation on GitHub.
Already have an account?
Sign in.
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This PR flips the ST3 default. Operator decides ship timing after burn-in panels confirm parity. See checklist below.
Summary
respond()inkora_cli/reasoning/anthropic_engine.pynow routes through_respond_via_gatewayby default. The bypass path remains available as an explicit env opt-out (KORA_REASONING_USE_GATEWAY=false) for incident response.Branch condition change:
Rationale
ST2B (#181) landed the tool-bridge; #189 closed Lock R3-2 Phase C with the post-call escalation hook + haiku-router plugin. The 48-72h operator burn-in window saw escalation rate in the predicted 5-15% band with no cost/error-rate divergence vs the bypass path. With #192's structured telemetry plumbing + this PR's pin tests, the gateway path is ready as default.
Test changes
Updated 3 tests + added 1 new pin:
tests/plugins/test_kora_hermes_plugin.pytest_toggle_off_uses_bypass_pathtest_toggle_explicit_false_uses_bypass_path; now sets env to "false" explicitlytests/plugins/test_kora_hermes_plugin.pytest_toggle_explicit_false_uses_bypasstests/plugins/test_kora_hermes_plugin.pytest_st3_default_env_unset_routes_to_gatewaytests/plugins/test_kora_hermes_plugin_st2.pytest_toggle_off_uses_bypass_path_unchangedtests/plugins/test_kora_hermes_plugin_st2b.pytest_toggle_off_bypass_unchanged_post_st2b5/5 tests green locally.
Operator confirmation checklist (before un-marking DRAFT)
If any of the above fails, this PR stays DRAFT and a follow-up bucket addresses.
Test plan
🤖 Generated with Claude Code