This repository was archived by the owner on May 26, 2026. It is now read-only.
feat(kora): KR-HERMES-LOCAL-EXT-REISSUE + KR-HAIKU-ROUTER-PLUGIN — completes Lock R3-2 Phase C#189
Merged
rafe-walker merged 1 commit intoMay 24, 2026
Conversation
…mpletes Lock R3-2 Phase C Batched pair. Adds the missing local Hermes post-LLM re-issue hook and wires the haiku-router sub-plugin against it so should_escalate_post_call (present at cost_ladder/selector.py since #185 but unwired) finally fires. Deliverable A — post_llm_call_can_reissue (agent/conversation_loop.py): fires after messages.create returns and BEFORE normalize / post_api_request observer / tool dispatch. Plugins return {"reissue_with": <new api_kwargs>} to transparently re-call the API; first non-None wins; at most ONE re-issue per iteration (anti-loop break). Fail-safe: invoke_hook wraps each callback; outer guard protects the re-issue call itself; original response preserved on any failure. Telemetry: re-issued response feeds record_inference with escalated_to_opus=True (the original Haiku call already telemetered with the default False inside the retry-loop chokepoint). Deliverable B — kora_cli/reasoning/kora_hermes_plugin/haiku_router/ (constants + escalator + plugin + sub-register, #185 template). Activation gates: Kora-tagged route, iteration == 1, original model is Haiku, KORA_DISABLE_POST_CALL_ESCALATION not "true", response has text content, should_escalate_post_call returns True. Re-issue kwargs: model swapped to Opus; messages extended with Haiku response as assistant turn + terse reviewer prompt (parallel-Claude's pattern — ~30% cheaper escalations). Tests: 30 new tests, 125 directly-affected tests green. Existing seven-hook count test updated to expect 8. Pre-existing failures elsewhere in the suite are environmental (fastapi/blake3/HERMES_HOME) and verified to fail identically on the base branch. After this: 6 of 7 plugin extractions complete; KR-PLUGIN-IDENTITY remains deferred per Lock R3-2. All escalation paths from KR-HAIKU-ROUTER (#165) are now functional end-to-end. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
4 tasks
rafe-walker
added a commit
that referenced
this pull request
May 24, 2026
…-DAEMON-PREP-MEGABUCKET feat(kora): KR-CC3-CLEANUP-AND-DAEMON-PREP-MEGABUCKET — #189 follow-ups + daemon audit + upstream prep
9 tasks
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to subscribe to this conversation on GitHub.
Already have an account?
Sign in.
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Batched pair (per [[feedback-batch-bigger-buckets]] + my own #188 recommendation). Two tightly-coupled deliverables: REISSUE adds the Hermes-side hook surface; HAIKU-ROUTER-PLUGIN consumes it. After this lands, 6 of 7 plugin extractions complete and ALL post-call escalation paths from KR-HAIKU-ROUTER (#165) are functional end-to-end.
Deliverable A —
post_llm_call_can_reissuehook (KR-HERMES-LOCAL-EXT-REISSUE)NEW local Hermes hook in
agent/conversation_loop.py. Fires AFTERmessages.createreturns a valid response and BEFORE downstream normalize /post_api_requestobserver / tool dispatch. Plugins return{\"reissue_with\": <new api_kwargs>}to transparently re-call the API with modified kwargs; the re-issued response REPLACES the original for all downstream processing.Override semantics (matches #172 / #181): iterate plugin returns, first non-None
reissue_withwins. Subsequent returns ignored.Anti-loop safety: re-issue fires at most ONCE per iteration. The hook is NOT re-fired against the re-issued response — if the new response would itself trigger another re-issue, the loop ignores it. Without this, two plugins could ping-pong escalations indefinitely.
Fail-safe:
invoke_hookalready try/except wraps each callback; an outer guard protects the re-issue call itself (transport raise, validation failure, etc). On any failure path, the original response is preserved.Telemetry attribution: the re-issued response feeds
record_inference_from_response(...., escalated_to_opus=True). The original Haiku call was already telemetered inside the retry-loop chokepoint with the defaultescalated_to_opus=False— both events land in the cost-ladder estimator so $-burn accounting stays accurate and cockpit panels can compute escalation rate per route.Deliverable B —
kora_hermes_plugin/haiku_router/(KR-HAIKU-ROUTER-PLUGIN)New sub-plugin following the #185 template:
```
kora_cli/reasoning/kora_hermes_plugin/haiku_router/
├── init.py — re-exports public surface
├── constants.py — MODEL_HAIKU / MODEL_OPUS (from cost_ladder), REISSUE_REVIEW_PROMPT, KORA_DISABLE_POST_CALL_ESCALATION
├── escalator.py — pure helpers: extract_first_text / extract_last_user_text / build_opus_reissue_kwargs
└── plugin.py — haiku_router_post_call_escalation handler + sub-register
```
Registered against the new hook via the orchestrator (
KoraHermesPlugin.registernow callsregister_haiku_router).Activation gates (all must hold for escalation): Kora-tagged route,
iteration == 1(post-call escalation only fires on iteration 1 — subsequent iterations already get Opus pre-call viatool_loop_iteration), original model is Haiku, envKORA_DISABLE_POST_CALL_ESCALATIONnot\"true\", response has text content,should_escalate_post_call(...)returns(True, <reason>).Re-issue construction (parallel-Claude's pattern from R3): model swapped to Opus; messages extended with the Haiku response as an assistant turn + a terse reviewer prompt (
\"Please review my last response and improve it if needed. Be terse if confirming.\"). Opus often returns a one-liner confirmation rather than redoing the work — ~30% cheaper escalations.Hook contract diagram
```
agent/conversation_loop.py — iteration loop body
─────────────────────────────────────────────────
build api_kwargs
├─ pre_api_request_mutable hook ← #172 override-shape (model selection, caching)
└─ pre_api_request observer ← per-iteration observer
↓
retry loop:
messages.create(**api_kwargs) → response
validate
record_inference (Haiku call telemetry; escalated_to_opus=False)
break-on-success
↓
── retry-exhaustion guard ──
↓
★ post_llm_call_can_reissue (NEW) ← KR-HERMES-LOCAL-EXT-REISSUE
for plugin_result in invoke_hook(...):
if 'reissue_with' in plugin_result:
api_kwargs ← plugin_result['reissue_with']
response ← interruptible*_api_call(api_kwargs)
record_inference (Opus reissue telemetry; escalated_to_opus=True)
break ← anti-loop: at most one re-issue per iteration
↓
normalize_response → assistant_message
post_api_request observer ← sees FINAL response (post-reissue)
↓
tool dispatch
```
Sample re-issue trace
```
[kora_hermes.cost_ladder] cost_ladder pre-call → default_haiku (iteration=1, route=slack_dm)
[kora_hermes] pre_api_request_mutable override: model=claude-haiku-4-5-20251001
API call #1 (1.42s) → Haiku response: "I'm not sure about the timing of the migration — I don't have enough context on the current load."
[kora.cost_ladder] record_inference: model=claude-haiku-4-5-20251001, escalated_to_opus=False
[kora_hermes.haiku_router] escalating to Opus post-call (reason=low_confidence_marker, route=slack_dm, haiku_chars=99, user_chars=142)
[kora_hermes] post_llm_call_can_reissue re-issuing API call (iteration=1, original_model=claude-haiku-4-5-20251001, new_model=claude-opus-4-7)
API call #1-reissue (2.18s) → Opus response: "Confirmed — wait until tonight's quiet window. The current load is too high for the lock window the migration needs."
[kora.cost_ladder] record_inference: model=claude-opus-4-7, escalated_to_opus=True
[kora_hermes] post_api_request observer fires (sees Opus response only)
```
Tests
30 new tests across two files, 125 directly-affected tests green:
tests/plugins/test_kora_hermes_plugin_haiku_router.py(23 tests): pure helpers, handler activation gating (non-Kora / iteration>1 / non-Haiku / env-disabled / no text / confident), escalation path (low-confidence marker + short-response heuristic), sub-register wiring, orchestrator integration, first-non-None override semantics via realPluginManager, plugin-exception fail-safe, discovery-shim alias, cost-ladder signature pin.tests/agent/test_conversation_loop_post_llm_can_reissue.py(7 tests): structural pins on source (hook name + contract kwargs + anti-loop break + placement after retry-exhaustion guard + beforepost_api_requestobserver +escalated_to_opus=Truetelemetry), plus a behavioral test drivinginvoke_hookend-to-end.Existing seven-hook count test in
tests/plugins/test_kora_hermes_plugin.pyupdated to expect 8 (renamedtest_register_function_wires_eight_hooks).Acceptance / merge gate
post_llm_call_can_reissueadded at conversation_loop.py sitekora_hermes_plugin/haiku_router/plugin file structure exists following feat(kora): KR-PLUGIN-COST-LADDER — first plugin extraction (cost-ladder) #185 templateshould_escalate_post_callfromcost_ladder/selector.pyescalated_to_opus=True. (escalation_reasonis logged at INFO level today; promoting it to arecord_inferencefield is left as a follow-up — see Notes below.)Notes / follow-ups
escalation_reasontelemetry plumbing: today the reason (low_confidence_marker/short_response_for_long_input) is logged at INFO but not threaded throughrecord_inference. Surfacing it as a structured field would need a smallCostStateHolder.record_inferencesignature bump. Not required to land this bucket per spec STOP-ASK condition — happy to follow up if PM wants it for the panel.Re-issue treated as a single iteration: the second API call doesn't increment
api_call_countor consume from the iteration budget — it's accounted as a transparent upgrade of iteration 1. If PM wants the re-issue to count separately, the change is one line in the hook block, but the current shape matches "Haiku-as-Opus-context" semantics better (one logical user-turn answer).Backward-compat shim:
plugins/kora_hermes/__init__.pyre-exportshaiku_router_post_call_escalationas_post_llm_call_can_reissuefor consumer-import stability. Mirrors the_pre_api_request_mutableprecedent.Test plan
post_llm_call+post_api_requestobservers see the FINAL response (post-reissue) onlyKORA_DISABLE_POST_CALL_ESCALATION=trueescape hatch — should fully no-op the new plugin🤖 Generated with Claude Code