This repository was archived by the owner on May 26, 2026. It is now read-only.
feat(kora): KR-DAEMON-LISTENERS-VIA-GATEWAY Phase 1 — snapshot listener migration (DRAFT — merges after ST3 #195) #196
Merged
rafe-walker merged 1 commit on May 24, 2026
Conversation
…er migration (DRAFT — merges after ST3 #195)

⚠️ DRAFT — DO NOT MERGE until ST3 PR #195 merges first. The migration's code is independent of ST3 being active; the gate is operator-confidence ordering per the audit at kora_docs/14_research/daemon_listeners_via_gateway_2026-05-24/REPORT.md §5.

Phase 1 of 6 — proof-of-pattern for the remaining 15 listener migrations.

Path B (thin shim): the snapshot listener is now registered against BOTH the existing Kora LISTENER_REGISTRY (back-compat) AND Hermes BackgroundDaemonRegistry (forward path). Both registrations point at the same _listener_singleton so behavior stays identical under either consumer. Phase 6 dissolves the Kora-side registration.

Changes (kora_cli/listeners/snapshot_listener.py):
- SnapshotListener.startup() now accepts an optional ``coordinator`` kwarg so both registry shapes work: Kora's coordinator calls startup() no-arg (kwarg default applies); Hermes's BackgroundDaemonEntry.startup is Callable[[Any], Any] (one positional arg). The listener is stateless w.r.t. the coordinator — ignored today, forward-compatible.
- Process-wide singleton _listener_singleton replaces the per-_factory() construction so both registries get the same instance's bound methods. Prevents double-log-on-startup if both consumers fire.
- New BackgroundDaemonEntry built with name="snapshot", periodic_task carrying run_snapshot_cycle at the same interval the heartbeat scheduler uses, shutdown_timeout=DEFAULT_SHUTDOWN_TIMEOUT, plugin_name="kora".
- Defensive duplicate-registration guard for the Hermes registry (importlib.reload + xdist test workers can re-import).

Tests (tests/kora_cli/test_listeners/test_snapshot_listener.py):
- 6 new tests covering the dual-registry contract:
  - test_hermes_registry_has_snapshot_entry (presence + plugin_name)
  - test_hermes_entry_carries_periodic_task_spec (callback + interval + name)
  - test_hermes_and_kora_registrations_share_singleton (singleton invariant)
  - test_startup_accepts_optional_coordinator (signature compatibility)
  - test_shutdown_timeout_matches_default (consumer-side timeout policy)
- 14/14 snapshot listener tests green. 445/445 focused regression set (listeners + plugin tests) green.

Next phase: Phase 2 (8 more periodic-task listener migrations) once this lands. See audit §5 for the recommended order.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
rafe-walker pushed a commit that referenced this pull request on May 24, 2026
… promote CLI + alert fallback + envelope auto-approve
Deliverable A — Wider-suite test stability:
* Audited missing-deps blocker: all the deps CC#1 has been
flagging across the last several PRs (aiosmtplib, prompt_toolkit,
aioimaplib, fire, openai) are ALREADY declared in
``pyproject.toml`` core / dev extras. The gap was env-side, not
code-side — CC#1's dev environment hadn't been re-synced after
the dep additions across recent buckets.
* Resolution: ``uv pip install -e ".[all,dev]"`` (already
documented in CONTRIBUTING.md). With the deps installed:
- Collection: 7008 tests (was 7004 collected + 4 collection
errors on un-synced env)
- Full suite under hermetic env: **6905 passed / 94 failed /
10 skipped** in 83s
- Sub-suite excluding reasoning + gateway: **6605 passed /
24 failed / 10 skipped** in 80s
* Note: the 94 remaining failures are PRE-EXISTING (verified by
``git stash`` + re-run on the merge commit before this PR).
They cluster in:
- ``reasoning/test_anthropic_engine*.py`` (~53) — gateway-route-
through mock isolation issue introduced by CC#3's #196
daemon Phase 1 default-flip; tests call real Anthropic API
via the route-through path even though they pass a mock
client.
- ``test_backup.py`` / ``test_config.py`` / ``test_cron.py`` /
``test_web_server.py`` (~30) — HERMES_HOME → KORA_HOME
migration tests stamping legacy expectations.
- ``test_kanban*.py`` (~4) — separate flaky area.
Per the bucket spec STOP-ASK §4: these need a separate
stabilization bucket; this one completes with the deps-side
resolved. PR description carries the recommended bucket title.
Deliverable B — ``kora promote`` operator CLI commands:
* New module ``kora_cli/promote_cli.py`` with 4 subcommands:
- ``kora promote status`` — per-loop pending / approved /
rejected / expired counts + last activity timestamp across
all 6 loops.
- ``kora promote run-once <loop>`` — invoke one cycle ad-hoc;
returns the loop's cycle summary dict.
- ``kora promote history <loop> [--days N]`` — recent audit
rows for the loop, filtered to the per-loop seam vocabulary
and (where the seam is shared like ``promotion.approved``)
scoped via the ``promotion:<loop>:`` caller_session_id prefix.
- ``kora promote pending <loop>`` — JSON dump of currently-
pending proposals; ordered highest-confidence first.
* Snapshot-expand's audit-only layout (no pending/approved/
rejected) gets special-cased — ``pending`` errors with a clear
redirect to ``history``; ``status`` surfaces applied-record
counts only.
* Registered under main.py's existing subparser pattern; cycle
imports are LAZY so ``kora promote --help`` doesn't pull the
clustering / pricing chain.
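A minimal sketch of the lazy-import subparser pattern described for Deliverable B, using only `argparse` from the standard library. The loop names in `KNOWN_LOOPS`, the exit code `2`, and the handler bodies are assumptions for illustration, and `json` merely stands in for the heavier clustering / pricing imports; the real `promote_cli.py` is not shown in this commit.

```python
import argparse
from typing import List, Optional

# Hypothetical loop allowlist; the real one covers all 6 loops.
KNOWN_LOOPS = ("probe_fix_envelopes", "snapshot_expand")


def _cmd_status(args: argparse.Namespace) -> int:
    # Lazy import: paid only when the subcommand actually runs,
    # never while the parser is built or --help is printed.
    import json

    print(json.dumps({"loops": list(KNOWN_LOOPS)}))
    return 0


def _cmd_pending(args: argparse.Namespace) -> int:
    import json

    if args.loop == "snapshot_expand":
        # Audit-only layout: no pending queue exists, so redirect to history.
        print("snapshot_expand has no pending proposals; use 'history' instead")
        return 2
    print(json.dumps([]))  # placeholder for the real pending dump
    return 0


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="kora")
    sub = parser.add_subparsers(dest="command", required=True)
    promote = sub.add_parser("promote")
    promote_sub = promote.add_subparsers(dest="promote_command", required=True)

    status = promote_sub.add_parser("status")
    status.set_defaults(func=_cmd_status)

    pending = promote_sub.add_parser("pending")
    pending.add_argument("loop", choices=KNOWN_LOOPS)
    pending.set_defaults(func=_cmd_pending)
    return parser


def main(argv: Optional[List[str]] = None) -> int:
    args = build_parser().parse_args(argv)
    return args.func(args)
```

Calling `main(["promote", "pending", "snapshot_expand"])` takes the special-cased branch and returns the non-zero code without touching any pending queue.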
Deliverable C — Alert investigation fallback DM polish:
* The mechanism shipped in #197 (fallback DM on reasoning failure
via ``append_outbound_log_entry`` + dm_status=
``engine_unavailable_fallback`` in the audit). KR-CC1-POLISH
improves the wording:
- Header now surfaces ``category (severity): alert id N
(via {channel})``
- Footer is explicit: "Kora is unavailable to investigate this
alert ... Review the alerts panel and act manually — Kora
will not retry this investigation."
* New test asserts dm_status exactly ==
``"engine_unavailable_fallback"`` on the engine-None path; CC#2's
KR-FE-ALERT-INVESTIGATIONS-VIEWER reads that enum value.
Deliverable D — Probe-fix-envelope auto-approve (low-risk):
* New env: ``KORA_PROMOTE_PROBE_FIX_AUTO_APPROVE_LOW_RISK``
(default ``false``; operator opts in).
* New env: ``KORA_PROMOTE_PROBE_FIX_AUTO_APPROVE_WAIT_HOURS``
(default ``1.0``).
* New module ``kora_cli/promote/probe_fix_envelopes/auto_approve.py``
with ``run_auto_approve_sweep`` — runs at end of each
probe-fix-envelope cycle, walks pending proposals, auto-
approves those whose ``blast_radius_level == "low"`` AND have
been pending ≥ wait_hours. Approval = transition to
``approved/`` + emit ``promotion.probe_envelope_action_auto_approved``
audit row.
* New ``ProbeEnvelopeProposal.blast_radius_level`` field
(``"low" | "medium" | "high"``); default ``"high"`` preserves
the existing operator-must-review posture. Backwards-compat:
legacy on-disk payloads without the field rehydrate to
``"high"`` via the new ``proposal_from_dict`` helper.
* Heuristic in ``_derive_blast_radius_level`` returns ``"low"``
ONLY for known-narrow envelope patterns
(``_KNOWN_LOW_RISK_PATTERNS`` — currently the fly
restart_unhealthy_machine envelope's (probe, issue_category)
pairs). Everything else defaults to ``"high"``. The heuristic
intentionally undershoots — false-low classifications would
let proposals slip through to operator's envelope without
review.
* CRITICAL — two-tier gating preserved (documented inline +
in the auto_approve module docstring):
1. Auto-approve → "this is in our envelope vocabulary"
2. Per-probe ``KORA_PROBE_AUTOFIX_<NAME>_ENABLED`` →
"Kora is permitted to invoke it at runtime"
The auto-approve flag DOES NOT cause Kora to execute the
fix — only adds it to the vocabulary; operator still enables
the per-probe ENABLED env separately to authorize execution.
* New audit seam ``promotion.probe_envelope_action_auto_approved``
extending SeamName Literal with the auto_approve_wait_hours +
auto_approved_at fields for operator timeline reconstruction.
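The selection rule for Deliverable D can be sketched as follows. This is a simplified illustration under stated assumptions: the `proposal_id` and `created_at` field names, the ISO timestamp format, and the `"true"` string check for the opt-in env are hypothetical, and the sketch only returns the ids it would approve rather than moving files to ``approved/`` or emitting the audit row.

```python
import os
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, List, Mapping


@dataclass
class ProbeEnvelopeProposal:
    proposal_id: str                  # hypothetical field name
    created_at: datetime              # hypothetical field name
    blast_radius_level: str = "high"  # default keeps operator review


def proposal_from_dict(payload: Dict[str, Any]) -> ProbeEnvelopeProposal:
    # Legacy payloads without the field rehydrate to "high".
    return ProbeEnvelopeProposal(
        proposal_id=payload["proposal_id"],
        created_at=datetime.fromisoformat(payload["created_at"]),
        blast_radius_level=payload.get("blast_radius_level", "high"),
    )


def run_auto_approve_sweep(
    pending: List[ProbeEnvelopeProposal],
    now: datetime,
    env: Mapping[str, str] = os.environ,
) -> List[str]:
    flag = env.get("KORA_PROMOTE_PROBE_FIX_AUTO_APPROVE_LOW_RISK", "false")
    if flag.strip().lower() != "true":
        return []  # disabled by default; the operator opts in
    hours = float(env.get("KORA_PROMOTE_PROBE_FIX_AUTO_APPROVE_WAIT_HOURS", "1.0"))
    wait = timedelta(hours=hours)
    return [
        p.proposal_id
        for p in pending
        if p.blast_radius_level == "low" and now - p.created_at >= wait
    ]


now = datetime(2026, 5, 24, 12, 0, tzinfo=timezone.utc)
pending = [
    proposal_from_dict({"proposal_id": "a", "created_at": "2026-05-24T10:00:00+00:00",
                        "blast_radius_level": "low"}),
    proposal_from_dict({"proposal_id": "b", "created_at": "2026-05-24T11:30:00+00:00",
                        "blast_radius_level": "low"}),
    proposal_from_dict({"proposal_id": "c", "created_at": "2026-05-24T10:00:00+00:00"}),
]
opted_in = {"KORA_PROMOTE_PROBE_FIX_AUTO_APPROVE_LOW_RISK": "true"}
approved = run_auto_approve_sweep(pending, now, env=opted_in)
```

With the flag set, only proposal `a` qualifies: `b` has been pending for less than the one-hour default and `c` rehydrates to ``"high"``. Per the two-tier gating above, approval in this sense only adds the action to the envelope vocabulary; execution still depends on the separate per-probe ENABLED env.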
Tests:
* 12 new tests for the probe-fix-envelope auto-approve sweep
(heuristic / fixture-backed rehydration / sweep disabled by
default / wait-window enforcement / high-risk never approves /
audit payload shape).
* 13 new tests for ``kora promote`` CLI commands (status / pending
/ history / run-once dispatch / loop allowlist / snapshot_expand
special-case error / cross-loop history caller_session_id
filtering).
* 2 new tests for alert fallback DM polish (wording assertion +
dm_status enum assertion).
* All 41 new tests pass; all relevant existing tests still pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
rafe-walker added a commit that referenced this pull request on May 24, 2026
…n pip POC + structural FATAL flag (#204)

Two deliverables closing #203's §6.4 gap (no pip-install path validated) + #200's deferred follow-up (FATAL contract was documentation-only). Per Joshua's amended feedback-local-first-upstream-after: structure only, no PyPI publish, no upstream PR.

Deliverable A — Pip-packaging foundation

plugins/marvin/ restructured to relocatable src/ layout:

    plugins/marvin/
    ├── pyproject.toml        # NEW — setuptools build + hermes_agent.plugins entry point
    ├── README.md             # NEW — operator-facing install doc
    ├── plugin.yaml           # KEPT — Hermes bundled-plugin discovery
    ├── __init__.py           # REPLACED — sys.path-adjusting compat shim
    └── src/marvin/
        ├── __init__.py       # canonical module (relocated)
        └── data/
            ├── MARVIN.md     # relocated
            └── marvin_system_prompt.md

pyproject.toml declares "[project.entry-points.hermes_agent.plugins] marvin = marvin:register" — the exact entry-point group Hermes's _scan_entry_points already reads. Package_data includes the markdown files so the wheel ships relocatable identity assets.

The in-tree compat shim at plugins/marvin/__init__.py prepends plugins/marvin/src to sys.path + re-exports register/marvin_identity_provider from the canonical module. Existing 11 multi-tenant tests from #203 continue to pass with only one updated assertion (data files moved to src/marvin/data/).

Live dry-run install validated this session:

    $ uv build --wheel → marvin_runtime-0.1.0a1-py3-none-any.whl
    $ pip install <wheel> into fresh /tmp/marvin-dry-install venv
    $ importlib.metadata.entry_points() discovers ('marvin', 'marvin:register')
    $ marvin.__file__ = /tmp/.../site-packages/marvin/__init__.py (NOT in-tree)
    $ provider call → IdentitySpec(agent_name='Marvin', soul_chars=1306, system_chars=1486)
    $ register(stub_ctx) → wires identity provider via stub PluginContext ✓

CI-runnable regression guard: tests/plugins/test_marvin_pip_install_dry_run.py — 6 tests covering pyproject structure pins, wheel content pins, entry-point declaration pin, METADATA Requires-Python pin. ~0.5s per run via uv build (or python -m build if uv unavailable).

Companion kora-docs deliverables (separate PR):
- kora_docs/14_research/kora_pip_packaging_2026-05-24/AUDIT.md — concrete shopping list for the future 4-package Kora restructure (kora-runtime + kora-cli + kora-cockpit + kora-promote-loops). 7-9 CC-days estimate across 4 sequential phases. 4 open questions for operator (Hermes pip-installability, IsoKron client packaging, schema migrations, versioning cadence).
- kora_docs/14_research/plugin_identity_option_c_2026-05-24/MARVIN_DEMO_TRANSCRIPT.md §7 addendum — gap §6.4 closed; full wheel build + dry-run install transcript captured.
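The entry-point discovery step in the dry-run above can be reproduced with the standard library. The helper name `discover_plugins` is hypothetical, and this is only a sketch of how an installed wheel's entry points become visible; it is not Hermes's `_scan_entry_points`. One general TOML detail to keep in mind: a group name that contains a dot is normally written with the key quoted, as in `[project.entry-points."hermes_agent.plugins"]`.

```python
from importlib.metadata import entry_points
from typing import Dict


def discover_plugins(group: str = "hermes_agent.plugins") -> Dict[str, str]:
    """Map each entry-point name in `group` to its 'module:attr' target."""
    eps = entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+ exposes a selectable collection.
        selected = eps.select(group=group)
    else:
        # Python 3.8 / 3.9 return a plain dict keyed by group name.
        selected = eps.get(group, [])
    return {ep.name: ep.value for ep in selected}


plugins = discover_plugins()
```

In an environment where the wheel from this commit is installed, the mapping would be expected to contain the `marvin` entry; in any other environment it is simply empty.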
Deliverable B — Structural FATAL flag

Replaces #200's documentation-driven FATAL contract with structural enforcement:

agent/background_daemon_registry.py:
- New field BackgroundDaemonEntry.fatal_on_startup_failure: bool = False
- Field docstring documents the semantics + the looked-up-by-name path for the Path B thin-shim shape

kora_cli/plugins.py:
- PluginContext.register_background_daemon accepts the new fatal_on_startup_failure kwarg + threads it into the BackgroundDaemonEntry construction

kora_cli/daemon.py:
- New method DaemonCoordinator._is_startup_failure_fatal(listener_name):
  - Looks up the listener's BackgroundDaemonEntry by name
  - Returns entry.fatal_on_startup_failure if found
  - Returns True if not found (preserves pre-flag behavior for Kora-only HTTP service mounts: web/mcp/webhooks)
  - Returns True on lookup failure (defensive — never silently degrade on infrastructure error)
- run() loop checks the flag at the listener-startup-raise site:
  - True → abort daemon boot (existing behavior)
  - False → log + continue starting subsequent listeners (NEW: lenient default for non-critical daemons whose own try/except didn't catch an unexpected exception)

kora_cli/listeners/reasoning_engine_listener.py:
- _hermes_entry now passes fatal_on_startup_failure=True explicitly
- Module docstring updated to reflect structural-not-documentation contract

Tests: 18 in tests/kora_cli/test_daemon_fatal_on_startup_failure.py covering:
- Dataclass field defaults + frozen invariant
- PluginContext kwarg passthrough
- Lookup behavior (fatal entry / non-fatal entry / no entry / unknown name)
- End-to-end coordinator behavior (fatal raises abort; non-fatal raises log + continue past)
- Production pin: reasoning_engine has fatal=True; 7 phase-1/2/3 listeners (snapshot/heartbeat_probes/slack_client/purelymail_client/alert_notifier/cost_telemetry/mcp_consumption) default to fatal=False

Per the [[feedback-local-first-upstream-after]] amendment: this Hermes extension lives in the fork only this dispatch. When operator approves the next upstream-PR batch, this becomes upstream candidate #4 (joining the 3 already-deferred branches from #196).

Tests: 473/473 focused regression set green (Marvin tests + pip dry-run tests + FATAL flag tests + listener tests + identity tests + plugin tests).

Co-authored-by: CC#3 Kora Runtime <kora-pm@stormhavenenterprises.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
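The lookup semantics of the structural FATAL flag can be sketched in a few lines. The dictionary-backed registry, the trimmed `BackgroundDaemonEntry`, and the shape of `run()` are assumptions for illustration; only the three decision rules (flag if found, fatal if not found, fatal on lookup error) come from the commit message.

```python
import logging
from dataclasses import dataclass
from typing import Callable, Dict, List

logger = logging.getLogger("daemon_sketch")


@dataclass(frozen=True)
class BackgroundDaemonEntry:
    name: str
    fatal_on_startup_failure: bool = False


class DaemonCoordinator:
    def __init__(self, entries: Dict[str, BackgroundDaemonEntry]) -> None:
        self._entries = entries  # hypothetical stand-in for the registry

    def _is_startup_failure_fatal(self, listener_name: str) -> bool:
        try:
            entry = self._entries.get(listener_name)
        except Exception:
            return True  # lookup failure: never silently degrade
        if entry is None:
            return True  # no entry: keep the pre-flag abort behavior
        return entry.fatal_on_startup_failure

    def run(self, listeners: Dict[str, Callable[[], None]]) -> List[str]:
        started: List[str] = []
        for name, startup in listeners.items():
            try:
                startup()
            except Exception:
                if self._is_startup_failure_fatal(name):
                    raise  # abort daemon boot
                logger.exception("listener %s failed to start; continuing", name)
                continue
            started.append(name)
        return started


def _boom() -> None:
    raise RuntimeError("startup failed")


def _ok() -> None:
    return None


coord = DaemonCoordinator({
    "snapshot": BackgroundDaemonEntry("snapshot"),
    "reasoning_engine": BackgroundDaemonEntry("reasoning_engine", True),
})
started = coord.run({"snapshot": _boom, "heartbeat_probes": _ok})

try:
    coord.run({"reasoning_engine": _boom})
    aborted = False
except RuntimeError:
    aborted = True
```

Here the non-fatal `snapshot` failure is logged and the next listener still starts, while the same failure in `reasoning_engine` propagates and stops the boot.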
This PR's code is independent of ST3 being active — the gate is operator-confidence ordering per the audit at `kora_docs/14_research/daemon_listeners_via_gateway_2026-05-24/REPORT.md` §5. Phase 1 lands cleanly even with the bypass path still default; the only reason to gate on #195 is so the operator has confidence the gateway-route-through pathway is verified before refactoring listener registration.
Summary
Phase 1 of 6 — proof-of-pattern for the remaining 15 listener migrations. Path B (thin shim): the snapshot listener is now registered against BOTH the existing Kora `LISTENER_REGISTRY` (back-compat) AND Hermes `BackgroundDaemonRegistry` (forward path). Both registrations point at the same `_listener_singleton` so behavior stays identical under either consumer. Phase 6 dissolves the Kora-side registration.
Why snapshot first
Per audit §5: snapshot is the lowest-risk migration target — pure periodic-task daemon, no cross-cutting accessor (other listeners don't read singletons it owns), no startup-failure-FATAL semantic. If the pattern works here, it generalizes to the 8 other periodic-task daemons (Phase 2).
Code changes
`kora_cli/listeners/snapshot_listener.py`:
Test changes
`tests/kora_cli/test_listeners/test_snapshot_listener.py` — 6 new tests covering the dual-registry contract:
Test plan
Next dispatch
Phase 2 (8 periodic-task listener migrations) per audit §5, only after this Phase 1 PR lands cleanly. Recommended order in audit §5.
🤖 Generated with Claude Code