This repository was archived by the owner on May 26, 2026. It is now read-only.

feat(kora): KR-DAEMON-LISTENERS-VIA-GATEWAY Phase 1 — snapshot listener migration (DRAFT — merges after ST3 #195) #196

Merged
rafe-walker merged 1 commit into feature/phase2-upgrades from feat/kora-KR-DAEMON-PHASE1-DRAFT-AND-UPSTREAM-PR-WAIT-LIST-MEGABUCKET on May 24, 2026

@rafe-walker

Owner

⚠️ DRAFT — DO NOT MERGE until ST3 PR #195 merges first

This PR's code does not depend on ST3 being active; the gate is about operator-confidence ordering, per the audit at `kora_docs/14_research/daemon_listeners_via_gateway_2026-05-24/REPORT.md` §5. Phase 1 lands cleanly even with the bypass path still the default. The only reason to gate on #195 is to give the operator confidence that the gateway-route-through pathway is verified before listener registration is refactored.

Summary

Phase 1 of 6 — proof-of-pattern for the remaining 15 listener migrations. Path B (thin shim): the snapshot listener is now registered against BOTH the existing Kora `LISTENER_REGISTRY` (back-compat) AND Hermes `BackgroundDaemonRegistry` (forward path). Both registrations point at the same `_listener_singleton` so behavior stays identical under either consumer. Phase 6 dissolves the Kora-side registration.

Why snapshot first

Per audit §5: snapshot is the lowest-risk migration target — pure periodic-task daemon, no cross-cutting accessor (other listeners don't read singletons it owns), no startup-failure-FATAL semantic. If the pattern works here, it generalizes to the 8 other periodic-task daemons (Phase 2).

Code changes

`kora_cli/listeners/snapshot_listener.py`:

  • `SnapshotListener.startup()` now accepts an optional `coordinator` kwarg so both registry shapes work:
    • Kora's coordinator calls `startup()` with no arguments, so the kwarg default applies.
    • Hermes's `BackgroundDaemonEntry.startup` is typed `Callable[[Any], Any]`, so it is called with one positional argument.
  • Process-wide singleton `_listener_singleton` replaces per-`_factory()` construction so both registries get the same instance's bound methods. Prevents double-log-on-startup if both consumers fire.
  • New `BackgroundDaemonEntry` built with `name="snapshot"`, `periodic_task` carrying `run_snapshot_cycle` at the same interval the heartbeat scheduler uses, `shutdown_timeout=DEFAULT_SHUTDOWN_TIMEOUT`, `plugin_name="kora"`.
  • Defensive duplicate-registration guard for the Hermes registry (`importlib.reload` + xdist test workers can re-import the module).

Test changes

`tests/kora_cli/test_listeners/test_snapshot_listener.py` — 6 new tests covering the dual-registry contract:

| Test | Pins |
|---|---|
| `test_hermes_registry_has_snapshot_entry` | presence + `plugin_name="kora"` |
| `test_hermes_entry_carries_periodic_task_spec` | callback identity + interval value + name |
| `test_hermes_and_kora_registrations_share_singleton` | singleton invariant: both consumers' bound methods are `is` the singleton's methods |
| `test_startup_accepts_optional_coordinator` | signature compatibility (no-arg + positional + kwarg all work) |
| `test_shutdown_timeout_matches_default` | consumer-side timeout policy stays unified |
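
One way to pin the singleton invariant is sketched below. It is not taken from the PR's test file. In Python, each attribute access on an instance builds a fresh bound-method object, so two separately retrieved bound methods are equal but not identical. A check on `__self__` is therefore a robust way to assert that a stored callback belongs to the shared instance. `Listener` and `bound_to` are hypothetical names.

```python
from typing import Any, Callable, Optional


class Listener:
    """Hypothetical listener with the optional-coordinator signature."""

    def startup(self, coordinator: Optional[Any] = None) -> str:
        return "started"


def bound_to(method: Callable[..., Any], instance: object) -> bool:
    """True if `method` is a bound method whose receiver is `instance`."""
    return getattr(method, "__self__", None) is instance


singleton = Listener()
kora_side = singleton.startup    # what one registry would store
hermes_side = singleton.startup  # what the other registry would store
```

If both registries store the very same bound-method object, a direct `is` comparison also holds; the `__self__` check covers the case where each registration retrieved the method separately.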

Test plan

Next dispatch

Phase 2 (8 periodic-task listener migrations) starts only after this Phase 1 PR lands cleanly. Audit §5 gives the recommended order.

🤖 Generated with Claude Code

…er migration (DRAFT — merges after ST3 #195)

⚠️  DRAFT — DO NOT MERGE until ST3 PR #195 merges first. The migration's code is independent of ST3 being active; the gate is operator-confidence ordering per the audit at kora_docs/14_research/daemon_listeners_via_gateway_2026-05-24/REPORT.md §5.

Phase 1 of 6 — proof-of-pattern for the remaining 15 listener migrations. Path B (thin shim): the snapshot listener is now registered against BOTH the existing Kora LISTENER_REGISTRY (back-compat) AND Hermes BackgroundDaemonRegistry (forward path). Both registrations point at the same _listener_singleton so behavior stays identical under either consumer. Phase 6 dissolves the Kora-side registration.

Changes (kora_cli/listeners/snapshot_listener.py):
- SnapshotListener.startup() now accepts an optional ``coordinator`` kwarg so both registry shapes work: Kora's coordinator calls startup() no-arg (kwarg default applies); Hermes's BackgroundDaemonEntry.startup is Callable[[Any], Any] (one positional arg). The listener is stateless w.r.t. the coordinator — ignored today, forward-compatible.
- Process-wide singleton _listener_singleton replaces the per-_factory() construction so both registries get the same instance's bound methods. Prevents double-log-on-startup if both consumers fire.
- New BackgroundDaemonEntry built with name="snapshot", periodic_task carrying run_snapshot_cycle at the same interval the heartbeat scheduler uses, shutdown_timeout=DEFAULT_SHUTDOWN_TIMEOUT, plugin_name="kora".
- Defensive duplicate-registration guard for the Hermes registry (importlib.reload + xdist test workers can re-import).

Tests (tests/kora_cli/test_listeners/test_snapshot_listener.py):
- 6 new tests covering the dual-registry contract:
  - test_hermes_registry_has_snapshot_entry (presence + plugin_name)
  - test_hermes_entry_carries_periodic_task_spec (callback + interval + name)
  - test_hermes_and_kora_registrations_share_singleton (singleton invariant)
  - test_startup_accepts_optional_coordinator (signature compatibility)
  - test_shutdown_timeout_matches_default (consumer-side timeout policy)
- 14/14 snapshot listener tests green. 445/445 focused regression set (listeners + plugin tests) green.

Next phase: Phase 2 (8 more periodic-task listener migrations) once this lands. See audit §5 for the recommended order.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@rafe-walker rafe-walker marked this pull request as ready for review May 24, 2026 08:06
@rafe-walker rafe-walker merged commit 422a83b into feature/phase2-upgrades May 24, 2026
2 of 4 checks passed
@rafe-walker rafe-walker deleted the feat/kora-KR-DAEMON-PHASE1-DRAFT-AND-UPSTREAM-PR-WAIT-LIST-MEGABUCKET branch May 24, 2026 08:06
rafe-walker pushed a commit that referenced this pull request May 24, 2026
… promote CLI + alert fallback + envelope auto-approve

Deliverable A — Wider-suite test stability:
* Audited missing-deps blocker: all the deps CC#1 has been
  flagging across the last several PRs (aiosmtplib, prompt_toolkit,
  aioimaplib, fire, openai) are ALREADY declared in
  ``pyproject.toml`` core / dev extras. The gap was env-side, not
  code-side — CC#1's dev environment hadn't been re-synced after
  the dep additions across recent buckets.
* Resolution: ``uv pip install -e ".[all,dev]"`` (already
  documented in CONTRIBUTING.md). With the deps installed:
  - Collection: 7008 tests (was 7004 collected + 4 collection
    errors on un-synced env)
  - Full suite under hermetic env: **6905 passed / 94 failed /
    10 skipped** in 83s
  - Sub-suite excluding reasoning + gateway: **6605 passed /
    24 failed / 10 skipped** in 80s
* Note: the 94 remaining failures are PRE-EXISTING (verified by
  ``git stash`` + re-run on the merge commit before this PR).
  They cluster in:
  - ``reasoning/test_anthropic_engine*.py`` (~53) — gateway-route-
    through mock isolation issue introduced by CC#3's #196
    daemon Phase 1 default-flip; tests call real Anthropic API
    via the route-through path even though they pass a mock
    client.
  - ``test_backup.py`` / ``test_config.py`` / ``test_cron.py`` /
    ``test_web_server.py`` (~30) — HERMES_HOME → KORA_HOME
    migration tests stamping legacy expectations.
  - ``test_kanban*.py`` (~4) — separate flaky area.
  Per the bucket spec STOP-ASK §4: these need a separate
  stabilization bucket; this one completes with the deps-side
  resolved. PR description carries the recommended bucket title.

Deliverable B — ``kora promote`` operator CLI commands:
* New module ``kora_cli/promote_cli.py`` with 4 subcommands:
  - ``kora promote status`` — per-loop pending / approved /
    rejected / expired counts + last activity timestamp across
    all 6 loops.
  - ``kora promote run-once <loop>`` — invoke one cycle ad-hoc;
    returns the loop's cycle summary dict.
  - ``kora promote history <loop> [--days N]`` — recent audit
    rows for the loop, filtered to the per-loop seam vocabulary
    and (where the seam is shared like ``promotion.approved``)
    scoped via the ``promotion:<loop>:`` caller_session_id prefix.
  - ``kora promote pending <loop>`` — JSON dump of currently-
    pending proposals; ordered highest-confidence first.
* Snapshot-expand's audit-only layout (no pending/approved/
  rejected) gets special-cased — ``pending`` errors with a clear
  redirect to ``history``; ``status`` surfaces applied-record
  counts only.
* Registered under main.py's existing subparser pattern; cycle
  imports are LAZY so ``kora promote --help`` doesn't pull the
  clustering / pricing chain.
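
The subparser registration and lazy-import pattern described above might look roughly like this. It is a sketch under stated assumptions rather than the contents of ``kora_cli/promote_cli.py``: the in-memory ``_PENDING`` store, the loop name ``example_loop``, the exit codes, and the ``--days`` default of 7 are all invented for illustration, and ``json`` stands in for the heavy clustering / pricing imports.

```python
import argparse
from typing import Any, Dict, List, Optional

# Hypothetical in-memory store; the real CLI would read proposals from disk.
_PENDING: Dict[str, List[Dict[str, Any]]] = {
    "example_loop": [
        {"id": "a", "confidence": 0.4},
        {"id": "b", "confidence": 0.9},
    ],
}
AUDIT_ONLY_LOOPS = {"snapshot_expand"}  # loops with no pending queue


def pending_rows(loop: str) -> List[Dict[str, Any]]:
    """Pending proposals for a loop, highest confidence first."""
    return sorted(_PENDING.get(loop, []), key=lambda p: p["confidence"], reverse=True)


def cmd_pending(args: argparse.Namespace) -> int:
    # Lazy import: nothing heavy is loaded until the command actually runs.
    import json  # stand-in for the heavy import chain

    if args.loop in AUDIT_ONLY_LOOPS:
        print(f"error: {args.loop} has no pending queue; use 'history' instead")
        return 2
    print(json.dumps(pending_rows(args.loop)))
    return 0


def cmd_history(args: argparse.Namespace) -> int:
    return 0  # placeholder; audit-row filtering is omitted from this sketch


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="kora")
    sub = parser.add_subparsers(dest="command", required=True)
    promote = sub.add_parser("promote")
    promote_sub = promote.add_subparsers(dest="promote_command", required=True)

    pending = promote_sub.add_parser("pending")
    pending.add_argument("loop")
    pending.set_defaults(func=cmd_pending)

    history = promote_sub.add_parser("history")
    history.add_argument("loop")
    history.add_argument("--days", type=int, default=7)  # assumed default
    history.set_defaults(func=cmd_history)
    return parser


def main(argv: Optional[List[str]] = None) -> int:
    args = build_parser().parse_args(argv)
    return args.func(args)
```

Keeping the import inside the handler means that building the parser, and therefore printing ``--help``, never touches the heavy modules.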

Deliverable C — Alert investigation fallback DM polish:
* The mechanism shipped in #197 (fallback DM on reasoning failure
  via ``append_outbound_log_entry`` + dm_status=
  ``engine_unavailable_fallback`` in the audit). KR-CC1-POLISH
  improves the wording:
  - Header now surfaces ``category (severity): alert id N
    (via {channel})``
  - Footer is explicit: "Kora is unavailable to investigate this
    alert ... Review the alerts panel and act manually — Kora
    will not retry this investigation."
* New test asserts dm_status exactly ==
  ``"engine_unavailable_fallback"`` on the engine-None path; CC#2's
  KR-FE-ALERT-INVESTIGATIONS-VIEWER reads that enum value.

Deliverable D — Probe-fix-envelope auto-approve (low-risk):
* New env: ``KORA_PROMOTE_PROBE_FIX_AUTO_APPROVE_LOW_RISK``
  (default ``false``; operator opts in).
* New env: ``KORA_PROMOTE_PROBE_FIX_AUTO_APPROVE_WAIT_HOURS``
  (default ``1.0``).
* New module ``kora_cli/promote/probe_fix_envelopes/auto_approve.py``
  with ``run_auto_approve_sweep`` — runs at end of each
  probe-fix-envelope cycle, walks pending proposals, auto-
  approves those whose ``blast_radius_level == "low"`` AND have
  been pending ≥ wait_hours. Approval = transition to
  ``approved/`` + emit ``promotion.probe_envelope_action_auto_approved``
  audit row.
* New ``ProbeEnvelopeProposal.blast_radius_level`` field
  (``"low" | "medium" | "high"``); default ``"high"`` preserves
  the existing operator-must-review posture. Backwards-compat:
  legacy on-disk payloads without the field rehydrate to
  ``"high"`` via the new ``proposal_from_dict`` helper.
* Heuristic in ``_derive_blast_radius_level`` returns ``"low"``
  ONLY for known-narrow envelope patterns
  (``_KNOWN_LOW_RISK_PATTERNS`` — currently the fly
  restart_unhealthy_machine envelope's (probe, issue_category)
  pairs). Everything else defaults to ``"high"``. The heuristic
  intentionally undershoots: a false-low classification would
  let a proposal slip into the operator's envelope without
  review.
* CRITICAL — two-tier gating preserved (documented inline +
  in the auto_approve module docstring):
  1. Auto-approve → "this is in our envelope vocabulary"
  2. Per-probe ``KORA_PROBE_AUTOFIX_<NAME>_ENABLED`` →
     "Kora is permitted to invoke it at runtime"
  The auto-approve flag DOES NOT cause Kora to execute the
  fix — only adds it to the vocabulary; operator still enables
  the per-probe ENABLED env separately to authorize execution.
* New audit seam ``promotion.probe_envelope_action_auto_approved``
  extends the SeamName Literal; its payload carries the
  auto_approve_wait_hours + auto_approved_at fields for operator
  timeline reconstruction.
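
The selection rule for the sweep, and the rehydration default for legacy payloads, can be sketched as below. This is an illustration only: the ``proposal_id`` and ``created_at`` field names are assumptions, and this version simply returns the proposals that qualify, whereas the real ``run_auto_approve_sweep`` is described as moving them to ``approved/`` and emitting an audit row.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, Iterable, List


@dataclass
class ProbeEnvelopeProposal:
    proposal_id: str          # assumed field name
    created_at: datetime      # assumed field name
    blast_radius_level: str = "high"  # default keeps the operator-must-review posture


def proposal_from_dict(payload: Dict[str, Any]) -> ProbeEnvelopeProposal:
    """Rehydrate a proposal; payloads without the new field fall back to "high"."""
    return ProbeEnvelopeProposal(
        proposal_id=payload["proposal_id"],
        created_at=datetime.fromisoformat(payload["created_at"]),
        blast_radius_level=payload.get("blast_radius_level", "high"),
    )


def run_auto_approve_sweep(
    pending: Iterable[ProbeEnvelopeProposal],
    *,
    enabled: bool,
    wait_hours: float,
    now: datetime,
) -> List[ProbeEnvelopeProposal]:
    """Return proposals that are low-risk AND have waited at least wait_hours."""
    if not enabled:  # the opt-in flag is off by default
        return []
    minimum_age = timedelta(hours=wait_hours)
    return [
        p for p in pending
        if p.blast_radius_level == "low" and now - p.created_at >= minimum_age
    ]


now = datetime(2026, 5, 24, 12, 0, tzinfo=timezone.utc)
old_low = ProbeEnvelopeProposal("old-low", now - timedelta(hours=2), "low")
new_low = ProbeEnvelopeProposal("new-low", now - timedelta(minutes=30), "low")
legacy = proposal_from_dict(
    {"proposal_id": "legacy", "created_at": "2026-05-24T07:00:00+00:00"}
)
approved = run_auto_approve_sweep(
    [old_low, new_low, legacy], enabled=True, wait_hours=1.0, now=now
)
```

Here only ``old-low`` qualifies: ``new-low`` has not waited long enough, and the legacy payload rehydrates to ``"high"``. Approval in this sense only adds the action to the envelope vocabulary; execution still requires the separate per-probe ENABLED setting.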

Tests:
* 12 new tests for the probe-fix-envelope auto-approve sweep
  (heuristic / fixture-backed rehydration / sweep disabled by
  default / wait-window enforcement / high-risk never approves /
  audit payload shape).
* 13 new tests for ``kora promote`` CLI commands (status / pending
  / history / run-once dispatch / loop allowlist / snapshot_expand
  special-case error / cross-loop history caller_session_id
  filtering).
* 2 new tests for alert fallback DM polish (wording assertion +
  dm_status enum assertion).
* All 41 new tests pass; all relevant existing tests still pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
rafe-walker added a commit that referenced this pull request May 24, 2026
…n pip POC + structural FATAL flag (#204)

Two deliverables closing #203's §6.4 gap (no pip-install path validated) + #200's deferred follow-up (FATAL contract was documentation-only). Per Joshua's amended feedback-local-first-upstream-after: structure only, no PyPI publish, no upstream PR.

Deliverable A — Pip-packaging foundation

plugins/marvin/ restructured to relocatable src/ layout:

  plugins/marvin/
  ├── pyproject.toml          # NEW — setuptools build + hermes_agent.plugins entry point
  ├── README.md               # NEW — operator-facing install doc
  ├── plugin.yaml             # KEPT — Hermes bundled-plugin discovery
  ├── __init__.py             # REPLACED — sys.path-adjusting compat shim
  └── src/marvin/
      ├── __init__.py         # canonical module (relocated)
      └── data/
          ├── MARVIN.md       # relocated
          └── marvin_system_prompt.md

pyproject.toml declares marvin = "marvin:register" under [project.entry-points."hermes_agent.plugins"] (the group name must be quoted in TOML because it contains a dot), the exact entry-point group Hermes's _scan_entry_points already reads. The package data includes the markdown files so the wheel ships relocatable identity assets.

The in-tree compat shim at plugins/marvin/__init__.py prepends plugins/marvin/src to sys.path + re-exports register/marvin_identity_provider from the canonical module. Existing 11 multi-tenant tests from #203 continue to pass with only one updated assertion (data files moved to src/marvin/data/).

Live dry-run install validated this session:
  $ uv build --wheel → marvin_runtime-0.1.0a1-py3-none-any.whl
  $ pip install <wheel> into fresh /tmp/marvin-dry-install venv
  $ importlib.metadata.entry_points() discovers ('marvin', 'marvin:register')
  $ marvin.__file__ = /tmp/.../site-packages/marvin/__init__.py (NOT in-tree)
  $ provider call → IdentitySpec(agent_name='Marvin', soul_chars=1306, system_chars=1486)
  $ register(stub_ctx) → wires identity provider via stub PluginContext ✓

CI-runnable regression guard: tests/plugins/test_marvin_pip_install_dry_run.py — 6 tests covering pyproject structure pins, wheel content pins, entry-point declaration pin, METADATA Requires-Python pin. ~0.5s per run via uv build (or python -m build if uv unavailable).

Companion kora-docs deliverables (separate PR):
  - kora_docs/14_research/kora_pip_packaging_2026-05-24/AUDIT.md — concrete shopping list for the future 4-package Kora restructure (kora-runtime + kora-cli + kora-cockpit + kora-promote-loops). 7-9 CC-days estimate across 4 sequential phases. 4 open questions for operator (Hermes pip-installability, IsoKron client packaging, schema migrations, versioning cadence).
  - kora_docs/14_research/plugin_identity_option_c_2026-05-24/MARVIN_DEMO_TRANSCRIPT.md §7 addendum — gap §6.4 closed; full wheel build + dry-run install transcript captured.

Deliverable B — Structural FATAL flag

Replaces #200's documentation-driven FATAL contract with structural enforcement:

agent/background_daemon_registry.py:
  - New field BackgroundDaemonEntry.fatal_on_startup_failure: bool = False
  - Field docstring documents the semantics + the looked-up-by-name path for the Path B thin-shim shape

kora_cli/plugins.py:
  - PluginContext.register_background_daemon accepts the new fatal_on_startup_failure kwarg + threads it into the BackgroundDaemonEntry construction

kora_cli/daemon.py:
  - New method DaemonCoordinator._is_startup_failure_fatal(listener_name):
    - Looks up the listener's BackgroundDaemonEntry by name
    - Returns entry.fatal_on_startup_failure if found
    - Returns True if not found (preserves pre-flag behavior for Kora-only HTTP service mounts: web/mcp/webhooks)
    - Returns True on lookup failure (defensive — never silently degrade on infrastructure error)
  - run() loop checks the flag at the listener-startup-raise site:
    - True → abort daemon boot (existing behavior)
    - False → log + continue starting subsequent listeners (NEW: lenient default for non-critical daemons whose own try/except didn't catch an unexpected exception)

kora_cli/listeners/reasoning_engine_listener.py:
  - _hermes_entry now passes fatal_on_startup_failure=True explicitly
  - Module docstring updated to reflect structural-not-documentation contract

Tests: 18 in tests/kora_cli/test_daemon_fatal_on_startup_failure.py covering:
  - Dataclass field defaults + frozen invariant
  - PluginContext kwarg passthrough
  - Lookup behavior (fatal entry / non-fatal entry / no entry / unknown name)
  - End-to-end coordinator behavior (fatal raises abort; non-fatal raises log + continue past)
  - Production pin: reasoning_engine has fatal=True; 7 phase-1/2/3 listeners (snapshot/heartbeat_probes/slack_client/purelymail_client/alert_notifier/cost_telemetry/mcp_consumption) default to fatal=False

Per the [[feedback-local-first-upstream-after]] amendment: this Hermes extension lives in the fork only this dispatch. When operator approves the next upstream-PR batch, this becomes upstream candidate #4 (joining the 3 already-deferred branches from #196).

Tests: 473/473 focused regression set green (Marvin tests + pip dry-run tests + FATAL flag tests + listener tests + identity tests + plugin tests).

Co-authored-by: CC#3 Kora Runtime <kora-pm@stormhavenenterprises.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>