This repository was archived by the owner on May 26, 2026. It is now read-only.

feat(KR-P2-I-integration ST2): chain-event emit on operational-state transitions #33

Closed
rafe-walker wants to merge 1 commit into feat/kora-KR-P2-I-integration-st1-holder from feat/kora-KR-P2-I-integration-st2-emit

@rafe-walker

Owner

Summary

KR-P2-I-integration ST2 of 5. Adds the listener that writes a chain event for every transition through OperationalStateHolder. Stacked on top of #32 (ST1) — base is feat/kora-KR-P2-I-integration-st1-holder. Retarget to main after ST1 merges.

Event vocabulary (PM-verified against substrate)

Verified against packages/db/migrations/foundation/0159_kora_r41_operational_state_event_vocabulary.sql on rafe-walker/isokron (bd165eb HEAD):

  • Always emit kora.operational_state.transitioned — substrate-team's documented design: one generic edge event per transition; payload carries the from/to state, claim_permission, sorted degradation_reasons, and trigger.

  • Additionally emit a per-trigger informational literal when one exists in the vocab:

    | Transition | Trigger match | Extra literal |
    | --- | --- | --- |
    | BOOTING → READY | "all §9.2 gates pass" in trigger | kora.boot.ready |
    | BOOTING → STOPPED | "invariant gate failure" in trigger | kora.boot.failed |
    | * → PAUSED | "cost 100%" in trigger | kora.paused.cost_limit |
    | Operator-pause, substrate-pause | (none) | generic only (no vocab literal yet) |

    BOOTING → STOPPED is intentionally disambiguated: STOP-KORA L4/L5 from BOOTING reaches the same arrow as invariant-gate-failure but is operator action, not a boot failure — kora.boot.failed only fires when the trigger string contains "invariant gate failure".

    Trigger matching is substring, so caller wording like "cost 100% — auto-pause" or "STOP-KORA L2: cost 100% triggered budget watcher" all land on kora.paused.cost_limit.
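The selection rules above can be sketched as a pure function. This is only an illustration, not the code in agent/operational_state_emit.py: the `PrimaryState` enum, the parameter names `from_state`/`to_state` (`from` is a Python keyword), and the choice of case-sensitive matching are assumptions.

```python
from enum import Enum
from typing import Optional


class PrimaryState(Enum):
    # Hypothetical stand-in for the holder's primary states named in this PR.
    BOOTING = "BOOTING"
    READY = "READY"
    PAUSED = "PAUSED"
    STOPPED = "STOPPED"


BOOT_READY_EVENT = "kora.boot.ready"
BOOT_FAILED_EVENT = "kora.boot.failed"
PAUSED_COST_LIMIT_EVENT = "kora.paused.cost_limit"


def select_extra_literal(
    from_state: PrimaryState, to_state: PrimaryState, trigger: str
) -> Optional[str]:
    """Return the supplementary event literal, or None for generic-only."""
    if from_state is PrimaryState.BOOTING and to_state is PrimaryState.READY:
        if "all §9.2 gates pass" in trigger:
            return BOOT_READY_EVENT
    if from_state is PrimaryState.BOOTING and to_state is PrimaryState.STOPPED:
        # An operator STOP-KORA from BOOTING takes the same arrow but is not
        # a boot failure, so the trigger text decides.
        if "invariant gate failure" in trigger:
            return BOOT_FAILED_EVENT
    if to_state is PrimaryState.PAUSED and "cost 100%" in trigger:
        return PAUSED_COST_LIMIT_EVENT
    return None
```

Because the match is a plain substring test, any caller wording that contains the key phrase selects the same literal, while operator-pause and substrate-pause triggers fall through to `None`.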

Failure mode — fail-LOUD

Mirrors agent/constitution_audit.py. emit_state_transition raises OperationalStateEmitError on:

  • provider is None
  • provider._connection is None
  • _resolve_workspace_id() raises or returns empty
  • The substrate-side kora__append_event call raises (Sea MCP unavailable, CHECK violation, chain lock failure, etc.)

The holder's listener-exception handler (ST1) catches and logs so one broken emit doesn't deadlock the state machine — but the raise + log is the observability path. No silent swallow.
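A minimal sketch of the fail-LOUD checks listed above, under stated assumptions: the helper name `emit_event`, the connection's `append_event` coroutine method, and the demo classes are hypothetical; only `OperationalStateEmitError`, its `.cause` attribute, `provider._connection`, and `_resolve_workspace_id()` come from this PR.

```python
import asyncio
from typing import Any, Optional


class OperationalStateEmitError(RuntimeError):
    """Raised for every emit failure; .cause keeps the underlying exception."""

    def __init__(self, message: str, cause: Optional[BaseException] = None):
        super().__init__(message)
        self.cause = cause


async def emit_event(provider: Any, event_type: str, payload: dict) -> None:
    # Preflight: refuse to proceed rather than silently skipping the emit.
    if provider is None:
        raise OperationalStateEmitError("provider is None")
    connection = getattr(provider, "_connection", None)
    if connection is None:
        raise OperationalStateEmitError("provider._connection is None")
    try:
        workspace_id = provider._resolve_workspace_id()
    except Exception as exc:
        raise OperationalStateEmitError("workspace_id lookup raised", cause=exc) from exc
    if not workspace_id:
        raise OperationalStateEmitError("workspace_id is empty")

    # Substrate call: any failure is wrapped and re-raised, never swallowed.
    try:
        # Hypothetical method standing in for the kora__append_event call.
        await connection.append_event(workspace_id, event_type, payload)
    except Exception as exc:
        raise OperationalStateEmitError("append_event failed", cause=exc) from exc


class FailingConnection:
    """Demo connection that simulates an unavailable substrate."""

    async def append_event(self, workspace_id: str, event_type: str, payload: dict) -> None:
        raise ConnectionError("Sea MCP unavailable")


class DemoProvider:
    """Demo provider exposing only the two attributes the preflight reads."""

    def __init__(self, connection: Any):
        self._connection = connection

    def _resolve_workspace_id(self) -> str:
        return "ws-demo"
```

Running `asyncio.run(emit_event(DemoProvider(FailingConnection()), "kora.operational_state.transitioned", {}))` raises `OperationalStateEmitError` with the `ConnectionError` available in `.cause`.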

agent/operational_state_emit.py (new, 273 LOC)

  • Constants: GENERIC_TRANSITION_EVENT, BOOT_READY_EVENT, BOOT_FAILED_EVENT, PAUSED_COST_LIMIT_EVENT (named so the literal value is greppable from logs)
  • _select_extra_literal(from, to, trigger): returns the supplementary literal or None. Pure function, no I/O.
  • _build_payload(from, to, trigger): builds the payload dict. Pure function. Cockpit relies on a deterministic order for degradation_reasons, so we use sorted(...) rather than list(...).
  • emit_state_transition(provider, from, to, trigger): async. Always emits the generic event; conditionally emits the extra literal. Raises on any failure.
  • make_emit_listener(provider): factory returning a StateTransitionListener bound to the provider. ST3 calls this once and registers the result via holder.add_listener(...).
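To illustrate the payload builder and the listener factory described above, here is a rough sketch. The payload key names, the `degradation_reasons` field on `OperationalState`, the async listener signature, and the recording `emit_state_transition` stand-in are all assumptions made for the example; the real module may differ.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, FrozenSet, List, Tuple


@dataclass(frozen=True)
class OperationalState:
    # primary_state and claim_permission appear in the PR; the rest is assumed.
    primary_state: str
    claim_permission: str
    degradation_reasons: FrozenSet[str] = frozenset()


def build_payload(from_state: OperationalState, to_state: OperationalState, trigger: str) -> dict:
    """Pure function: the payload carries post-transition values."""
    return {
        "from_state": from_state.primary_state,
        "to_state": to_state.primary_state,
        "claim_permission": to_state.claim_permission,
        # Sets have no guaranteed iteration order, so sort for stable output.
        "degradation_reasons": sorted(to_state.degradation_reasons),
        "trigger": trigger,
    }


StateTransitionListener = Callable[
    [OperationalState, OperationalState, str], Awaitable[None]
]

EMITTED: List[Tuple[object, dict]] = []


async def emit_state_transition(provider, from_state, to_state, trigger) -> None:
    # Stand-in emitter: records the payload instead of calling the substrate.
    EMITTED.append((provider, build_payload(from_state, to_state, trigger)))


def make_emit_listener(provider) -> StateTransitionListener:
    """Bind the provider once so the holder only sees (from, to, trigger)."""

    async def listener(from_state, to_state, trigger) -> None:
        await emit_state_transition(provider, from_state, to_state, trigger)

    return listener
```

The closure keeps the holder unaware of the provider: it calls the listener with the transition arguments only, and the factory decides where the event goes.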

Tests — tests/test_operational_state_emit.py (399 LOC)

16 cases:

  • Payload: shape matches spec, post-transition values (not pre)
  • Per-trigger selection: boot.ready / boot.failed / paused.cost_limit positive paths; the operator/substrate negative paths; the BOOTING → STOPPED disambiguation; substring leniency across three trigger variants
  • Fail-LOUD preflight: provider=None, missing connection, workspace_id raise, workspace_id empty
  • Happy path: generic-only for ACTIVE-claim, generic+boot.ready for boot ready, both events carry identical payload
  • Substrate failure: a "Sea MCP unavailable" raise bubbles up as OperationalStateEmitError with the underlying cause in .cause
  • Listener factory: make_emit_listener returns a usable listener

Tests use a stubbed IsoKron connection that runs the coro on the test's event loop and wraps the result in a concurrent.futures.Future so that asyncio.wrap_future works; there is no real MCP transport in the unit tests. The integration test against a live provider lands in ST3.
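One way such a stub could be built is sketched below. The class name `StubConnection` and its `submit` method are hypothetical (the real connection API is not shown in this PR); `asyncio.wrap_future` and `concurrent.futures.Future` are standard library.

```python
import asyncio
import concurrent.futures
from typing import Any, Coroutine


class StubConnection:
    """Runs submitted coroutines on the caller's own event loop."""

    def submit(self, coro: Coroutine[Any, Any, Any]) -> concurrent.futures.Future:
        # Must be called from inside a running loop (the test's loop).
        bridged: concurrent.futures.Future = concurrent.futures.Future()
        task = asyncio.get_running_loop().create_task(coro)

        def _bridge(done: asyncio.Task) -> None:
            # Copy the task outcome into the thread-style future.
            if done.cancelled():
                bridged.cancel()
            elif done.exception() is not None:
                bridged.set_exception(done.exception())
            else:
                bridged.set_result(done.result())

        task.add_done_callback(_bridge)
        return bridged


async def demo() -> str:
    conn = StubConnection()

    async def fake_append() -> str:
        return "ok"

    # Code under test can await the stub exactly as it would a future
    # coming back from a dedicated IO-loop thread.
    return await asyncio.wrap_future(conn.submit(fake_append()))
```

The point of the bridge is that the code under test does not need a test-only branch: it receives a `concurrent.futures.Future` either way and awaits it through `asyncio.wrap_future`.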

§1 verifications — all green

Sub-task chain

| ST | PR | Status |
| --- | --- | --- |
| ST1 | #32 | open (this PR is stacked on it) |
| ST2 | this PR | open |
| ST3 | next | depends on ST2 |
| ST4 | TBD | depends on ST1 + KR-P2-E ST1 merged |
| ST5 | TBD | depends on ST1 |

Test plan

  • CI green on pytest tests/test_operational_state_emit.py
  • No production wire-in yet — ST3 is the first real caller. If anything imports make_emit_listener outside this PR's tests, that's an early wire-in to flag

🤖 Generated with Claude Code

…transitions

Adds the listener that writes a chain event for every transition
through the OperationalStateHolder. PM-verified vocabulary against
foundation/0159_kora_r41_operational_state_event_vocabulary.sql on
isokron-prod (2026-05-21):

  * ALWAYS emit ``kora.operational_state.transitioned`` — single
    generic edge event per substrate-team design. Payload carries
    from/to primary_state, new claim_permission, sorted
    degradation_reasons, and trigger.
  * ADDITIONALLY emit when a per-trigger literal exists:
      - (BOOTING → READY, "all §9.2 gates pass") → kora.boot.ready
      - (BOOTING → STOPPED, "invariant gate failure …") →
        kora.boot.failed
      - (any → PAUSED, trigger contains "cost 100%") →
        kora.paused.cost_limit
  * Operator-pause and substrate-pause: only the generic event.
    Per-trigger literals can land in a follow-on substrate vocab
    migration if cockpit needs the signal.

Fail-LOUD: preflight failures (missing provider / connection /
workspace_id) and substrate-side raises both surface as
OperationalStateEmitError. The state-machine listener-error
handler logs but doesn't roll back — listeners are observability,
not policy — so a broken emit is loudly logged but the state
machine keeps moving.

agent/operational_state_emit.py exposes:
  - GENERIC_TRANSITION_EVENT, BOOT_READY_EVENT, BOOT_FAILED_EVENT,
    PAUSED_COST_LIMIT_EVENT — constants matched against
    foundation/0159
  - emit_state_transition(provider, from, to, trigger) — async
    callable, fail-LOUD
  - make_emit_listener(provider) — factory returning a
    StateTransitionListener; ST3 wires this into the holder at
    provider init

Tests (tests/test_operational_state_emit.py): 16 cases covering
payload shape, per-trigger literal selection (incl. lenient
substring match for varied caller wording), the BOOTING → STOPPED
disambiguation (invariant-failure vs STOP-KORA L4/L5), fail-LOUD
preflight (None provider, missing connection, workspace_id raise
or empty), generic-plus-extra dual-emit, substrate-failure
propagation, and the make_emit_listener factory.

Builds on ST1 (#32). ST3 will land the wire-in.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@rafe-walker rafe-walker deleted the branch feat/kora-KR-P2-I-integration-st1-holder May 21, 2026 20:55
rafe-walker added a commit that referenced this pull request May 21, 2026
…transitions (#36)

Always emit kora.operational_state.transitioned (generic edge event) + conditional per-trigger literals (kora.boot.ready / kora.boot.failed / kora.paused.cost_limit) matching substrate foundation/0159 source-of-truth vocab. Fail-LOUD via OperationalStateEmitError. 16 unit tests. Replaces auto-closed PR #33.
rafe-walker pushed a commit that referenced this pull request May 21, 2026
…tate holder

agent/operational_state_wire.py (new): wire_operational_state(provider)
is the one boot-time call that:

  1. init_holder(OperationalState(primary_state=BOOTING,
     claim_permission=NONE)) — module-level singleton, idempotent.
  2. holder.add_listener(make_emit_listener(provider)) — registers
     the chain-event listener from ST2.
  3. Triggers BOOTING → READY via the connection's submit_and_wait
     (agent_init.py is synchronous; we run the coro on the IsoKron
     dedicated IO loop and block boot until the emit listener returns).

Per spec §3 ST3: the transition is UNCONDITIONAL in v1. KR-P2-H
follow-on adds the §9.2 invariant-gate-check guard that decides
whether to transition READY or fall through to STOPPED — that
bucket is blocked on substrate-round Bucket C event vocab and
ships later.

Fail-soft posture: every error here is caught + logged with the
greppable [kora.operational_state.wire_in] tag. Operator boot
proceeds even if holder construction or emit raises — only the
observability surface is degraded. KR-P2-H will revisit which
failures should block boot.

agent/agent_init.py: after agent._memory_manager.initialize_all
succeeds, look up the IsoKron provider and call
wire_operational_state(provider). Wrapped in a defensive
try/except so a wire-in module-import failure also doesn't break
boot.

Tests (tests/test_operational_state_wire.py): 4 cases covering
the happy path (holder in READY + generic + boot.ready emits with
the canonical "all §9.2 gates pass" trigger), the no-connection
branch (holder created in BOOTING + greppable WARNING), the
submit-raises branch (wire-in stays fail-soft + WARNING), and
idempotence (second call with second listener doubles the per-
transition emit but state stays at READY).

Stacked on top of ST1 (#32) + ST2 (#33). ST4 wires SeaTicketPoller
claim/release (depends on KR-P2-E ST1 merged), ST5 flips the
/api/operational-state endpoint from stub to read get_holder().

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
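
The boot-time bridge described in the ST3 commit message above (a synchronous caller running a coroutine on a dedicated IO loop, with fail-soft error handling) might look roughly like the sketch below. `IoLoopConnection`, `wire_fail_soft`, and the timeout value are hypothetical; only the `submit_and_wait` name and the `[kora.operational_state.wire_in]` log tag come from the commit message.

```python
import asyncio
import logging
import threading
from typing import Any, Callable, Coroutine, Optional

log = logging.getLogger("kora.operational_state.wire_in")


class IoLoopConnection:
    """Hypothetical connection owning a dedicated event loop on its own thread."""

    def __init__(self) -> None:
        self._loop = asyncio.new_event_loop()
        self._thread = threading.Thread(target=self._loop.run_forever, daemon=True)
        self._thread.start()

    def submit_and_wait(self, coro: Coroutine[Any, Any, Any], timeout: float = 5.0) -> Any:
        # Schedule on the IO loop, then block the synchronous caller.
        future = asyncio.run_coroutine_threadsafe(coro, self._loop)
        return future.result(timeout=timeout)

    def close(self) -> None:
        self._loop.call_soon_threadsafe(self._loop.stop)
        self._thread.join()
        self._loop.close()


def wire_fail_soft(
    connection: Optional[IoLoopConnection],
    make_transition: Callable[[], Coroutine[Any, Any, Any]],
) -> bool:
    """Run the boot transition; log and continue on any failure."""
    if connection is None:
        log.warning("[kora.operational_state.wire_in] no connection; staying in BOOTING")
        return False
    try:
        connection.submit_and_wait(make_transition())
        return True
    except Exception:
        log.warning(
            "[kora.operational_state.wire_in] transition failed; boot continues",
            exc_info=True,
        )
        return False
```

Returning a boolean instead of raising keeps boot moving while still leaving a greppable warning, which matches the fail-soft posture the commit message describes for v1.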