This repository was archived by the owner on May 26, 2026. It is now read-only.

feat(KR-P2-I-integration ST1): OperationalStateHolder singleton + transition broker#32

Merged
rafe-walker merged 1 commit into main from feat/kora-KR-P2-I-integration-st1-holder
May 21, 2026

@rafe-walker

Owner

Summary

KR-P2-I-integration ST1 of 5. Adds the live state holder that the KR-P2-I-skeleton was missing — the skeleton ships the immutable shape + the §9.1 transition table + query helpers, but nothing to actually hold the current state.

agent/operational_state_holder.py (new, 203 LOC)

  • OperationalStateHolder — holds the live OperationalState behind an asyncio.Lock so concurrent transition_to() calls serialize.
  • transition_to(new_primary_state, trigger, *, new_claim_permission, add_reasons, remove_reasons) — validates (from, to) against TRANSITION_TABLE via is_valid_transition; raises InvalidStateTransitionError on bad arrows. Same-state calls bypass the table check — R4.1 §9.1 models DEGRADED as the presence of degradation_reasons, not a primary_state edge, so (READY, READY) while adding DegradationReason.AUTH must succeed even though no (READY, READY) row exists.
  • Listeners — fire AFTER the held state is swapped and the lock is released. A listener exception is logged but does NOT roll back the transition. Listeners are observability, not policy; ST2 registers the chain-event emit as a listener.
  • Module-level init_holder() — idempotent; first call wins (mirrors the IsoKronMemoryProvider singleton pattern). get_holder() returns None before init.
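
The holder behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the code in agent/operational_state_holder.py: the enum members, the two-row TRANSITION_TABLE, the async listener signature, and the omission of new_claim_permission are all assumptions made to keep the example self-contained.

```python
import asyncio
import logging
from dataclasses import dataclass, replace
from enum import Enum
from typing import Awaitable, Callable, FrozenSet, List

logger = logging.getLogger(__name__)


class PrimaryState(Enum):  # assumed subset of the real enum
    BOOTING = "booting"
    READY = "ready"
    STOPPED = "stopped"


class DegradationReason(Enum):  # assumed subset
    AUTH = "auth"


@dataclass(frozen=True)
class OperationalState:
    primary_state: PrimaryState
    degradation_reasons: FrozenSet[DegradationReason] = frozenset()


# Hypothetical rows only; note that no (READY, READY) row exists.
TRANSITION_TABLE = {
    (PrimaryState.BOOTING, PrimaryState.READY),
    (PrimaryState.BOOTING, PrimaryState.STOPPED),
}


class InvalidStateTransitionError(Exception):
    pass


# Assumption: listeners are async callables taking (old, new, trigger).
Listener = Callable[[OperationalState, OperationalState, str], Awaitable[None]]


class OperationalStateHolder:
    def __init__(self, initial: OperationalState) -> None:
        self._state = initial
        self._lock = asyncio.Lock()
        self._listeners: List[Listener] = []

    @property
    def current(self) -> OperationalState:
        return self._state

    def add_listener(self, listener: Listener) -> None:
        self._listeners.append(listener)

    async def transition_to(self, new_primary_state, trigger, *,
                            add_reasons=(), remove_reasons=()):
        async with self._lock:  # concurrent callers serialize here
            old = self._state
            pair = (old.primary_state, new_primary_state)
            # Same-state calls skip the table check (degradation updates).
            if pair[0] != pair[1] and pair not in TRANSITION_TABLE:
                raise InvalidStateTransitionError(
                    f"{pair[0].name} -> {pair[1].name}")
            reasons = (old.degradation_reasons | frozenset(add_reasons)) \
                - frozenset(remove_reasons)
            new = replace(old, primary_state=new_primary_state,
                          degradation_reasons=reasons)
            self._state = new
        # Lock is released; listeners observe the already-swapped state.
        for listener in list(self._listeners):
            try:
                await listener(old, new, trigger)
            except Exception:
                # Logged only; the transition is never rolled back.
                logger.exception("state listener failed (trigger=%r)", trigger)
        return new
```

With this shape, `await holder.transition_to(PrimaryState.READY, "auth degraded", add_reasons=[DegradationReason.AUTH])` succeeds on a holder that is already READY, while READY → BOOTING raises because the pair is absent from the table.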

Tests — tests/test_operational_state_holder.py (288 LOC)

14 cases:

  • Construction + current snapshot
  • Valid arrow updates state; invalid arrow raises with the offending pair in the message
  • Same-primary-state bypass (degradation update on READY) succeeds
  • Self-transition row (BOOTING → BOOTING, "transient gate failure") still works
  • with_* composition (claim_permission + add_reasons + remove_reasons)
  • Listener receives (old, new, trigger); holder.current is already new when the listener fires
  • Listener exception doesn't block subsequent listeners
  • Listener exception doesn't roll the state back
  • Concurrent transition_to calls serialize via the asyncio lock (verified via a sleep-in-listener race)
  • get_holder() returns None before init
  • init_holder constructs the singleton + is idempotent (first state wins)
  • _reset_holder_for_tests clears the singleton

Honest scope

  • ST1 is pure holder logic. No IsoKronMemoryProvider wiring (ST3), no emit listener (ST2), no SeaTicketPoller integration (ST4), no endpoint flip (ST5).
  • No history ring buffer yet. ST5 adds the in-memory transition history that the panel renders.
  • Singleton pattern by intent. Multi-Kora-instance scenarios are Phase 3 per spec §4.

Sub-task chain (this bucket)

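| ST  | Branch              | Status                                        |
|-----|---------------------|-----------------------------------------------|
| ST1 | …st1-holder         | this PR                                       |
| ST2 | …st2-emit           | next; depends on ST1                          |
| ST3 | …st3-provider-wire  | depends on ST1 + ST2                          |
| ST4 | …st4-poller-wire    | depends on ST1; also needs KR-P2-E ST1 merged |
| ST5 | …st5-flip-endpoint  | depends on ST1                                |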

§1 verifications

  • agent/operational_state.py skeleton API surface (PrimaryState / DegradationReason / ClaimPermission enums + OperationalState + TRANSITION_TABLE + is_valid_transition / transitions_from / transitions_to) — imports verified.
  • ✅ Endpoint shape (/api/operational-state returns primary_state / claim_permission / degradation_reasons / is_degraded / transition_history / valid_next_states / stub) — read at kora_cli/web_server.py:3160. ST5 will preserve.
  • Event vocab — STOP-ASK fired before ST1, resolved with PM. Substrate ships ONE generic edge event (kora.operational_state.transitioned — see packages/db/migrations/foundation/0159_kora_r41_operational_state_event_vocabulary.sql line 277) plus per-trigger informational literals for kora.boot.ready, kora.boot.failed, kora.paused.cost_limit. ST2 will implement the always-emit-generic + conditionally-emit-per-trigger shape.

Test plan

  • CI green on pytest tests/test_operational_state_holder.py
  • No imports of this module from production code yet — ST3 lands the first real caller; if anything imports init_holder outside this PR's tests, the CC#3-side CI grep should flag it

🤖 Generated with Claude Code

…nsition broker

The skeleton at agent/operational_state.py ships the immutable
state shape, the §9.1 transition table, and query helpers — but
has no notion of "what is Kora's current state right now." This
module adds that.

agent/operational_state_holder.py:
  - OperationalStateHolder: holds the live OperationalState behind
    an asyncio.Lock so concurrent transition_to() calls serialize.
  - transition_to() validates (from, to) against TRANSITION_TABLE
    via is_valid_transition; raises InvalidStateTransitionError on
    bad arrows. Same-state calls (degradation_reason / claim_permission
    updates with primary_state unchanged) bypass the table check —
    R4.1 §9.1 models DEGRADED as a flag, not a primary_state edge.
  - Listeners fire AFTER the held state is swapped and the lock is
    released. A listener exception is logged but does not roll back
    the transition — listeners are observability, not policy. ST2
    registers the chain-event emit as a listener.
  - Module-level init_holder() is idempotent (first call wins;
    mirrors the IsoKronMemoryProvider singleton pattern).
    get_holder() returns None before init.

Tests (tests/test_operational_state_holder.py): 14 cases covering
the table-check, same-state bypass, the with_* composition
(claim_permission + add/remove reasons), listener fire ordering +
exception isolation + non-rollback, asyncio-lock serialization of
concurrent transitions, and the init_holder idempotence /
get_holder / reset-helper semantics.

This is ST1 of 5; ST2 lands the emit listener, ST3 wires
IsoKronMemoryProvider init to call init_holder, ST4 wires the
SeaTicketPoller claim/release transitions (depends on KR-P2-E ST1),
ST5 flips the /api/operational-state stub to read from get_holder().

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@rafe-walker rafe-walker merged commit 06e27e4 into main May 21, 2026
@rafe-walker rafe-walker deleted the feat/kora-KR-P2-I-integration-st1-holder branch May 21, 2026 20:55
rafe-walker pushed a commit that referenced this pull request May 21, 2026
…transitions

Adds the listener that writes a chain event for every transition
through the OperationalStateHolder. PM-verified vocabulary against
foundation/0159_kora_r41_operational_state_event_vocabulary.sql on
isokron-prod (2026-05-21):

  * ALWAYS emit ``kora.operational_state.transitioned`` — single
    generic edge event per substrate-team design. Payload carries
    from/to primary_state, new claim_permission, sorted
    degradation_reasons, and trigger.
  * ADDITIONALLY emit when a per-trigger literal exists:
      - (BOOTING → READY, "all §9.2 gates pass") → kora.boot.ready
      - (BOOTING → STOPPED, "invariant gate failure …") →
        kora.boot.failed
      - (any → PAUSED, trigger contains "cost 100%") →
        kora.paused.cost_limit
  * Operator-pause and substrate-pause: only the generic event.
    Per-trigger literals can land in a follow-on substrate vocab
    migration if cockpit needs the signal.
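
The always-generic, conditionally-per-trigger selection above could look roughly like this. The event names come from the list above; the helper name select_events, the use of plain strings for states, and the exact substrings matched are assumptions for illustration.

```python
from typing import List

GENERIC_TRANSITION_EVENT = "kora.operational_state.transitioned"
BOOT_READY_EVENT = "kora.boot.ready"
BOOT_FAILED_EVENT = "kora.boot.failed"
PAUSED_COST_LIMIT_EVENT = "kora.paused.cost_limit"


def select_events(from_state: str, to_state: str, trigger: str) -> List[str]:
    """Return the event names to emit for one transition (hypothetical helper)."""
    events = [GENERIC_TRANSITION_EVENT]  # always emitted
    text = trigger.lower()  # lenient substring matching on caller wording
    if from_state == "BOOTING" and to_state == "READY" and "gates pass" in text:
        events.append(BOOT_READY_EVENT)
    elif (from_state == "BOOTING" and to_state == "STOPPED"
          and "invariant gate failure" in text):
        events.append(BOOT_FAILED_EVENT)
    elif to_state == "PAUSED" and "cost 100%" in text:
        events.append(PAUSED_COST_LIMIT_EVENT)
    return events
```

Under these assumptions an operator pause yields only the generic event, and a BOOTING → STOPPED transition whose trigger does not mention an invariant gate failure is not reported as kora.boot.failed.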

Fail-LOUD: preflight failures (missing provider / connection /
workspace_id) and substrate-side raises both surface as
OperationalStateEmitError. The state-machine listener-error
handler logs but doesn't roll back — listeners are observability,
not policy — so a broken emit is loudly logged but the state
machine keeps moving.

agent/operational_state_emit.py exposes:
  - GENERIC_TRANSITION_EVENT, BOOT_READY_EVENT, BOOT_FAILED_EVENT,
    PAUSED_COST_LIMIT_EVENT — constants matched against
    foundation/0159
  - emit_state_transition(provider, from, to, trigger) — async
    callable, fail-LOUD
  - make_emit_listener(provider) — factory returning a
    StateTransitionListener; ST3 wires this into the holder at
    provider init

Tests (tests/test_operational_state_emit.py): 16 cases covering
payload shape, per-trigger literal selection (incl. lenient
substring match for varied caller wording), the BOOTING → STOPPED
disambiguation (invariant-failure vs STOP-KORA L4/L5), fail-LOUD
preflight (None provider, missing connection, workspace_id raise
or empty), generic-plus-extra dual-emit, substrate-failure
propagation, and the make_emit_listener factory.

Builds on ST1 (#32). ST3 will land the wire-in.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
rafe-walker pushed a commit that referenced this pull request May 21, 2026
…tate holder

agent/operational_state_wire.py (new): wire_operational_state(provider)
is the one boot-time call that:

  1. init_holder(OperationalState(primary_state=BOOTING,
     claim_permission=NONE)) — module-level singleton, idempotent.
  2. holder.add_listener(make_emit_listener(provider)) — registers
     the chain-event listener from ST2.
  3. Triggers BOOTING → READY via the connection's submit_and_wait
     (agent_init.py is synchronous; we run the coro on the IsoKron
     dedicated IO loop and block boot until the emit listener returns).

Per spec §3 ST3: the transition is UNCONDITIONAL in v1. KR-P2-H
follow-on adds the §9.2 invariant-gate-check guard that decides
whether to transition READY or fall through to STOPPED — that
bucket is blocked on substrate-round Bucket C event vocab and
ships later.

Fail-soft posture: every error here is caught + logged with the
greppable [kora.operational_state.wire_in] tag. Operator boot
proceeds even if holder construction or emit raises — only the
observability surface is degraded. KR-P2-H will revisit which
failures should block boot.
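
One way to picture the synchronous boot path handing a coroutine to a dedicated IO loop, with the fail-soft wrapper around it. IoLoopConnection and wire_fail_soft are hypothetical stand-ins built on asyncio.run_coroutine_threadsafe; the real submit_and_wait lives on the IsoKron connection and may differ.

```python
import asyncio
import logging
import threading

logger = logging.getLogger(__name__)
TAG = "[kora.operational_state.wire_in]"


class IoLoopConnection:
    """Hypothetical stand-in: owns an event loop running in a daemon thread."""

    def __init__(self) -> None:
        self._loop = asyncio.new_event_loop()
        self._thread = threading.Thread(target=self._loop.run_forever,
                                        daemon=True)
        self._thread.start()

    def submit_and_wait(self, coro, timeout: float = 5.0):
        # Schedule on the IO loop, then block the calling (sync) thread.
        future = asyncio.run_coroutine_threadsafe(coro, self._loop)
        return future.result(timeout=timeout)

    def close(self) -> None:
        self._loop.call_soon_threadsafe(self._loop.stop)
        self._thread.join()
        self._loop.close()


def wire_fail_soft(connection, make_coro) -> bool:
    """Run the boot transition; log and continue on any failure."""
    try:
        connection.submit_and_wait(make_coro())
        return True
    except Exception:
        # Boot proceeds; only the observability surface is degraded.
        logger.warning("%s boot transition failed; continuing boot",
                       TAG, exc_info=True)
        return False
```

Passing a coroutine factory instead of a coroutine object means a missing connection fails before any coroutine is created, so nothing is left un-awaited.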

agent/agent_init.py: after agent._memory_manager.initialize_all
succeeds, look up the IsoKron provider and call
wire_operational_state(provider). Wrapped in a defensive
try/except so a wire-in module-import failure also doesn't break
boot.

Tests (tests/test_operational_state_wire.py): 4 cases covering
the happy path (holder in READY + generic + boot.ready emits with
the canonical "all §9.2 gates pass" trigger), the no-connection
branch (holder created in BOOTING + greppable WARNING), the
submit-raises branch (wire-in stays fail-soft + WARNING), and
idempotence (second call with second listener doubles the per-
transition emit but state stays at READY).

Stacked on top of ST1 (#32) + ST2 (#33). ST4 wires SeaTicketPoller
claim/release (depends on KR-P2-E ST1 merged), ST5 flips the
/api/operational-state endpoint from stub to read get_holder().

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>