This repository was archived by the owner on May 26, 2026. It is now read-only.
KR-PROBE-AUTOFIX-EXECUTION — Kora attempts the fix (vision completion)#182
Merged
rafe-walker merged 1 commit on May 24, 2026
…n completion)

Completes the unified-operator-interface vision per ``feedback-kora-is-unified-operator-interface``: "Kora investigates + attempts fix where safe + DMs you with what happened, what was tried, what's left for you to decide." PR #166 shipped investigate+DM; this bucket ships the attempt-fix layer.

New module
----------
`kora_cli/tools/probe_autofix.py`:
* `attempt_probe_autofix(probe, action, target_id, reason)`: top-level orchestrator. Always returns a structured dict; never raises.
* Validation pipeline runs BEFORE any API call: probe in known universe → envelope env gate re-checked → action in envelope whitelist → target_id sanity. Each rejection emits one audit row with `rejection_reason` + `rejection_detail`.
* Fly executor `_execute_fly_restart_machine`: searches configured Fly apps (prod + optional staging) for target_id, refuses if not found or already started, POSTs to Fly Machines API restart endpoint, re-fetches state for after_state. Uses the same FLY_API_TOKEN + httpx pattern as `heartbeat_probes/fly.py`.

MCP wiring
----------
`kora_cli/listeners/mcp_tools.py`:
* New ST2 tool `kora__attempt_probe_autofix` descriptor + `_dispatch_attempt_probe_autofix`. `requires_cap_gate: True` for external MCP callers (default-deny via mcp_callers.yaml when one is configured; Kora's reasoning loop bypasses via REASONING_TOOL_ALLOWLIST).

Reasoning-loop scope expansion #2
---------------------------------
`kora_cli/reasoning/tool_registry.py`:
* Added `kora__attempt_probe_autofix` to `REASONING_TOOL_ALLOWLIST` + `_REASONING_MUTATING_TOOLS`. This is the 2nd mutating tool in the allowlist (1st was send_email_to_operator from #179). Module docstring extended with the "Deliberate scope expansion #2" section documenting why the blast-radius concern is bounded: per-probe env gate (default OFF, fail-CLOSED) + envelope action whitelist + per-probe executor target verification. Loop-risk bounded by probe-wake cadence + fail-CLOSED env default.

Audit
-----
New seam `tool.probe_autofix_attempted` (next to `tool.email_to_operator_sent`). One entry per invocation: attempts AND rejections AND execution failures. Reason field recorded verbatim (operator triage of "what did Kora decide and why"). Before/after state captured on attempts.

System prompt
-------------
`kora_docs/00_canonical_current_state/kora_system_prompt.md`:
* Tool surface section gains the new tool with usage guidance ("ALWAYS include the outcome in your DM").
* Mutation-boundary section now lists two exceptions instead of one.

Ship-enabled envelopes
----------------------
Zero envelopes are ship-enabled. The fly `restart_unhealthy_machine` envelope is whitelist-defined but `KORA_PROBE_AUTOFIX_FLY_ENABLED` defaults OFF per fail-CLOSED. Operator opts in explicitly per probe; the executor re-checks the env on every call so flipping the env mid-investigation closes the gate.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
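The gate order described in this commit message (known probe, env gate, action whitelist, target_id sanity) could be sketched roughly as below. This is only an illustration under stated assumptions, not the code in `kora_cli/tools/probe_autofix.py`: the constant names, `validate_request`, `_reject`, and every rejection reason except `envelope_disabled` are hypothetical, and case-insensitive matching of truthy values is assumed.

```python
import os
import re

# Hypothetical constants; the real module's names are not shown in this PR.
KNOWN_PROBES = {"supabase", "fly", "vercel", "sentry", "doppler"}
TRUTHY_VALUES = {"true", "1", "yes", "on"}
ACTION_WHITELIST = {"fly": {"restart_machine", "restart_unhealthy_machine"}}
TARGET_ID_PATTERN = re.compile(r"^[A-Za-z0-9_-]{1,64}$")


def _reject(reason, detail):
    # Shape mirrors the rejection fields named in the audit row.
    return {"status": "rejected", "rejection_reason": reason, "rejection_detail": detail}


def is_envelope_enabled(probe, environ=None):
    """Fail-CLOSED gate: only an explicit truthy value enables the envelope."""
    environ = os.environ if environ is None else environ
    raw = environ.get(f"KORA_PROBE_AUTOFIX_{probe.upper()}_ENABLED", "")
    # Assumption: comparison ignores case and surrounding whitespace.
    return raw.strip().lower() in TRUTHY_VALUES


def validate_request(probe, action, target_id, environ=None):
    """Run the pre-API gates in order; return None on success, else a rejection dict."""
    if not isinstance(probe, str) or probe not in KNOWN_PROBES:
        return _reject("unknown_probe", {"probe": probe})
    # The env is re-read on every call, so flipping it mid-investigation closes the gate.
    if not is_envelope_enabled(probe, environ):
        enable_env = f"KORA_PROBE_AUTOFIX_{probe.upper()}_ENABLED"
        return _reject("envelope_disabled", {"enable_env": enable_env})
    if action not in ACTION_WHITELIST.get(probe, set()):
        return _reject("action_not_whitelisted", {"action": action})
    # fullmatch also rejects a trailing newline, which "$" alone would allow.
    if not isinstance(target_id, str) or not TARGET_ID_PATTERN.fullmatch(target_id):
        return _reject("invalid_target_id", {"target_id": target_id})
    return None
```

Returning a rejection dict rather than raising matches the "always returns a structured dict; never raises" contract stated above, and lets the caller emit exactly one audit row per rejection.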
rafe-walker added a commit that referenced this pull request on May 24, 2026
…esBanner gaps (#184)

All 4 audit streams share caller_session_id for the joinable probe investigation timeline:

1. probe.wake_requested (#163): probe runner emits
2. tool.probe_autofix_attempted (#182): during investigation
3. probe.investigation_completed (NEW): model/tokens/cost/summary/dm_status/autofix_attempted
4. slack_dm_log.jsonl entry (NEW path): wake_consumer DM routes via extracted free function append_outbound_log_entry

Key design calls:
- _append_outbound_log_entry extracted to free function; handler instance method delegates. Byte-identical JSONL rows from both call sites.
- Cost: estimate_usage_cost over telemetry snapshot (same calc as record_inference), which keeps audit-sum-by-day in lockstep with cost-ladder rung. Snapshot approach was racy under concurrent investigations.
- dm_status enum combined to 4 values (sent / failed_send / engine_unavailable_fallback / engine_unavailable_failed_send) for single-pass chip-filter.

Follow-on flagged: KR-FE-PROBE-INVESTIGATION-VIEWER-V2 (already covered by CC#2's in-flight panel-kit megabucket; Deliverable D will auto-pick up probe.investigation_completed once added).

37 wake_consumer tests (28 existing + 9 new) + 401 cross-bucket regression + ruff clean.
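Because all four streams carry caller_session_id, a reader could join them into one timeline per investigation. The sketch below is a hypothetical helper, not part of the commit: `build_timelines` is an invented name, and it assumes each JSONL row is a JSON object with `caller_session_id` and an ISO 8601 `emitted_at` in a uniform UTC offset (so string order equals time order).

```python
import json
from collections import defaultdict


def build_timelines(jsonl_lines):
    """Group audit rows by caller_session_id, each group ordered by emitted_at."""
    timelines = defaultdict(list)
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between rows
        row = json.loads(line)
        session = row.get("caller_session_id")
        if session is None:
            continue  # rows without a session id cannot be joined
        timelines[session].append(row)
    for rows in timelines.values():
        # Assumption: timestamps share one format and offset, so lexical sort is chronological.
        rows.sort(key=lambda r: r.get("emitted_at", ""))
    return dict(timelines)
```

Lines from several audit files can be concatenated and passed in together; the grouping does not depend on which file a row came from.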
Summary
Completes the unified-operator-interface vision per feedback-kora-is-unified-operator-interface: "Kora investigates + attempts fix where safe + DMs you with what happened, what was tried, what's left for you to decide." PR #166 shipped investigate+DM; this bucket ships the attempt-fix layer. After this lands, when the fly probe fires unhealthy and the envelope is enabled, Kora's investigation can include "I tried restart_machine on i-abc; before=stopped, after=started; here's what's left" instead of just "I recommend you rotate the token."

Bucket spec: 17_cc_bucket_prompts/KR-PROBE-AUTOFIX-EXECUTION_kora_attempts_the_fix.md.

Ship-enabled envelope actions
Zero envelopes are ship-enabled. The fly restart_unhealthy_machine envelope is whitelist-defined in kora_cli/probes/fix_envelopes.py (PR #163) but KORA_PROBE_AUTOFIX_FLY_ENABLED defaults OFF per fail-CLOSED. Operator opts in explicitly per probe; the executor re-checks the env on every call, so flipping the env mid-investigation closes the gate.

- fly: fix_name restart_unhealthy_machine. With KORA_PROBE_AUTOFIX_FLY_ENABLED=true: restart_machine (alias for the canonical name) on a Fly machine that is not in state "started".
- supabase / vercel / sentry / doppler: (none)

K-DG findings (post-#179 baseline)
- heartbeat_probes/fly.py uses https://api.machines.dev + KORA_FLY_API_TOKEN bearer. New executor reuses the exact same pattern + adds the POST /v1/apps/{app}/machines/{id}/restart call.
- fix_envelopes.py already declares the fly envelope + per-probe enable env + is_envelope_enabled() accessor.
- wake_consumer.py already passes envelope_enabled + envelope_fix_name to reasoning context (PR #166).
- _REASONING_MUTATING_TOOLS.
- requires_cap_gate: True for external callers (default-deny). No substrate coord needed.

Defense-in-depth: validation pipeline
Each invocation runs through 5 gates before any Fly API call:

1. probe in known universe (supabase / fly / vercel / sentry / doppler)
2. is_envelope_enabled(probe): re-reads the env (canonical truth)
3. action matches the envelope's executor whitelist (only fly + restart_machine|restart_unhealthy_machine in v1)
4. target_id non-empty + matches ^[A-Za-z0-9_-]{1,64}$
5. target machine is not already in state "started"

Each rejection emits one audit row with rejection_reason + rejection_detail.

Sample audit entry: successful attempt
{
  "emitted_at": "2026-05-23T23:51:04.221000+00:00",
  "seam": "tool.probe_autofix_attempted",
  "details": {
    "probe": "fly",
    "action": "restart_machine",
    "target_id": "1781e9f6c12d83",
    "reason_from_reasoning": "Machine state has been stopped for 3 consecutive probe cycles (15 min); restart is the documented recovery per envelope.",
    "status": "attempted",
    "action_taken": "restart_machine",
    "action_canonical": "restart_unhealthy_machine",
    "fly_app": "kora-runtime",
    "before_state": {
      "id": "1781e9f6c12d83",
      "name": "kora-prod-east-1",
      "state": "stopped",
      "region": "iad",
      "instance_id": "01HZ..."
    },
    "after_state": {
      "id": "1781e9f6c12d83",
      "name": "kora-prod-east-1",
      "state": "started",
      "region": "iad",
      "instance_id": "01HZ..."
    },
    "executor_duration_ms": 842
  },
  "caller_session_id": "mcp:kora_reasoning_self",
  "source": "reasoning"
}

Sample audit entry: envelope disabled rejection
{
  "seam": "tool.probe_autofix_attempted",
  "details": {
    "probe": "fly",
    "action": "restart_machine",
    "target_id": "1781e9f6c12d83",
    "reason_from_reasoning": "machine flap detected",
    "status": "rejected",
    "rejection_reason": "envelope_disabled",
    "rejection_detail": {
      "enable_env": "KORA_PROBE_AUTOFIX_FLY_ENABLED",
      "fix_name": "restart_unhealthy_machine"
    }
  },
  "caller_session_id": "mcp:kora_reasoning_self",
  "source": "reasoning"
}

Sample DM showing the attempt outcome
After Kora's reasoning loop completes investigation + tool invocation, the wake_consumer (PR #166) sends the operator DM. With this bucket the model now has the autofix result to weave in:
If the envelope is disabled, Kora's investigation includes:
Env vars added
- KORA_PROBE_AUTOFIX_FLY_ENABLED: enables the fly restart_unhealthy_machine envelope. Truthy values: true / 1 / yes / on. Anything else (including unset, false, garbage) keeps the envelope OFF per fail-CLOSED.
- KORA_PROBE_AUTOFIX_SUPABASE_ENABLED
- KORA_PROBE_AUTOFIX_VERCEL_ENABLED
- KORA_PROBE_AUTOFIX_SENTRY_ENABLED
- KORA_PROBE_AUTOFIX_DOPPLER_ENABLED
- KORA_FLY_API_TOKEN: the same token used by heartbeat_probes/fly.py.
- KORA_FLY_STAGING_APP_NAME

Files
- kora_cli/tools/probe_autofix.py (~510 lines): validation pipeline + Fly executor + audit emission
- tests/kora_cli/tools/test_probe_autofix.py (24 tests)
- kora_cli/listeners/mcp_tools.py: ATTEMPT_PROBE_AUTOFIX_TOOL descriptor + _dispatch_attempt_probe_autofix + added to ST2_TOOL_DESCRIPTORS + ST2_TOOL_DISPATCH
- kora_cli/reasoning/tool_registry.py: added to REASONING_TOOL_ALLOWLIST + _REASONING_MUTATING_TOOLS; "Deliberate scope expansion #2" docstring
- kora_cli/audit/jsonl_sink.py: tool.probe_autofix_attempted SeamName Literal entry
- kora_docs/00_canonical_current_state/kora_system_prompt.md: tool usage guidance + mutation-boundary "two exceptions"
- tests/kora_cli/reasoning/test_anthropic_engine_tool_use.py: bump 6→7 tool count assertions

Test plan
- ruff check clean on all changed files

🤖 Generated with Claude Code