This repository was archived by the owner on May 26, 2026. It is now read-only.

feat(kora): KR-ALERT-NOTIFY ST1 — push alerts to Joshua via Slack DM + email#149

Merged
rafe-walker merged 1 commit into feature/phase2-upgrades from feat/kora-KR-ALERT-NOTIFY-ST1
May 23, 2026

Conversation

@rafe-walker

Owner

Summary

Closes the operator-feedback loop. The cockpit-complete observation from #145 was that every panel reads real data, but Joshua still has to LOOK at the cockpit. With this bucket, when a critical/warning alert fires, Kora pings Joshua via Slack DM; info alerts go to email.

Bucket spec: `17_cc_bucket_prompts/KR-ALERT-NOTIFY_push_to_slack_email.md`

Source PRs cited:

§4 PM-open status — ST1 defaults applied

| Q | Default | Status |
|---|---------|--------|
| Q1 | cadence 180s (3 min) | accepted; wired with `KORA_ALERT_NOTIFY_INTERVAL_SEC` env override |
| Q2 | critical/warning → Slack, info → email | accepted; per-severity routing in `_SEVERITY_TO_CHANNEL` |
| Q3 | fire on first cycle (no persistence) | accepted; empty `last_alert_ids` at construction; operator gets re-pinged on restart |
| Q4 | `kora__send_test_alert` MCP tool | DEFERRED to ST2; flagged in commit message; spec puts it in ST2 |
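
As a rough illustration of the Q1 and Q2 defaults, the sketch below resolves the cadence from `KORA_ALERT_NOTIFY_INTERVAL_SEC` and routes by severity. Only the env var name, the 180 s default, the `_SEVERITY_TO_CHANNEL` name, and the channel labels come from this PR. The helper names `resolve_interval_sec` and `channel_for`, and the choice to fall back to the default on invalid or non-positive values, are assumptions made for the example.

```python
import os

# 180 s is the default stated in this PR; the constant name is hypothetical.
DEFAULT_INTERVAL_SEC = 180

# Q2 default: critical/warning -> Slack DM, info -> email.
_SEVERITY_TO_CHANNEL = {
    "critical": "slack_dm",
    "warning": "slack_dm",
    "info": "email",
}


def resolve_interval_sec(env=None):
    """Return the notify cadence in seconds.

    Assumption: unset, non-integer, or non-positive values fall back to the
    default rather than raising.
    """
    env = os.environ if env is None else env
    raw = env.get("KORA_ALERT_NOTIFY_INTERVAL_SEC", "")
    try:
        value = int(raw)
    except ValueError:
        return DEFAULT_INTERVAL_SEC
    return value if value > 0 else DEFAULT_INTERVAL_SEC


def channel_for(severity):
    """Look up the channel for a severity; unknown severities return None."""
    return _SEVERITY_TO_CHANNEL.get(severity)
```

Passing the environment as a parameter keeps the resolver testable without mutating `os.environ`.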

Surface

| Layer | LOC |
|-------|-----|
| `kora_cli/alerts/notifier.py` (NEW) | 360 (AlertNotifier + DispatchOutcome + NotificationCycleResult + 4 pure formatting helpers) |
| `kora_cli/alerts/__init__.py` | re-exports |
| `kora_cli/listeners/alert_notifier_listener.py` (NEW) | 215 (daemon listener + periodic-task registration + 2 lazy client factories) |
| `kora_cli/listeners/__init__.py` | wire-in after the email inbound listener |
| `kora_cli/audit/jsonl_sink.py` | +1 `SeamName` literal: `"notification.dispatched"` |
| Tests | 50 new (36 notifier + 14 listener) |

Failure semantics — fail-soft + no spam

Each dispatch attempt is independent. If SlackClient is unavailable / SMTP rejects / Joshua envs are unset:

  1. The alert ID still enters `last_alert_ids` so the next cycle doesn't re-dispatch (prevents spam during transient outages).
  2. The audit JSONL records `notification.dispatched` with `status="failed"` + `error="<exception type name>"` so the operator can triage via the audit panel.
  3. The alert remains visible in the cockpit — notifications are a push convenience, not a delivery guarantee.

This trades a single missed ping (transient Slack 429) for clean operator UX. ST2's per-category cooldown + the runbook addendum will cover the gap with a different mechanism.
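
The fail-soft dedup described above can be sketched roughly as follows. This is not the `AlertNotifier` implementation: `DedupNotifier`, `run_cycle`, and the injected `dispatch` callable are hypothetical names, and the sketch assumes `last_alert_ids` is replaced with the current active set at the end of each cycle.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, Set


@dataclass
class DedupNotifier:
    """Hypothetical stand-in illustrating set-diff dedup with fail-soft dispatch."""

    dispatch: Callable[[str], None]  # takes an alert ID; may raise
    last_alert_ids: Set[str] = field(default_factory=set)

    def run_cycle(self, active_ids: Iterable[str]) -> Dict[str, str]:
        active = set(active_ids)
        # Only alerts that were not present last cycle count as newly firing.
        new_ids = active - self.last_alert_ids
        results: Dict[str, str] = {}
        for alert_id in sorted(new_ids):
            try:
                self.dispatch(alert_id)
                results[alert_id] = "ok"
            except Exception as exc:
                # Each attempt is independent: record the failure and move on.
                results[alert_id] = f"failed:{type(exc).__name__}"
        # Failed IDs are remembered too, so the next cycle does not re-dispatch.
        self.last_alert_ids = active
        return results
```

With this shape, an alert whose dispatch fails is reported once as failed and then stays quiet for as long as it remains active, which is the no-spam trade-off described above.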

Audit seam extension

Adds `notification.dispatched` to the `SeamName` Literal in `kora_cli/audit/jsonl_sink.py`. Small + additive — existing seams unchanged. Details schema:
```
{
channel: "slack_dm" | "email",
alert_id: <alert ID>,
severity: "critical" | "warning" | "info",
category: <alert category>,
status: "ok" | "failed",
error?: <exception type name when failed; omitted on ok>
}
```
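
For illustration, a details payload matching the schema above might be assembled like this. The function name `build_dispatch_details` and its signature are hypothetical; only the field names, the allowed values, and the rule that `error` carries the exception type name and is omitted on success come from the schema.

```python
def build_dispatch_details(channel, alert_id, severity, category, error=None):
    """Build the details dict for one dispatch attempt.

    `error` is the caught exception (or None on success). Only its type name
    is recorded, and the key is omitted entirely when the dispatch succeeded.
    """
    details = {
        "channel": channel,      # "slack_dm" or "email"
        "alert_id": alert_id,
        "severity": severity,    # "critical", "warning", or "info"
        "category": category,
        "status": "ok" if error is None else "failed",
    }
    if error is not None:
        details["error"] = type(error).__name__
    return details
```

Recording only the exception type name keeps message text, which may contain addresses or tokens, out of the audit log. That rationale is a guess; the PR does not state one.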

Test plan

  • 50 new tests pass (36 notifier + 14 listener)
  • 600/600 cross-bucket regression when run serially (`-o addopts=""`)
  • Ruff clean

Pre-existing xdist flake (NOT introduced by this PR)

When run under pytest-xdist with the full `tests/kora_cli/alerts/ + test_listeners/ + handlers/ + audit/ + clients/` suite, ~4-6 tests in `test_email_inbound_handler.py` flake intermittently (`assert result.status == HANDLED_RECEIVED` failing with `'filtered_paused'`). Verified pre-existing on bare HEAD without these changes — ran 3 times on `git stash`d HEAD: 4 failed / 0 failed / 6 failed. Filed as separate concern for a follow-on xdist-ordering / state-pollution bucket; not blocking this PR.

Cascade

ST2: per-category cooldown + burst dampening + daily-digest mode + operator runbook addendum + `kora__send_test_alert` MCP tool (Q4).

After ST2 merges, the operator-feedback loop closes fully — Joshua doesn't have to look at the cockpit; Kora pings him when something matters.

🤖 Generated with Claude Code

…+ email

Closes the operator-feedback loop. The cockpit-complete observation
from #145: every panel now reads real data, but Joshua still has
to LOOK at the cockpit to see alerts. This bucket flips that — when
a critical/warning alert fires, Kora pings Joshua via Slack DM
(immediate); info alerts go to email.

New module: ``kora_cli/alerts/notifier.py``

  * ``AlertNotifier`` class — periodic-task entry point
    ``run_notification_cycle`` calls #145's aggregator, computes
    set-diff against last cycle's alert IDs, dispatches only
    newly-firing alerts.
  * Channel routing (PM §4 Q2 default): critical/warning → Slack
    DM, info → email.
  * Dedup state in-memory only (PM §4 Q3 default): empty on first
    cycle → all currently-active alerts fire as "new" (one-time
    re-ping on restart; persistence is a future bucket).
  * Failed dispatch still adds alert ID to last_alert_ids (no
    spam on transient SMTP/Slack errors).
  * Audit emit ``notification.dispatched`` per dispatch attempt
    (success OR failure) — operator triages via audit panel if
    expecting a notification that didn't arrive.
  * Pure formatting helpers (Slack DM text, email subject, email
    body, relative-time) extracted for isolated unit-testing.

New listener: ``kora_cli/listeners/alert_notifier_listener.py``

  * Constructs the notifier bound to ``current_slack_client`` +
    ``current_purelymail_client`` lazy factories (matches the
    accessor pattern from KR-MCP-SEND-TOOLS).
  * Periodic task ``alerts.notify`` registered at import time @
    180s default (PM §4 Q1); ``KORA_ALERT_NOTIFY_INTERVAL_SEC``
    env override.
  * Shutdown resets dedup state so a subsequent listener start
    sees a clean slate.
  * Defense-in-depth outer try/except in run_notification_cycle
    so any path that bypasses the notifier's inner catch can't
    crash the heartbeat scheduler.

Audit seam extension: ``notification.dispatched`` added to the
SeamName Literal in ``kora_cli/audit/jsonl_sink.py`` (small +
additive; existing seams unchanged).

K-DG verification at HEAD ``2345d51`` (matches the spec's cited
SHA): all 5 accessors confirmed in place + interfaces matched
(compute_active_alerts ✓ in #145 / current_slack_client /
current_purelymail_client / register_periodic_task ✓ in heartbeat
scheduler / emit_audit ✓ in audit/jsonl_sink).

50 new tests pass:
  * 36 notifier tests (formatting + routing + dedup + failure
    isolation + audit shape + telemetry + dispatch outcome)
  * 14 listener tests (registration + cadence resolution +
    lifecycle + run_notification_cycle short-circuits +
    defense-in-depth + factory wiring)

Cross-bucket regression: 600/600 when run serially. With
pytest-xdist parallelism, 5-ish flaky failures appear in
test_email_inbound_handler.py — VERIFIED pre-existing on bare
HEAD without these changes (3 runs of bare HEAD reproduced 4/6/0
failures, same email_inbound tests). Not introduced by this PR;
filed as a separate xdist-ordering pollution issue for the
follow-on bucket queue.

Ruff clean.

§4 PM-open status — all DEFAULTS applied in ST1:
  Q1 cadence 180s (3 min)               — accepted
  Q2 routing critical/warning→Slack, info→email — accepted
  Q3 fire on first cycle (no persistence) — accepted
  Q4 kora__send_test_alert MCP tool      — DEFERRED to ST2

After ST2 lands (per-category cooldown + burst dampening + digest
mode + operator runbook + kora__send_test_alert MCP tool), the
operator-feedback loop closes fully.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@rafe-walker rafe-walker merged commit dafe84f into feature/phase2-upgrades May 23, 2026
@rafe-walker rafe-walker deleted the feat/kora-KR-ALERT-NOTIFY-ST1 branch May 23, 2026 21:03
rafe-walker added a commit that referenced this pull request May 24, 2026
…or (#166)

Closes the unified-operator-interface loop. Tails audit JSONL for probe.wake_requested events (PR #163 emits); per (probe, issue_category) inline debounce; invokes engine.respond() with structured probe context (issue + recent observations + envelope status); DMs operator via existing client.post_dm path.

Activates route='probe_investigation' telemetry literal (PR #161 reserved). Engine reads message.source to derive route through existing record_inference site — no telemetry-side changes needed.

Env vars added: KORA_PROBE_DEBOUNCE_SECONDS=600 (10 min default; 0 disables), KORA_PROBE_DEBOUNCE_BYPASS_CRITICAL=false (fail-closed; opt-in even for critical), KORA_PROBE_WAKE_POLL_SEC=30 (listener tail cadence). KORA_SLACK_JOSHUA_USER_ID reused from PR #149.

All 4 STOP-ASK conditions resolved inline:
- MessageSource Literal extended (1-line) with 'probe_investigation' + _derive_caller_session_id returns 'probe:{probe}:{category}' for future panel xref
- Listener-coordinator wire uniform across 9 listeners (register_daemon_listener pattern)
- Operator channel canonicalized at KORA_SLACK_JOSHUA_USER_ID (PR #149 precedent)
- Tail-position stamping at first-tick (don't replay history at boot) — inverse of AlertNotifier's set-diff semantic; documented

Wake-to-DM latency ~30s worst case (poll cadence), tunable to 5s. 42 new tests + 634/634 cross-bucket regression + ruff clean.
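
A minimal sketch of the per-(probe, issue_category) debounce described in this commit, under stated assumptions: the class name `ProbeDebouncer`, the `should_fire` method, the `severity` argument, and the injectable clock are all hypothetical. Only the 600 s default, the "0 disables" rule, and the opt-in critical bypass mirror the env vars listed above.

```python
import time


class ProbeDebouncer:
    """Hypothetical inline debounce keyed by (probe, issue_category)."""

    def __init__(self, window_sec=600.0, bypass_critical=False, clock=time.monotonic):
        self.window_sec = window_sec            # cf. KORA_PROBE_DEBOUNCE_SECONDS
        self.bypass_critical = bypass_critical  # cf. KORA_PROBE_DEBOUNCE_BYPASS_CRITICAL
        self._clock = clock
        self._last_fired = {}

    def should_fire(self, probe, category, severity="warning"):
        now = self._clock()
        key = (probe, category)
        if self.window_sec <= 0:
            allowed = True  # a window of 0 disables debouncing
        elif self.bypass_critical and severity == "critical":
            allowed = True  # opt-in only; the default is fail-closed
        else:
            last = self._last_fired.get(key)
            allowed = last is None or (now - last) >= self.window_sec
        if allowed:
            # Suppressed events do not extend the window; only fired ones stamp it.
            self._last_fired[key] = now
        return allowed
```

Injecting the clock lets tests advance time deterministically instead of sleeping.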