This repository was archived by the owner on May 26, 2026. It is now read-only.

feat(kora): KR-D-DAEMON ST3 — Slack + email webhook routers + R2 §5 amendment #104

Merged: rafe-walker merged 1 commit into feature/phase2-upgrades from feat/kora-KR-D-DAEMON-ST3 on May 22, 2026

@rafe-walker
Owner

Summary

Third + final ST of the always-alive runtime bucket. Adds the public-port webhook plane: a SECOND uvicorn instance inside the same daemon process binding 0.0.0.0:9118, serving a SEPARATE FastAPI() instance that mounts ONLY the two webhook routes + a healthcheck. Admin / MCP / control-plane routes are structurally impossible to surface on the public port — they live on a different app object.

Per PM Q1 ruling: "(D) with two-uvicorn refinement." Per PM rate-limit choice: slowapi (no Redis dependency).

Bucket spec: `kora_docs/17_cc_bucket_prompts/KR-D-DAEMON_always_alive_runtime.md` (commit 54032c6).

Base: `feature/phase2-upgrades` — NOT main.

New modules

  • `kora_cli/listeners/webhook_signing.py` — pure HMAC verifiers (Slack v0 scheme with 5-min replay protection; Purelymail HMAC-SHA256 over body with optional `sha256=` prefix). Returns `VerificationOutcome(ok, reason)` with stable machine-readable codes.
  • `kora_cli/listeners/webhook_dead_letter.py` — structured-log dead-letter recorder. Header allow-list + signature truncation + zero body-content logging.
  • `kora_cli/listeners/webhooks.py` — second uvicorn binding 0.0.0.0:9118; public FastAPI app via `_build_webhook_app()` factory; Slack + email routes with HMAC verify + dead-letter; slowapi rate limiter (60/min/IP default, `KORA_WEBHOOK_RATE_LIMIT` override); `/healthz` no-auth for Fly HTTP check.
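
For readers unfamiliar with the Slack v0 scheme named above, here is a stdlib-only sketch of what such a verifier could look like. The base string (`v0:<ts>:<body>`), HMAC-SHA256, and the 5-minute tolerance come from this PR; the parameter list and the reason codes (`secret_unset`, `timestamp_too_old`, and so on) are illustrative assumptions, not the module's actual API.

```python
import hashlib
import hmac
import time
from typing import NamedTuple, Optional


class VerificationOutcome(NamedTuple):
    ok: bool
    reason: str  # hypothetical machine-readable code


TOLERANCE_SECONDS = 5 * 60  # 5-minute replay window, both directions


def verify_slack_signature(
    secret: str,
    timestamp: str,
    signature: str,
    body: bytes,
    now: Optional[float] = None,
) -> VerificationOutcome:
    if not secret:
        return VerificationOutcome(False, "secret_unset")
    if not signature:
        return VerificationOutcome(False, "signature_missing")
    if not timestamp:
        return VerificationOutcome(False, "timestamp_missing")
    try:
        ts = int(timestamp)
    except ValueError:
        return VerificationOutcome(False, "timestamp_malformed")

    current = time.time() if now is None else now
    if current - ts > TOLERANCE_SECONDS:
        return VerificationOutcome(False, "timestamp_too_old")
    if ts - current > TOLERANCE_SECONDS:
        return VerificationOutcome(False, "timestamp_too_new")
    if not signature.startswith("v0="):
        return VerificationOutcome(False, "bad_scheme")

    # Base string uses the timestamp exactly as received.
    base = b"v0:" + timestamp.encode() + b":" + body
    expected = "v0=" + hmac.new(secret.encode(), base, hashlib.sha256).hexdigest()
    # Constant-time comparison; bytes avoid TypeError on non-ASCII input.
    if not hmac.compare_digest(expected.encode(), signature.encode()):
        return VerificationOutcome(False, "mismatch")
    return VerificationOutcome(True, "ok")
```

Checking the timestamp before computing the HMAC means a replayed request is rejected on age alone, and the `now` parameter exists only to make the tolerance window testable without sleeping.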

fly.toml

  • New SECOND `[[services]]` block for `internal_port = 9118` with PUBLIC `services.ports`: 443 (TLS) + 80 (force_https) + http_check on `/healthz`.
  • Existing `internal_port = 9119` block unchanged.

R2 §5 amendment

`kora_docs/00_canonical_current_state/r2_amendments.md` — frames original "no public ports" as control-plane-only; documents the four-pillar threat-model rationale for the webhook-plane exception (HMAC at request boundary, signing-secret is the security control, one-way ingress data flow, structural blast-radius isolation). Operator obligations pinned. Date 2026-05-22.

Rate-limiter choice: slowapi (citation)

Chose slowapi over fastapi-limiter:

  1. No Redis dependency — fastapi-limiter requires a Redis backend; the daemon is a single-machine process, and slowapi's in-memory counters are sufficient for ingress flood detection.
  2. Lighter operational surface — one less moving part to provision, secure, monitor, fail.
  3. FastAPI-native ergonomics — `@limiter.limit("60/minute")` decorator directly on routes; SlowAPIMiddleware handles the per-request bucket.
  4. Sufficient for the threat model — the rate limit is a flood backstop, not a precise QPS control; in-memory is fine.

Trade-off: rate-limit state resets on daemon restart. Acceptable since daemon restarts are infrequent + a flood that survives a restart is detectable from sustained dead-letter rates anyway.
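
To make the trade-off concrete, here is a stdlib-only sketch of a per-IP in-memory limiter. It is not slowapi's implementation: it uses a plain fixed-window counter, the class name `InMemoryRateLimiter` is hypothetical, and the injectable `clock` exists only so the behavior can be exercised without waiting.

```python
import time
from typing import Callable, Dict, Tuple


class InMemoryRateLimiter:
    """Fixed-window counter keyed by client IP (illustrative only)."""

    def __init__(
        self,
        limit: int,
        window_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.limit = limit
        self.window_seconds = window_seconds
        self._clock = clock
        # key -> (window index, requests seen in that window)
        self._windows: Dict[str, Tuple[int, int]] = {}

    def allow(self, key: str) -> bool:
        """Return True if the request fits the budget, False for a 429."""
        window = int(self._clock() // self.window_seconds)
        seen_window, count = self._windows.get(key, (window, 0))
        if seen_window != window:
            # A new window started: the old count no longer applies.
            seen_window, count = window, 0
        if count >= self.limit:
            self._windows[key] = (seen_window, count)
            return False
        self._windows[key] = (seen_window, count + 1)
        return True
```

All state lives in the `_windows` dict, so constructing a new limiter (which is what a daemon restart amounts to) starts every client from zero. The sketch also never evicts old keys, which a production limiter would need to handle.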

Tests (37 new, 81 total all passing)

  • `test_webhook_signing.py` (14 tests) — Slack accept (valid + URL-handshake), reject (secret unset / sig missing / ts missing / ts malformed / ts too old / ts too new / bad scheme / mismatch / wrong-secret); Purelymail accept (prefixed + bare) + reject (3 paths).
  • `test_webhook_dead_letter.py` (6 tests) — allow-list filtering, signature truncation, header lowercasing, emit format, optional fields, body-content leakage absence proof (asserts no hypothetical body data in any log line).
  • `test_webhooks.py` (17 tests) — URL-verify round-trip, valid event accept, signature mismatch 401 + dead-letter logged, timestamp too old 408, missing signature 401, email valid accept, email mismatch 401 + dead-letter, email secret unset 401, /healthz no-auth, ADMIN ROUTES STRUCTURALLY ABSENT ON PUBLIC APP (4 paths verified 404), rate-limit enforcement at 2/min (3rd req → 429), factory shape + env overrides, full uvicorn lifecycle end-to-end, default rate-limit constant.

End-to-end smoke (live daemon)

```
KORA_DEV=1 KORA_MCP_BEARER_TOKEN=mcp-tok \
KORA_SLACK_SIGNING_SECRET=slack-secret \
KORA_PUREMAIL_HMAC_SECRET=email-secret \
KORA_WEB_PORT=9289 KORA_WEBHOOK_PORT=9288 \
kora daemon
```

  • All 4 listeners boot in order (heartbeat → web → mcp → webhooks)
  • `GET /api/status` on 9289 → 200 ✓
  • `GET /healthz` on 9288 → 200 "ok" ✓
  • Slack URL-verification on 9288 → 200 echoes challenge ✓
  • `GET /api/status` on PUBLIC 9288 → 404 (admin not mounted) ✓
  • `GET /mcp/tools/list` on PUBLIC 9288 → 404 (MCP not mounted) ✓
  • SIGTERM → clean exit 0

Substrate-side gap (flagged for follow-on)

The bucket asked dead-letter to write `kora_operation_ledger` rows with `op_kind='webhook_dead_letter'`. Ledger schema (substrate migration 0093) requires `work_attempt_id` FK NOT NULL + `workspace_id` + `ticket_id` + `tool_name` — all tied to Sea_Ticket dispatch. Webhook dead-letter has none.

Today: structured-log surface (`logger.warning("[kora.webhook.dead_letter] ...")`) with stable log-key + header allow-list + signature truncation; never logs body.

Future: substrate-team adds (a) `webhook_dead_letters` table, (b) `kora.webhook.dead_letter` chain-event vocab literal, or (c) permissive ledger shape. The runtime extension is small — log-line emit is the stable seam. Flagged in PR + R2 amendment doc.
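
A rough sketch of the structured-log surface described in this section, using only the standard library. The `[kora.webhook.dead_letter]` prefix and the 12-character signature truncation come from this PR; the function names, the contents of the allow-list, and the log-line field layout are assumptions made for illustration.

```python
import json
import logging
from typing import Dict, Optional

logger = logging.getLogger("kora.webhook.dead_letter")

LOG_PREFIX = "[kora.webhook.dead_letter]"
SIGNATURE_PREFIX_LEN = 12
# Hypothetical allow-list; the real module's list is not shown in this PR.
ALLOWED_HEADERS = frozenset({
    "content-type",
    "user-agent",
    "x-slack-signature",
    "x-slack-request-timestamp",
    "x-purelymail-signature",
})
SIGNATURE_HEADERS = frozenset({"x-slack-signature", "x-purelymail-signature"})


def summarize_headers(headers: Dict[str, str]) -> Dict[str, str]:
    """Lowercase names, drop anything off the allow-list, truncate signatures."""
    summary = {}
    for name, value in headers.items():
        key = name.lower()
        if key not in ALLOWED_HEADERS:
            continue  # Authorization, Cookie, etc. never reach the log
        if key in SIGNATURE_HEADERS:
            value = value[:SIGNATURE_PREFIX_LEN]
        summary[key] = value
    return summary


def record_dead_letter(
    source: str,
    reason: str,
    headers: Dict[str, str],
    peer_ip: Optional[str] = None,
    request_id: Optional[str] = None,
) -> str:
    """Emit one warning line; there is deliberately no body parameter."""
    line = (
        f"{LOG_PREFIX} source={source} reason={reason} "
        f"peer_ip={peer_ip} request_id={request_id} "
        f"headers={json.dumps(summarize_headers(headers), sort_keys=True)}"
    )
    logger.warning(line)
    return line
```

Keeping the request body out of the function signature entirely is one way to make the "never logs body" property hold by construction rather than by convention.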

Purelymail signing scheme flag

`verify_purelymail_signature` ships the conservative default (HMAC-SHA256 over body, header `X-Purelymail-Signature` accepting both `sha256=` and bare hex). Purelymail's actual scheme is not well-documented publicly — operator MUST verify at integration time + update if it diverges. Pinned in R2 amendment + operator-obligations section.
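
As an illustration of the conservative default described above, the sketch below verifies an HMAC-SHA256 over the raw body and accepts the header value with or without a `sha256=` prefix. The function name `verify_body_hmac`, its return shape, and its reason codes are hypothetical, and nothing here should be read as Purelymail's documented scheme.

```python
import hashlib
import hmac
from typing import Tuple


def verify_body_hmac(secret: str, header_value: str, body: bytes) -> Tuple[bool, str]:
    """Check a hex HMAC-SHA256 of the raw body against a header value."""
    if not secret:
        return (False, "secret_unset")
    if not header_value:
        return (False, "signature_missing")

    candidate = header_value.strip().lower()
    if candidate.startswith("sha256="):
        # Accept both "sha256=<hex>" and bare "<hex>".
        candidate = candidate[len("sha256="):]

    expected = hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
    # Constant-time comparison on bytes.
    if not hmac.compare_digest(expected.encode(), candidate.encode()):
        return (False, "mismatch")
    return (True, "ok")
```

If the provider's real scheme turns out to sign something other than the raw body (for example a timestamp plus body), only the `expected` computation would need to change.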

§6 ship checklist

  • Base `feature/phase2-upgrades`
  • Title format `feat(kora): KR-D-DAEMON STn — <scope>`
  • All §5 PM-opens resolved
  • Tests pass locally (81/81)
  • slowapi added to `web` extra (justification cited above)
  • R2 amendment doc authored with threat-model rationale + date
  • No K-DG drift surfaced

What's next

KR-D-DEPLOY follow-on (PM's next dispatch): sequence the actual rollout — Doppler env mapping (`KORA_MCP_BEARER_TOKEN` → `kora-runtime-substrate`; `KORA_SLACK_SIGNING_SECRET` + `KORA_PUREMAIL_HMAC_SECRET` → `kora-runtime-gateways`), Dockerfile entrypoint flip (`kora daemon` vs the current `hermes ...` shape), validate the second uvicorn binds 0.0.0.0:9118 inside Fly's network namespace.

🤖 Generated with Claude Code

…mendment

Third + final ST of the always-alive runtime. Adds the public-port
webhook plane: a SECOND uvicorn instance inside the same daemon
process binding 0.0.0.0:9118, serving a SEPARATE FastAPI() instance
that mounts ONLY the two webhook routes + a healthcheck. Admin /
MCP / control-plane routes are structurally impossible to surface
on the public port — they live on a different app object.

Per PM Q1 ruling: "(D) with two-uvicorn refinement."

## New modules

- **`kora_cli/listeners/webhook_signing.py`** — pure HMAC verifiers:
  - `verify_slack_signature` — Slack v0 scheme (`v0:<ts>:<body>`
    base string + HMAC-SHA256), 5-min timestamp tolerance for
    replay protection (skew protection in BOTH directions).
  - `verify_purelymail_signature` — conservative default
    (HMAC-SHA256 over body, header `X-Purelymail-Signature` with
    optional `sha256=` prefix); flagged in PR body + R2 amendment
    for operator-side verification at Purelymail integration time.
  - Both return `VerificationOutcome(ok, reason)` with stable
    machine-readable codes for dead-letter records.

- **`kora_cli/listeners/webhook_dead_letter.py`** — structured-log
  recorder for verification failures. Header allow-list (truncates
  signature values to 12-char prefix; excludes Authorization /
  Cookie / etc.); NEVER logs body content. Stable
  `[kora.webhook.dead_letter]` prefix for operator log analysis.
  PR body flags why this is structured-log-only today vs. the
  bucket's `kora_operation_ledger` ask — see "Substrate-side gap"
  below.

- **`kora_cli/listeners/webhooks.py`** — the SECOND uvicorn:
  - `_build_webhook_app()` factory mints a fresh public FastAPI app
    per startup (no slowapi state bleed across daemon restarts).
  - Routes: `POST /api/webhooks/slack/events`,
    `POST /api/webhooks/email/inbound`, `GET /healthz`.
  - slowapi rate limiter — chosen over fastapi-limiter (NO Redis
    dependency; single-machine daemon; in-memory counters are
    sufficient). Default 60 req/min/IP; `KORA_WEBHOOK_RATE_LIMIT`
    env override.
  - Slack handler: HMAC verify → 401 (signature) / 408 (timestamp);
    URL-verification handshake echoed inline; valid events get
    placeholder `{"ok": true}` (Feature 5 lands real handler).
  - Email handler: HMAC verify → 401; valid posts get placeholder
    `{"ok": true}` (Feature 3 lands real handler).
  - Dead-letter logged on every verify failure with peer IP +
    request ID + header summary.
  - `WebhookListener` class wraps uvicorn (proxy_headers=True for
    real-peer-IP behind Fly edge); programmatic Server.serve +
    should_exit lifecycle.

## fly.toml

- New SECOND `[[services]]` block for `internal_port = 9118` with
  PUBLIC `services.ports`: 443 (TLS) + 80 (force_https) + http_check
  on `/healthz`.
- Existing `internal_port = 9119` block unchanged (internal-only).

## R2 §5 amendment

- **`kora_docs/00_canonical_current_state/r2_amendments.md`** —
  authoritative scoping doc. Frames original R2 §5 "no public ports"
  as control-plane-only (admin UI / MCP / kora_control). Documents
  the four-pillar threat-model rationale for the webhook-plane
  exception (HMAC at request boundary; signing-secret is the
  security control; one-way ingress data flow; structural blast-
  radius isolation via separate FastAPI app). Operator obligations
  pinned. Date 2026-05-22.

## Tests (37 new, 81 total all passing)

- `test_webhook_signing.py` — 14 tests: Slack accept paths
  (valid + URL-handshake), reject paths (secret unset / sig missing
  / ts missing / ts malformed / ts too old / ts too new / bad
  scheme / mismatch / wrong-secret-mismatch); Purelymail accept
  (prefixed + bare) + reject (secret unset / sig missing / mismatch).
- `test_webhook_dead_letter.py` — 6 tests: allow-list filtering,
  signature truncation, header lowercasing, warning emit format,
  optional-fields handling, body-content leakage absence proof.
- `test_webhooks.py` — 17 tests: Slack URL-verify round-trip,
  valid-event accept, signature-mismatch 401 + dead-letter logged,
  timestamp-too-old 408, missing-signature 401, email valid accept,
  email mismatch 401 + dead-letter, email secret-unset 401, /healthz
  no-auth, ADMIN-ROUTES-STRUCTURALLY-ABSENT-ON-PUBLIC-APP (4 paths
  verified 404), rate-limit enforcement at 2/minute (3rd req → 429),
  factory shape + env overrides, full uvicorn lifecycle end-to-end,
  default rate-limit constant.

## End-to-end smoke (live daemon)

```
KORA_DEV=1 KORA_MCP_BEARER_TOKEN=mcp-tok \
  KORA_SLACK_SIGNING_SECRET=slack-secret \
  KORA_PUREMAIL_HMAC_SECRET=email-secret \
  KORA_WEB_PORT=9289 KORA_WEBHOOK_PORT=9288 \
  kora daemon
```

- All 4 listeners boot in order (heartbeat → web → mcp → webhooks)
- `GET /api/status` on 9289 → 200 ✓
- `GET /healthz` on 9288 → 200 "ok" ✓
- Slack URL-verification on 9288 → 200 echoes "hello-kora" ✓
- `GET /api/status` on PUBLIC 9288 → 404 (admin not mounted) ✓
- `GET /mcp/tools/list` on PUBLIC 9288 → 404 (MCP not mounted) ✓
- SIGTERM → clean exit 0

## Substrate-side gap (flagged for follow-on)

The bucket asked dead-letter to write to `kora_operation_ledger`
with `op_kind='webhook_dead_letter'`. The ledger schema (substrate
migration 0093) requires `work_attempt_id` (FK to `work_attempts`,
NOT NULL) + `workspace_id` + `ticket_id` + `tool_name` — all tied
to Sea_Ticket dispatch. A webhook dead-letter has none of those.

Today: structured-log surface (`logger.warning("[kora.webhook.dead_letter]
...")`) is the operator-visible artifact. Stable log-key + header
allow-list + signature truncation; never logs body content.

Future: substrate-team adds either (a) a `webhook_dead_letters`
table, (b) a `kora.webhook.dead_letter` chain-event vocab literal,
or (c) a permissive ledger shape. The runtime extension is a small
change in `webhook_dead_letter.py` — the log-line emit is the
stable seam.

## Purelymail signing scheme flag

`verify_purelymail_signature` ships the conservative default
(HMAC-SHA256 over body, header `X-Purelymail-Signature` accepting
both `sha256=<hex>` and bare hex). Purelymail's actual scheme is
not well-documented publicly — operator MUST verify at integration
time + update if it diverges. Pinned in R2 amendment doc + PR body.

## §6 ship checklist

- [x] Base `feature/phase2-upgrades`
- [x] Title format `feat(kora): KR-D-DAEMON STn — <scope>`
- [x] All §5 PM-opens resolved (Q1 = two-uvicorn; Q2-Q5 honored)
- [x] Tests pass locally (81/81)
- [x] slowapi added to `web` extra (cite justification in PR body)
- [x] R2 amendment doc authored with threat-model rationale + date
- [x] K-DG: no further drift surfaced

## What's next

KR-D-DEPLOY follow-on bucket: fly.toml is updated in THIS PR; deploy
bucket will sequence the actual rollout, Doppler env mapping
(`KORA_MCP_BEARER_TOKEN` to substrate; `KORA_SLACK_SIGNING_SECRET` +
`KORA_PUREMAIL_HMAC_SECRET` to gateways), Dockerfile entrypoint flip
(`kora daemon` vs the current `hermes ...` shape).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@rafe-walker rafe-walker merged commit e1b5da5 into feature/phase2-upgrades May 22, 2026
@rafe-walker rafe-walker deleted the feat/kora-KR-D-DAEMON-ST3 branch May 22, 2026 05:23