Skip to content
This repository was archived by the owner on May 26, 2026. It is now read-only.

feat(kora): KR-FEAT-HEARTBEAT ST2 — endpoint flip + Doppler env update#118

Merged
rafe-walker merged 1 commit into
feature/phase2-upgradesfrom
feat/kora-KR-FEAT-HEARTBEAT-ST2
May 22, 2026
Merged

feat(kora): KR-FEAT-HEARTBEAT ST2 — endpoint flip + Doppler env update#118
rafe-walker merged 1 commit into
feature/phase2-upgradesfrom
feat/kora-KR-FEAT-HEARTBEAT-ST2

Conversation

@rafe-walker

Copy link
Copy Markdown
Owner

Summary

Closes the KR-FEAT-HEARTBEAT bucket. The
`/api/heartbeat/services` panel now reads live snapshots from
the ST1 heartbeat scheduler; FE STUB banner auto-disappears when
`stub: false`.

Bucket spec: `17_cc_bucket_prompts/KR-FEAT-HEARTBEAT_backend_service_probe.md`

§4 rulings applied verbatim

Q Resolution
Q1 cadence default 5min + distinct task names Locked in ST1; pinned by `test_mcp_and_heartbeat_tasks_are_independent`
Q2 cache_warming flag `cache_warming: true` returned when snapshot cache empty + `services=[]`; FE renders "Probes warming up..."
Q3 Doppler env-mapping doc Extended with Phase 2 Feature 2 sub-section + validation checklist + 2 new entries under Optional env vars

Surface changes

Layer Net
`kora_cli/web_server.py` +52/-65 — endpoint reads snapshot cache; two-branch (live + warming)
`web/src/lib/api.ts` +21/-8 — `HeartbeatStatus` extends with `"unknown"`; nullable `last_check_at`/`latency_ms`; new `error` field; new `cache_warming` flag
`tests/kora_cli/test_web_server_heartbeat.py` +247/-52 — flipped assertions + new cache_warming / status=unknown / nullable-fields / error-passthrough tests
`kora_docs/15_status_and_roadmap/kora_runtime_doppler_env_mapping.md` +42 — Phase 2 Feature 2 sub-section + validation extension

TS contract — additive (same pattern as KR-MCP-CONSUMPTION ST2)

```typescript
type HeartbeatStatus =
| "healthy" | "degraded" | "unhealthy"
| "unknown"; // NEW — auth-missing / probe-timeout / probe-loop-crash

interface HeartbeatService {
// ...existing...
last_check_at: string | null; // was non-nullable
latency_ms: number | null; // was non-nullable
error: string | null; // NEW — sanitized by probe layer
}

interface HeartbeatServicesResponse {
// ...existing...
cache_warming: boolean; // NEW — empty-cache warming flag
}
```

Not breaking — existing FE renderers see new fields as optional /
handle the new enum value gracefully.

Security carry-forward

Endpoint trusts the snapshot's `error` field (sanitization is
the probe layer's job — pinned by ST1 tests). Defense-in-depth
test scans the serialized response for token-value-shape
substrings (`Bearer `, `ghp_`) to catch any future accidental
injection.

Test plan

  • 14/14 `test_web_server_heartbeat.py` pass
  • 242/242 cross-bucket regression
  • Ruff clean
  • Manual smoke: warming-branch shape correct
    (`cache_warming=true, services=[], stub=false`)
  • Defense-in-depth Bearer/ghp_ scan over response payload

Cascade

Base: `feature/phase2-upgrades`. Closes KR-FEAT-HEARTBEAT bucket arc.

After this PR

Feature 2 closes. Joshua needs to set the 5 (+1 optional) probe
tokens in Doppler's kora-runtime-gateways project per the extended
env-mapping doc; the next probe cycle (≤5min after daemon start)
flips each panel card from "unknown" to live state.

🤖 Generated with Claude Code

Closes KR-FEAT-HEARTBEAT bucket. The /api/heartbeat/services panel
now reads live snapshots from the heartbeat scheduler; FE STUB
banner auto-disappears.

# Endpoint flip (kora_cli/web_server.py)

  - Reads current_service_snapshots() from
    kora_cli.heartbeat_probes.
  - Two-branch shape:
    * Live path:    stub=False, cache_warming=False, services
                    projected from snapshot cache (canonical order:
                    vercel → sentry → doppler → supabase → fly;
                    any extras alphabetically appended).
    * Warming path: stub=False, cache_warming=True, services=[]
                    when daemon just started and the first probe
                    cycle hasn't completed. FE renders "Probes
                    warming up..." instead of empty state.

  - Per-service projection: name + status (4-value enum incl.
    "unknown") + last_check_at + nullable latency_ms + details +
    nullable error. Matches the TS extension below.

# TS contract extension (additive, web/src/lib/api.ts)

  HeartbeatStatus: "healthy" | "degraded" | "unhealthy" | "unknown"
                   (was 3-value; "unknown" added for auth-missing /
                    probe-timeout / probe-loop-crash cases)
  HeartbeatService:
    last_check_at: string | null   (was non-nullable string)
    latency_ms:    number | null    (was non-nullable number)
    error:         string | null    (NEW; sanitized by probe layer)
  HeartbeatServicesResponse:
    cache_warming: boolean          (NEW; warming-branch flag)

Pattern matches KR-MCP-CONSUMPTION ST2 + KR-MCP-CLIENTS-FLIP
additive-FE extensions PM acked previously. Not breaking —
existing consumers see the new fields as optional / get the
new enum value gracefully.

# Doppler env-mapping doc update

kora_docs/15_status_and_roadmap/kora_runtime_doppler_env_mapping.md
extended with:

  - New "Phase 2 Feature 2 — Heartbeat probes" sub-section under
    kora-runtime-gateways. 8 env rows (7 required + 1 optional
    KORA_FLY_STAGING_APP_NAME). Each row carries mint-via link,
    required scope, and example shape. Notes that these are NOT
    deploy-blocking — the heartbeat panel degrades to "unknown"
    when env unset.
  - Two new entries in Optional / deploy-tunable env vars table:
    * KORA_HEARTBEAT_PROBE_INTERVAL_SEC (this bucket)
    * KORA_MCP_HEALTH_CHECK_INTERVAL_SEC (KR-MCP-CONSUMPTION ST2,
      previously undocumented)
  - Validation checklist extended with an opt-in section probing
    the heartbeat tokens. Marked clearly as non-blocking.

# §4 PM-open questions — applied verbatim

  - Q1 cadence default 5min, distinct task names (locked in ST1).
  - Q2 cache_warming flag: yes — exactly the spec wording.
  - Q3 Doppler doc update: yes — Phase 2 Feature 2 section + the
    validation checklist extension landed in this PR.

# Tests

14 tests in tests/kora_cli/test_web_server_heartbeat.py:

  - Endpoint returns 200 (unchanged)
  - Top-level shape with cache_warming=True (empty cache branch)
  - Top-level shape with cache_warming=False (populated)
  - All 5 expected services present when cache populated
  - Canonical order preserved (insertion order irrelevant)
  - Per-service required keys (8 incl. new error field)
  - status="unknown" passes valid-enum check
  - latency_ms null on unknown
  - error populated on unhealthy
  - error null on healthy
  - cache_warming branch: services=[] + stub=False
  - cache_warming=False with even one snapshot present
  - Error string passthrough — endpoint doesn't add token shapes
    (defense-in-depth Bearer / ghp_ scan over the serialized
    response)
  - Cron-regression sanity

14/14 pass + 242/242 cross-bucket regression
(test_listeners/ + test_heartbeat_probes/ + test_web_server_heartbeat.py
+ test_web_server_mcp_clients.py). Ruff clean.

Manual smoke: curl /api/heartbeat/services returns the warming-
branch shape when cache empty (no daemon running locally) — pinned
the shape + verified no token-value-shape strings serialize.

# After this PR

Feature 2 closes. Joshua needs to set the 5 (+1 optional) probe
tokens in Doppler's kora-runtime-gateways project per the
extended env-mapping doc; the next probe cycle (≤5min after
daemon start) flips each panel card from "unknown" to live state.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@rafe-walker rafe-walker merged commit 68f7ca6 into feature/phase2-upgrades May 22, 2026
@rafe-walker rafe-walker deleted the feat/kora-KR-FEAT-HEARTBEAT-ST2 branch May 22, 2026 16:27
rafe-walker added a commit that referenced this pull request May 22, 2026
…tigation (#125)

Two-part cleanup.

Part A (tsc drift): HeartbeatPanel.tsx + DashboardPage.tsx now handle all 4 fields KR-FEAT-HEARTBEAT ST2 (#118) added: unknown status arm (muted probe-pending pill + CircleDashed icon), nullable last_check_at (never checked — matching #117 convention), error field rendered as destructive <pre> in expanded view (plain text, no dangerouslySetInnerHTML), cache_warming Probes warming up… banner + dashboard headline-tone suppression. tsc -b clean.

Part B (mcp_clients tests): investigation only, STOPPED per spec rule. Root cause is NOT stale assertions — all 13 failures + 5 errors collapse on ModuleNotFoundError: No module named slowapi from the unconditional import chain web_server → listeners/__init__ → listeners/webhooks → slowapi. Verified by installing --extra web: 19/19 mcp_clients tests pass.

This is task NousResearch#265 (slowapi placement). Recommended 1-line fix: move slowapi to runtime deps. Will dispatch as standalone KR-SLOWAPI-DEP-FIX bucket.

3 files, +267/-14 panel + dashboard + 11 new source-pin tests. 260/260 admin-panel tests pass across 22 suites (with --extra web installed). tsc -b + vite build both clean.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant