
fix(health): use runtime snapshot for channel summaries #713

Open
BingqingLyu wants to merge 5 commits into main from fork-pr-46527-fix-health-telegram-runtime

Owner

@BingqingLyu BingqingLyu commented Apr 27, 2026

Summary

  • Problem: openclaw health --json rebuilt channel summaries from config plus probe, but did not feed in the live gateway channel runtime snapshot.
  • Why it matters: Telegram could show running: false, lastStartAt: null, and tokenSource: "none" in health while channels status and live traffic showed the same account running normally.
  • What changed: thread the live runtime snapshot into health refreshes, build per-account health snapshots through the normal channel snapshot builder, invalidate cached health snapshots when the runtime state is newer than the cached summary, and guard the staleness check so channels that omit lastStartAt (WhatsApp, Zalo) do not cause perpetual cache invalidation.
  • What did NOT change (scope boundary): no channel runtime logic, probe behavior, or channels.status output path was changed.
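The staleness guard described above can be illustrated with a minimal sketch. The field names `running` and `lastStartAt` come from this PR; the `CachedHealth` shape, the `builtAt` timestamp, and the function name `isCacheStale` are hypothetical and are not taken from the actual code in src/gateway/server/health-state.ts.

```typescript
// Hypothetical shapes for illustration only; the real types in the gateway may differ.
interface ChannelRuntimeState {
  running?: boolean;
  lastStartAt?: number | null; // epoch ms; some channels (e.g. WhatsApp, Zalo) omit it
}

interface CachedHealth {
  builtAt: number; // epoch ms at which the cached summary was built (assumed field)
  channels: Record<string, ChannelRuntimeState>;
}

// Returns true when the live runtime state is clearly newer than the cached summary.
function isCacheStale(
  cached: CachedHealth,
  runtime: Record<string, ChannelRuntimeState>,
): boolean {
  for (const [id, live] of Object.entries(runtime)) {
    const summary = cached.channels[id];
    // The channel is running now but the cached summary does not say so.
    if (live.running === true && summary?.running !== true) return true;
    // Compare timestamps only when the channel actually reports lastStartAt,
    // so channels that omit it never force an invalidation on every call.
    if (typeof live.lastStartAt === "number" && live.lastStartAt > cached.builtAt) {
      return true;
    }
  }
  return false;
}
```

The `typeof` check is the important part of this sketch: a missing `lastStartAt` is treated as "no information" rather than as "newer than the cache".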

Change Type (select all)

  • Bug fix
  • Feature
  • Refactor
  • Docs
  • Security hardening
  • Chore/infra

Scope (select all touched areas)

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

User-visible / Behavior Changes

openclaw health --json now reflects live channel runtime fields when the gateway has them, instead of falling back to config/probe-only summaries for channels like Telegram.

Security Impact (required)

  • New permissions/capabilities? No
  • Secrets/tokens handling changed? No
  • New/changed network calls? No
  • Command/tool execution surface changed? No
  • Data access scope changed? No
  • If any Yes, explain risk + mitigation:

Repro + Verification

Environment

  • OS: macOS 26.3.1
  • Runtime/container: local gateway, npm global openclaw@2026.3.13 for before-state; patched checkout on this branch for after-state
  • Model/provider: not model-specific
  • Integration/channel (if any): Telegram
  • Relevant config (redacted): ~/.openclaw/openclaw.json5 with channels.telegram.botToken

Steps

  1. Configure Telegram and start the local gateway.
  2. Confirm Telegram traffic is working.
  3. Compare openclaw health --json with openclaw channels status --json.

Expected

  • health should agree with the live runtime state for fields like running, lastStartAt, mode, and tokenSource.

Actual

  • health reported Telegram as stopped with tokenSource: "none", while channels status reported the same account as running with its token read from config.
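The comparison in step 3 can be automated with a small helper. This is a sketch only: it assumes the per-account objects have already been extracted from the two JSON outputs, because the exact JSON layout of either command is not shown in this PR. The function name `diffAccountFields` is hypothetical.

```typescript
// Fields this PR expects `health` and `channels status` to agree on.
const COMPARED_FIELDS = ["running", "lastStartAt", "mode", "tokenSource"] as const;

type ComparedField = (typeof COMPARED_FIELDS)[number];
type AccountSnapshot = Partial<Record<ComparedField, unknown>>;

// Returns the names of the compared fields whose values differ.
// A missing field and an explicit null are treated as equal.
function diffAccountFields(health: AccountSnapshot, status: AccountSnapshot): string[] {
  return COMPARED_FIELDS.filter(
    (field) => (health[field] ?? null) !== (status[field] ?? null),
  );
}
```

An empty result means the two views agree on all four fields; on the unpatched build described above, the result would be expected to list at least `running` and `tokenSource`.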

Evidence

  • Failing test/log before + passing after
  • Trace/log snippets
  • Screenshot/recording
  • Perf numbers (if relevant)

Screenshot: before

Before repro: installed build health mismatch

Screenshot: after

After repro: patched branch health matches runtime

Human Verification (required)

What you personally verified (not just CI), and how:

  • Verified scenarios:
    • Reproduced the mismatch on the installed 2026.3.13 build.
    • Traced the issue into getHealthSnapshot and confirmed the health cache could return a summary older than the live Telegram runtime.
    • Added regression tests for stale runtime-backed cache invalidation and for channels that omit lastStartAt.
    • Ran the targeted tests plus pnpm build.
    • Captured a real before/after shell repro showing the mismatch on the installed build and agreement on this patched branch running from a proper git worktree.
  • Edge cases checked: health refresh still works without a runtime provider; runtime provider is cleared on gateway shutdown; cached health snapshots are bypassed when runtime running or lastStartAt is newer than the cache; channels that omit lastStartAt (WhatsApp, Zalo) no longer cause perpetual cache invalidation.
  • What you did not verify: I did not run a full multi-channel regression beyond Telegram for this cache invalidation path.

Review Conversations

  • I replied to or resolved every bot review conversation I addressed in this PR.
  • I left unresolved only the conversations that still need reviewer or maintainer judgment.

If a bot review conversation is addressed by this PR, resolve that conversation yourself. Do not leave bot review conversation cleanup for maintainers.

Compatibility / Migration

  • Backward compatible? Yes
  • Config/env changes? No
  • Migration needed? No
  • If yes, exact upgrade steps:

Failure Recovery (if this breaks)

  • How to disable/revert this change quickly: revert commits 49415a3, 710e3f1, 8dff28d, and f30ce95
  • Files/config to restore: src/gateway/server-methods/health.ts, src/gateway/server/health-state.ts, src/gateway/server/health-state.test.ts, src/commands/health.ts
  • Known bad symptoms reviewers should watch for: health stops reflecting live channel runtime fields after channels connect, or gateway shutdown leaves a stale health runtime provider behind

Risks and Mitigations

  • Risk: health summaries now depend on the gateway runtime snapshot being available during refresh.
    • Mitigation: the new code keeps the old behavior when no runtime snapshot provider exists, and only bypasses cached health when runtime state is clearly newer than the cache.
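The fallback in this mitigation, together with clearing the provider on gateway shutdown, might look roughly like the sketch below. All names here (`setHealthRuntimeProvider`, `buildChannelSummary`, the `source` field) are hypothetical and chosen for illustration; they are not the identifiers used in this PR.

```typescript
// Hypothetical types; the real runtime snapshot carries more fields.
type RuntimeSnapshot = Record<string, { running?: boolean; lastStartAt?: number | null }>;
type RuntimeProvider = () => RuntimeSnapshot;

interface ChannelSummary {
  id: string;
  configured: boolean;
  source: "config" | "runtime";
  running?: boolean;
  lastStartAt?: number | null;
}

let runtimeProvider: RuntimeProvider | null = null;

// The gateway would register a provider on startup and pass null on shutdown,
// so that no stale provider is left behind.
function setHealthRuntimeProvider(provider: RuntimeProvider | null): void {
  runtimeProvider = provider;
}

function buildChannelSummary(id: string, configured: boolean): ChannelSummary {
  const live = runtimeProvider?.()[id];
  if (!live) {
    // No provider, or no runtime entry for this channel: keep the old
    // config/probe-only behavior.
    return { id, configured, source: "config" };
  }
  return {
    id,
    configured,
    source: "runtime",
    running: live.running ?? false,
    lastStartAt: live.lastStartAt ?? null,
  };
}
```

Keeping the provider optional means a health refresh never fails just because the runtime snapshot is unavailable; it only loses the live fields.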

AI-assisted: yes.

0xble added 5 commits March 14, 2026 21:27
Keep probe results in health summaries when plugin snapshot builders omit the probe field, add regression coverage for that path, and harden health-state test cleanup via afterEach.