fix(cron): warn on main heartbeat handoff ghost runs#72677

Open
liaoandi wants to merge 2 commits into openclaw:main from liaoandi:fix/63106-cron-next-heartbeat-handoff-warning

Conversation

@liaoandi

@liaoandi liaoandi commented Apr 27, 2026

Contributor

Fixes #63106.

Summary

  • Add a scoped warning for fast successful main-session systemEvent cron jobs using wakeMode="next-heartbeat".
  • Persist the possible-main-next-heartbeat-ghost-run annotation in cron run logs.
  • Carry and render cron run-log warnings in the Control UI run history.
  • Add cron.ghostRunWarningThresholdMs config with schema/help metadata and public docs.
  • Keep CronRunLogEntry.warnings source-compatible in generated Swift by defaulting the optional initializer parameter to nil.
  • Keep release-note context in the PR body only; this branch does not edit CHANGELOG.md.
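
As a rough illustration of the scoping described in the summary above, the warning decision could be sketched as a pure predicate. This is a minimal sketch, not the code in this branch: the field names `sessionTarget` and `payloadKind`, the function name `ghostRunWarnings`, and the strict `<` comparison are assumptions. The 50 ms default and the "0 disables" behavior are taken from the review discussion in this thread.

```typescript
// Hypothetical shape of a finished cron run; field names are assumptions.
interface FinishedCronRun {
  status: string; // e.g. "ok"
  sessionTarget: string; // e.g. "main"
  payloadKind: string; // e.g. "systemEvent"
  wakeMode: string; // e.g. "next-heartbeat"
  durationMs?: number; // may be absent on some runs
}

const GHOST_RUN_WARNING = "possible-main-next-heartbeat-ghost-run";
const DEFAULT_GHOST_RUN_THRESHOLD_MS = 50; // default cited in the review discussion

// Returns the warnings to attach to a finished run (empty when out of scope).
function ghostRunWarnings(
  run: FinishedCronRun,
  thresholdMs: number = DEFAULT_GHOST_RUN_THRESHOLD_MS,
): string[] {
  if (thresholdMs <= 0) return []; // a threshold of 0 disables the warning
  if (run.durationMs === undefined) return []; // no duration, nothing to judge
  const inScope =
    run.status === "ok" &&
    run.sessionTarget === "main" &&
    run.payloadKind === "systemEvent" &&
    run.wakeMode === "next-heartbeat";
  // Assumption: a run counts as "fast" when strictly below the threshold.
  return inScope && run.durationMs < thresholdMs ? [GHOST_RUN_WARNING] : [];
}
```

The warning only annotates the run; in this sketch the run status is never changed, which matches the "diagnostic only" framing of the PR.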

Real behavior proof

Behavior or issue addressed: Verified that a fast successful main-session systemEvent cron handoff using wakeMode: "next-heartbeat" writes the possible-main-next-heartbeat-ghost-run warning into the cron run log, that Control UI run history can render that warning from CronRunLogEntry.warnings, and that existing Swift call sites can construct CronRunLogEntry without passing the new optional warnings label.

Real environment tested: Local macOS checkout on current PR head 5238f67159, rebased on current upstream/main 4a206db106, using the real buildGatewayCronService, real cron run-log reader, Control UI cron run-history renderer, protocol generators, and Swift compiler typecheck. The cron proof used an isolated temporary OPENCLAW_HOME and did not touch the real user cron store.

Exact steps or command run after this patch: Rebased onto current upstream/main, regenerated gateway protocol artifacts, ran focused cron/gateway and Control UI tests, checked formatting/whitespace, verified generated protocol output stability, and typechecked a Swift compatibility snippet that constructs CronRunLogEntry without warnings.

Evidence after fix: Current-head focused test output:

$ node scripts/test-projects.mjs src/gateway/server-cron.test.ts src/cron/cron-protocol-conformance.test.ts ui/src/ui/views/cron.test.ts --reporter verbose

 RUN  v4.1.7 /Users/antonio/projects/openclaw_worktrees/pr72677_clean

 Test Files  3 passed (3)
      Tests  42 passed (42)
   Start at  18:42:27
   Duration  16.44s
$ ./node_modules/.bin/oxfmt --check scripts/protocol-gen-swift.ts src/gateway/server-cron.ts src/gateway/server-cron.test.ts src/cron/run-log.ts src/cron/cron-protocol-conformance.test.ts ui/src/ui/views/cron.ts ui/src/ui/views/cron.test.ts
Checking formatting...

All matched files use the correct format.
Finished in 250ms on 7 files using 12 threads.

Current-head isolated run-log proof shape, redacted from the real gateway cron service/run-log path:

{
  "home": "/tmp/openclaw-ghost-proof.redacted",
  "jobId": "redacted-job-id",
  "logPath": "/tmp/openclaw-ghost-proof.redacted/runs/redacted-job-id.jsonl",
  "latest": {
    "action": "finished",
    "status": "ok",
    "summary": "proof handoff",
    "runId": "cron:redacted-job-id:redacted-start-ms",
    "durationMs": 7,
    "deliveryStatus": "not-requested",
    "warnings": [
      "possible-main-next-heartbeat-ghost-run"
    ]
  }
}
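
A reader consuming entries like the `latest` object above has to tolerate older entries that have no `warnings` field. The following is a minimal sketch of such a tolerant parser, assuming a hypothetical `RunLogEntry` type and `parseRunLogEntry` helper; it is not the parser in `src/cron/run-log.ts`.

```typescript
// Hypothetical subset of a run-log entry; only fields shown in the proof output.
interface RunLogEntry {
  action: string;
  status?: string;
  durationMs?: number;
  warnings?: string[];
}

// Parses one serialized entry; returns null for malformed input.
function parseRunLogEntry(json: string): RunLogEntry | null {
  let raw: unknown;
  try {
    raw = JSON.parse(json);
  } catch {
    return null;
  }
  if (typeof raw !== "object" || raw === null) return null;
  const obj = raw as Record<string, unknown>;
  if (typeof obj.action !== "string") return null;

  const entry: RunLogEntry = { action: obj.action };
  if (typeof obj.status === "string") entry.status = obj.status;
  if (typeof obj.durationMs === "number") entry.durationMs = obj.durationMs;
  if (Array.isArray(obj.warnings)) {
    // Keep only non-empty strings so a bad element cannot poison the entry.
    const warnings = obj.warnings.filter(
      (w): w is string => typeof w === "string" && w.length > 0,
    );
    if (warnings.length > 0) entry.warnings = warnings;
  }
  return entry;
}
```

Leaving `warnings` undefined when nothing valid is present keeps older entries and new warning-free entries indistinguishable to callers, which is one way to keep the field additive.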

Swift compatibility proof:

$ swiftc -typecheck \
  apps/shared/OpenClawKit/Sources/OpenClawProtocol/AnyCodable.swift \
  apps/shared/OpenClawKit/Sources/OpenClawProtocol/GatewayModels.swift \
  /private/tmp/openclaw-pr72677-swift-compat.swift
(no output)

The compatibility snippet constructs CronRunLogEntry(...) without a warnings: argument, matching existing Swift call sites.
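
The source-compatibility idea can be sketched on the generator side. The snippet below is an illustrative assumption about how a TypeScript generator might render Swift initializer parameters so that optional fields default to `nil`; the `SwiftField` type and both function names are hypothetical and are not taken from `scripts/protocol-gen-swift.ts`.

```typescript
// Hypothetical description of one generated Swift property.
interface SwiftField {
  name: string;
  swiftType: string; // Swift type without the optional marker, e.g. "[String]"
  optional: boolean;
}

// Renders one initializer parameter; optional fields get a nil default so
// existing call sites that omit the label still compile.
function renderInitParam(field: SwiftField): string {
  return field.optional
    ? `${field.name}: ${field.swiftType}? = nil`
    : `${field.name}: ${field.swiftType}`;
}

// Renders the full initializer signature for a list of fields.
function renderInitSignature(fields: SwiftField[]): string {
  return `public init(${fields.map(renderInitParam).join(", ")})`;
}
```

Because Swift initializer arguments are labeled, a defaulted parameter can be omitted at the call site regardless of its position, which is why defaulting only the new optional parameter is enough for old call sites.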

Protocol generation stability:

$ node --import tsx scripts/protocol-gen.ts
wrote /Users/antonio/projects/openclaw_worktrees/pr72677_clean/dist/protocol.schema.json

$ node --import tsx scripts/protocol-gen-swift.ts
wrote /Users/antonio/projects/openclaw_worktrees/pr72677_clean/apps/shared/OpenClawKit/Sources/OpenClawProtocol/GatewayModels.swift

$ git diff --exit-code -- dist/protocol.schema.json apps/shared/OpenClawKit/Sources/OpenClawProtocol/GatewayModels.swift
(no output)

Format and whitespace checks:

$ git diff --check upstream/main...HEAD
(no output)

Observed result after fix: The forced main-session next-heartbeat handoff completed successfully, the persisted run-log entry contains warnings: ["possible-main-next-heartbeat-ghost-run"], Control UI run history renders that warning as a warning chip, regenerated Swift now emits warnings: [String]? = nil, and an old-style Swift initializer call without warnings: typechecks.

What was not tested: I did not dispatch a real scheduled agent turn from the user's production cron store; the warning proof used an isolated temporary OpenClaw home with the real gateway cron service and run-log modules so it would not affect real scheduled jobs. Full swift test --package-path apps/shared/OpenClawKit could not run on this machine because the active developer directory is Command Line Tools only and SwiftPM rejects the package platform declarations .iOS(.v18) / .macOS(.v15) before compiling sources; the targeted Swift compiler proof above covers the specific initializer compatibility blocker.

Tests

  • node scripts/test-projects.mjs src/gateway/server-cron.test.ts src/cron/cron-protocol-conformance.test.ts ui/src/ui/views/cron.test.ts --reporter verbose
  • ./node_modules/.bin/oxfmt --check scripts/protocol-gen-swift.ts src/gateway/server-cron.ts src/gateway/server-cron.test.ts src/cron/run-log.ts src/cron/cron-protocol-conformance.test.ts ui/src/ui/views/cron.ts ui/src/ui/views/cron.test.ts
  • git diff --check upstream/main...HEAD
  • node --import tsx scripts/protocol-gen.ts
  • node --import tsx scripts/protocol-gen-swift.ts
  • git diff --exit-code -- dist/protocol.schema.json apps/shared/OpenClawKit/Sources/OpenClawProtocol/GatewayModels.swift
  • swiftc -typecheck apps/shared/OpenClawKit/Sources/OpenClawProtocol/AnyCodable.swift apps/shared/OpenClawKit/Sources/OpenClawProtocol/GatewayModels.swift /private/tmp/openclaw-pr72677-swift-compat.swift

@openclaw-barnacle openclaw-barnacle Bot added gateway Gateway runtime size: M labels Apr 27, 2026
@greptile-apps

greptile-apps Bot commented Apr 27, 2026

Contributor

Greptile Summary

This PR adds a scoped ghost-run warning for fast ok main-session systemEvent cron jobs running with wakeMode="next-heartbeat", where a quick finish typically indicates the cron scheduler handed work to the next heartbeat rather than confirming actual agent processing. The change is consistent across all relevant layers: type definitions, Zod schema, generated config schema, help/label metadata, run-log parsing, and the gateway cron service itself.

Confidence Score: 5/5

Safe to merge — additive warning path with no changes to existing control flow.

All changes are additive: new config field with a safe default (50ms), new warning emission that does not alter the run outcome, and proper de-serialization of the new warnings field in run-log parsing. Scoping of ghostRunWarning within the evt.action === "finished" block is correct and consistent with the appendCronRunLog call that references it. Edge cases (threshold = 0 to disable, undefined durationMs) are explicitly handled. Tests cover both the positive and negative trigger paths.

No files require special attention.

Reviews (1): Last reviewed commit: "fix(cron): warn on main heartbeat handof..."

@clawsweeper

clawsweeper Bot commented Apr 29, 2026

Contributor

Codex review: needs real behavior proof before merge. Reviewed June 9, 2026, 6:09 AM ET / 10:09 UTC.

Summary
The PR adds default-on cron ghost-run warnings for fast main-session next-heartbeat systemEvent handoffs, carries warning annotations through run history/protocol/Swift/UI/docs, and refactors two Matrix event listeners.

PR surface: Source +91, Tests +157, Docs +12, Other +4. Total +264 across 20 files.

Reproducibility: yes, at the source level. Current main's main-session next-heartbeat cron path enqueues the system event, requests a heartbeat, and returns ok without waiting for confirmed agent processing. I did not run a live current-main cron job, so this is source-reproducible rather than directly reproduced.

Review metrics: 2 noteworthy metrics.

  • Config/default surfaces: 1 added. The new cron.ghostRunWarningThresholdMs default can change the warnings existing operators see after upgrade.
  • Run-history protocol fields: 1 optional field added. CronRunLogEntry.warnings is additive, but clients that inspect exact run-history payloads need compatibility awareness.

Merge readiness
Overall: 🦪 silver shellfish
Proof: 🦪 silver shellfish
Patch quality: 🦐 gold shrimp
Result: blocked until stronger real behavior proof is added.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.

Rank-up moves:

  • Refresh real behavior proof against current head 3928f865da481d94edddb47dbaad929491d7b374, redacting private data.
  • Replace the new JSONL run-log wording with SQLite-backed run-history guidance.
  • Get maintainer acceptance for the default-on warning/protocol compatibility impact.

Proof guidance:

  • [P1] Needs stronger real behavior proof before merge: The PR body includes terminal and redacted run-log proof for older head 5238f67159, but current head is 3928f865da481d94edddb47dbaad929491d7b374 and includes an additional Matrix listener refactor. After adding proof, update the PR body; ClawSweeper should re-review automatically. If it does not, the PR author or someone with repository write access can comment @clawsweeper re-review.

Risk before merge

  • [P1] cron.ghostRunWarningThresholdMs is a new default-on config/default surface, so existing main-session next-heartbeat jobs can begin producing warning annotations and Control UI chips after upgrade.
  • [P1] This is a diagnostic mitigation for #63106 ("cron: ghost runs recorded as ok when gateway is down (durationMs < 50ms)"), not proof that the queued system event was processed by an agent turn.
  • [P1] The PR body proof was collected on an older head and does not cover the current Matrix listener refactor commit.

Maintainer options:

  1. Accept the default-on diagnostic
    Maintainers can intentionally accept that upgrades may show new warning annotations/chips for existing fast main-session handoffs after the docs and proof gaps are fixed.
  2. Make the warning opt-in
    The branch could preserve existing run-history output by default and require users to set cron.ghostRunWarningThresholdMs before emitting the new warning.
  3. Pause for a causal fix
    If maintainers want cron to prove or await agent processing instead of flagging suspected handoffs, this PR should pause in favor of a narrower runtime fix for that behavior.
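
The difference between options 1 and 2 comes down to what an unset `cron.ghostRunWarningThresholdMs` means. A minimal sketch, assuming a hypothetical `resolveGhostRunThresholdMs` helper and a `WarningPolicy` switch that does not exist in the branch:

```typescript
// Hypothetical policy switch used only to contrast the two maintainer options.
type WarningPolicy = "default-on" | "opt-in";

const DEFAULT_THRESHOLD_MS = 50; // default cited in the review discussion

// Resolves the effective threshold; 0 means "warning disabled".
function resolveGhostRunThresholdMs(
  configured: number | undefined,
  policy: WarningPolicy,
): number {
  if (configured !== undefined) {
    // An explicit value always wins; non-finite or non-positive values disable.
    return Number.isFinite(configured) && configured > 0 ? configured : 0;
  }
  // Unset config: default-on falls back to 50 ms, opt-in stays silent.
  return policy === "default-on" ? DEFAULT_THRESHOLD_MS : 0;
}
```

Under the opt-in policy, upgraded installs would see no new annotations until an operator sets the threshold, which is the compatibility property option 2 is after.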

Next step before merge

  • [P1] The docs repair is mechanical, but stale current-head proof and default-on compatibility acceptance require contributor and maintainer action before automation should proceed.

Security
Cleared: The diff does not add dependencies, workflows, secrets handling, download/execute paths, or broader permissions; the new cron warning log avoids payload text.

Review findings

  • [P2] Point ghost-run docs at SQLite run history — docs/cli/cron.md:237
Review details

Best possible solution:

Land only after the docs point to SQLite-backed run history, the contributor refreshes current-head proof, and maintainers explicitly accept the default-on diagnostic/protocol addition.

Do we have a high-confidence way to reproduce the issue?

Yes for source-level reproduction: current main's main-session next-heartbeat cron path enqueues the system event, requests a heartbeat, and returns ok without waiting for confirmed agent processing. I did not run a live current-main cron job, so this is source-reproducible rather than directly reproduced.
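
The behavior described above can be illustrated with a small stand-in. This is a sketch under stated assumptions, not the code in `src/cron/service/timer.ts`: `MainSessionStub` and `runNextHeartbeatHandoff` are hypothetical names, and the queue is an in-memory array.

```typescript
// Hypothetical stand-in for the main session's event queue and heartbeat hook.
class MainSessionStub {
  readonly pendingEvents: string[] = [];
  heartbeatRequested = false;

  enqueueSystemEvent(text: string): void {
    this.pendingEvents.push(text);
  }

  requestHeartbeat(): void {
    this.heartbeatRequested = true; // only a request; nothing is processed here
  }
}

interface HandoffOutcome {
  status: "ok";
  durationMs: number;
}

// Mirrors the described path: enqueue, request a heartbeat, return ok.
// Nothing awaits the agent turn, so the duration stays tiny and "ok" only
// means "handed off", not "processed".
function runNextHeartbeatHandoff(
  session: MainSessionStub,
  text: string,
): HandoffOutcome {
  const startedAt = Date.now();
  session.enqueueSystemEvent(text);
  session.requestHeartbeat();
  return { status: "ok", durationMs: Date.now() - startedAt };
}
```

After a call, the event is still sitting in `pendingEvents`, which is the gap the ghost-run warning is meant to surface.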

Is this the best way to solve the issue?

Unclear as a final fix: the gateway/run-log warning is an acceptable diagnostic layer, but it does not prove agent processing and still needs docs repair, current-head proof, and maintainer acceptance of the default-on compatibility impact.

Full review comments:

  • [P2] Point ghost-run docs at SQLite run history — docs/cli/cron.md:237
    This new bullet tells operators to inspect the job's run-log JSONL, but current cron run history is stored and read from SQLite, with legacy runs/*.jsonl only imported and renamed. Users investigating the new warning would look for a stale/nonexistent active log; keep the guidance to openclaw cron runs --id <job-id> or otherwise describe the SQLite-backed run history.
    Confidence: 0.93

Overall correctness: patch is incorrect
Overall confidence: 0.86

AGENTS.md: found and applied where relevant.

Codex review notes: model gpt-5.5, reasoning high; reviewed against 5e1fbca3cbc6.

Label changes

Label changes:

  • add rating: 🦪 silver shellfish: Overall readiness is 🦪 silver shellfish; proof is 🦪 silver shellfish and patch quality is 🦐 gold shrimp.
  • add status: 📣 needs proof: The PR needs real behavior proof before ClawSweeper can clear the contributor ask. Needs stronger real behavior proof before merge: The PR body includes terminal and redacted run-log proof for older head 5238f67159, but current head is 3928f865da481d94edddb47dbaad929491d7b374 and includes an additional Matrix listener refactor. After adding proof, update the PR body; ClawSweeper should re-review automatically. If it does not, the PR author or someone with repository write access can comment @clawsweeper re-review.
  • remove rating: 🦐 gold shrimp: Current PR rating is rating: 🦪 silver shellfish, so this older rating label is no longer current.
  • remove status: ⏳ waiting on author: Current PR status label is status: 📣 needs proof.

Label justifications:

  • P2: This is a normal-priority cron diagnostic bugfix with limited blast radius and one concrete docs blocker before merge.
  • merge-risk: 🚨 compatibility: The PR adds a default-on config behavior and an optional run-history protocol field that can change upgraded users' visible cron history output.
  • rating: 🦪 silver shellfish: Overall readiness is 🦪 silver shellfish; proof is 🦪 silver shellfish and patch quality is 🦐 gold shrimp.
  • status: 📣 needs proof: The PR needs real behavior proof before ClawSweeper can clear the contributor ask. Needs stronger real behavior proof before merge: The PR body includes terminal and redacted run-log proof for older head 5238f67159, but current head is 3928f865da481d94edddb47dbaad929491d7b374 and includes an additional Matrix listener refactor. After adding proof, update the PR body; ClawSweeper should re-review automatically. If it does not, the PR author or someone with repository write access can comment @clawsweeper re-review.
Evidence reviewed

PR surface:

Source +91, Tests +157, Docs +12, Other +4. Total +264 across 20 files.

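PR surface stats

| Area | Files | Added | Removed | Net |
| --- | --- | --- | --- | --- |
| Source | 13 | 97 | 6 | +91 |
| Tests | 3 | 157 | 0 | +157 |
| Docs | 2 | 12 | 0 | +12 |
| Config | 0 | 0 | 0 | 0 |
| Generated | 0 | 0 | 0 | 0 |
| Other | 2 | 5 | 1 | +4 |
| Total | 20 | 271 | 7 | +264 |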

What I checked:

  • PR head and proof baseline: Live PR metadata shows the current head is 3928f865da481d94edddb47dbaad929491d7b374, while the PR body's real-environment proof says it was run on head 5238f67159, so the proof is not current-head proof after the latest force-push. (3928f865da48)
  • Docs defect in PR diff: The new ghost-run docs tell users to check openclaw cron runs --id <job-id> or the job's run-log JSONL for warnings, but the active cron run-history path is SQLite-backed. Public docs: docs/cli/cron.md. (docs/cli/cron.md:237, 3928f865da48)
  • Current run-log storage path: Current main appends cron run-log entries through the shared state database write path and reads pages from SQLite rows, not active per-job JSONL files. (src/cron/run-log.ts:135, 5e1fbca3cbc6)
  • Current cron docs storage contract: Current docs state cron jobs, pending runtime state, and run history live in the shared SQLite state database, with legacy runs/*.jsonl imported once and renamed. Public docs: docs/cli/cron.md. (docs/cli/cron.md:146, 5e1fbca3cbc6)
  • Source-reproducible cron behavior: Current main's main-session next-heartbeat path enqueues the system event, requests a heartbeat, and returns ok without awaiting confirmed agent processing, matching the diagnostic problem this PR is trying to surface. (src/cron/service/timer.ts:1538, 5e1fbca3cbc6)
  • Compatibility-sensitive config surface: Repository policy treats config/default additions as compatibility-sensitive; this PR adds a new default-on cron.ghostRunWarningThresholdMs config surface and a new run-history protocol field. (AGENTS.md:27, 5e1fbca3cbc6)

Likely related people:

  • steipete: Recent history for src/cron/run-log.ts includes cron run-log SQLite clarification and docs work, which is central to the stale JSONL docs mismatch. (role: recent cron run-log and docs contributor; confidence: high; commits: 875c9fd96dd8, 63de51ab9632, 2bd07eead7e7; files: src/cron/run-log.ts, docs/cli/cron.md)
  • shakkernerd: Recent history for src/cron/service/timer.ts shows multiple cron execution/cancellation fixes near the main-session execution path involved in the ghost-run behavior. (role: recent cron timer contributor; confidence: medium; commits: 9082233a433c, 24196e05f59b, 372f85d368d8; files: src/cron/service/timer.ts)
  • mbelinky: Recent history touches both src/gateway/server-cron.ts and ui/src/ui/views/cron.ts for cron command job support, overlapping the gateway/UI surfaces changed here. (role: adjacent gateway and Control UI cron contributor; confidence: medium; commits: b8adc11977ab; files: src/gateway/server-cron.ts, ui/src/ui/views/cron.ts)
  • steipete: Recent Matrix history includes async monitor callback lint work, which is the closest current-main ownership signal for the listener refactor added in the latest PR commit. (role: recent Matrix async listener contributor; confidence: medium; commits: b2a1c5caa854, 2df95c0b10fd; files: extensions/matrix/src/matrix/monitor/events.ts, extensions/matrix/src/matrix/monitor/auto-join.ts)
What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

@openclaw-barnacle openclaw-barnacle Bot added scripts Repository scripts docker Docker and sandbox tooling channel: line Channel integration: line app: web-ui App: web-ui channel: discord Channel integration: discord and removed scripts Repository scripts docker Docker and sandbox tooling app: web-ui App: web-ui labels Apr 29, 2026
@openclaw-barnacle openclaw-barnacle Bot added extensions: memory-core Extension: memory-core commands Command implementations agents Agent runtime and tooling size: XL triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup. proof: supplied External PR includes structured after-fix real behavior proof. app: web-ui App: web-ui and removed channel: discord Channel integration: discord channel: line Channel integration: line size: M triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup. extensions: memory-core Extension: memory-core commands Command implementations agents Agent runtime and tooling labels May 8, 2026
@clawsweeper clawsweeper Bot added rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. P2 Normal backlog priority with limited blast radius. and removed rating: 🦞 diamond lobster Very strong PR readiness with only minor maintainer review expected. labels May 21, 2026
@clawsweeper

clawsweeper Bot commented May 21, 2026

Contributor

🦞🧹
ClawSweeper re-review requested.

I asked ClawSweeper to review this item again.
Action: item re-review queued (workflow sweep.yml, event repository_dispatch).
Result: the existing ClawSweeper review comment will be edited in place when the review finishes.

Re-review progress:

@openclaw-barnacle openclaw-barnacle Bot removed the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 21, 2026
@clawsweeper clawsweeper Bot added the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 21, 2026
@openclaw-barnacle openclaw-barnacle Bot added channel: nostr Channel integration: nostr extensions: diagnostics-otel Extension: diagnostics-otel docker Docker and sandbox tooling extensions: acpx extensions: codex extensions: diffs and removed size: L labels May 24, 2026
@liaoandi

Contributor Author

@clawsweeper re-review

@clawsweeper

clawsweeper Bot commented May 26, 2026

Contributor

🦞🧹
ClawSweeper re-review requested.

I asked ClawSweeper to review this item again.
Action: item re-review queued (workflow sweep.yml, event repository_dispatch).
Result: the existing ClawSweeper review comment will be edited in place when the review finishes.

Re-review progress:

@liaoandi

Contributor Author

@clawsweeper re-review

Current head 55b75e7 carries CronRunLogEntry.warnings through the Control UI run-history type/renderer/test, removes the release-owned CHANGELOG.md entry, and updates the PR body Real behavior proof with focused UI + cron/gateway test output.

@clawsweeper

clawsweeper Bot commented May 26, 2026

Contributor

🦞🧹
ClawSweeper re-review requested.

I asked ClawSweeper to review this item again.
Action: item re-review queued (workflow sweep.yml, event repository_dispatch).
Result: the existing ClawSweeper review comment will be edited in place when the review finishes.

Re-review progress:

@liaoandi

Contributor Author

@clawsweeper re-review

Current head acd4a5f preserves Swift source compatibility for CronRunLogEntry.warnings by defaulting the generated optional initializer parameter to nil, regenerates GatewayModels.swift, updates Real behavior proof with focused cron/UI tests, protocol generation stability, and a Swift typecheck proving old call sites compile without warnings:.

@clawsweeper

clawsweeper Bot commented May 26, 2026

Contributor

🦞🧹
ClawSweeper re-review requested.

I asked ClawSweeper to review this item again.
Action: item re-review queued (workflow sweep.yml, event repository_dispatch).
Result: the existing ClawSweeper review comment will be edited in place when the review finishes.

Re-review progress:

Labels

  • app: web-ui (App: web-ui)
  • channel: matrix (Channel integration: matrix)
  • docs (Improvements or additions to documentation)
  • gateway (Gateway runtime)
  • merge-risk: 🚨 compatibility (🚨 May break existing users, config, migrations, defaults, or upgrade paths.)
  • P2 (Normal backlog priority with limited blast radius.)
  • proof: supplied (External PR includes structured after-fix real behavior proof.)
  • rating: 🦪 silver shellfish (Thin PR readiness signal; proof, validation, or implementation needs work.)
  • scripts (Repository scripts)
  • size: M
  • status: 📣 needs proof (The PR needs real behavior proof before ClawSweeper can clear the contributor ask.)

Development

Successfully merging this pull request may close these issues.

cron: ghost runs recorded as ok when gateway is down (durationMs < 50ms)

1 participant