fix(gateway): deliver targeted exec-event wakes when heartbeat.every is disabled#79869

Open
xinhuagu wants to merge 2 commits into
openclaw:mainfrom
xinhuagu:fix/62505-exec-event-zero-interval

@xinhuagu xinhuagu commented May 9, 2026

Contributor

Summary

  • allow targeted exec-event heartbeat wakes to run even when heartbeat.every resolves to disabled (0m)
  • keep ordinary disabled heartbeats disabled by limiting the bypass to targeted exec-event wakes
  • add regression coverage for both scheduler dispatch and runHeartbeatOnce

Problem

Issue #62505 reports that coding-agent/background work can appear to never complete. One narrow cause is that background exec completion already enqueues an exec-event wake, but heartbeat dispatch drops it when heartbeat.every is disabled. That blocks the one-shot completion wake even though it is not a periodic heartbeat run.

Changes

  • in src/infra/heartbeat-runner.ts
    • allow runHeartbeatOnce(...) to proceed for exec-event wakes when the interval is otherwise disabled
    • allow targeted/session-scoped exec-event wakes through the runner even when no interval-backed agent state exists
  • add focused regression tests in:
    • src/infra/heartbeat-runner.scheduler.test.ts
    • src/infra/heartbeat-runner.returns-default-unset.test.ts
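
The gating rule described in this list can be sketched roughly as follows. This is an illustration only, not the code in src/infra/heartbeat-runner.ts: the `WakeRequest` shape, the `shouldRunHeartbeat` helper, and the `"interval"` reason string are hypothetical names chosen for the example. Only the `exec-event` wake reason and the disabled `0m` interval come from the PR description.

```typescript
// Hypothetical types for illustration; the real runner's types may differ.
type WakeReason = "interval" | "exec-event";

interface WakeRequest {
  reason: WakeReason;
  // Present when the wake targets one specific session (assumed field name).
  sessionKey?: string | undefined;
}

// Returns true when a heartbeat run should proceed.
// intervalMs <= 0 models heartbeat.every resolving to disabled (0m).
function shouldRunHeartbeat(intervalMs: number, wake: WakeRequest): boolean {
  if (intervalMs > 0) {
    return true; // interval heartbeats are enabled: behavior is unchanged
  }
  // Disabled interval: only a targeted exec-event wake bypasses the gate.
  return wake.reason === "exec-event" && wake.sessionKey !== undefined;
}
```

Under these assumptions, an ordinary wake with a disabled interval is still rejected, while a session-scoped exec-event wake is allowed through once.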

Testing

  • pnpm exec oxlint src/infra/heartbeat-runner.ts src/infra/heartbeat-runner.scheduler.test.ts src/infra/heartbeat-runner.returns-default-unset.test.ts
  • attempted a focused Vitest run, but the repo-local test harness currently fails in this environment before these tests execute:
    • test/non-isolated-runner.ts throws "Class extends value undefined is not a constructor or null"

Closes #62505

openclaw-barnacle Bot added labels on May 9, 2026:
  • size: S
  • triage: needs-real-behavior-proof (Candidate: external PR needs after-fix proof from a real setup.)
@clawsweeper

clawsweeper Bot commented May 9, 2026

Contributor

Codex review: needs real behavior proof before merge. Reviewed June 1, 2026, 1:13 AM ET / 05:13 UTC.

Summary
Review failed before ClawSweeper could summarize the requested change.

PR surface: Source +87, Tests +129. Total +216 across 3 files.

Reproducibility: unclear. The review failed before ClawSweeper could establish a reproduction path.

Review metrics: none identified.

Merge readiness
Overall: 🌊 off-meta tidepool
Proof: 🌊 off-meta tidepool
Patch quality: 🌊 off-meta tidepool
Result: rating does not apply to this item.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.

Risk before merge

  • [P1] No close action taken because the review did not complete.

Maintainer options:

  1. Decide the mitigation before merge
    Retry the Codex review after fixing the execution failure.
  2. Pause or close
    Do not merge this PR until maintainers decide whether the risk is worth taking.

Next step before merge

  • [P1] Review did not complete, so no work-lane recommendation was made.

Review details

Best possible solution:

Retry the Codex review after fixing the execution failure.

Do we have a high-confidence way to reproduce the issue?

Unclear. The review failed before ClawSweeper could establish a reproduction path.

Is this the best way to solve the issue?

Unclear. Retry the review first so ClawSweeper can evaluate the actual issue and fix direction.

AGENTS.md: unclear because the file could not be read completely.

Codex review notes: model gpt-5.5, reasoning high; reviewed against c0195f7ed579.

Label changes

Label changes:

  • add rating: 🌊 off-meta tidepool: Overall readiness is 🌊 off-meta tidepool; proof is 🌊 off-meta tidepool and patch quality is 🌊 off-meta tidepool.
  • remove P1: Current review triage priority is none.
  • remove rating: 🧂 unranked krab: Current PR rating is rating: 🌊 off-meta tidepool, so this older rating label is no longer current.
  • remove merge-risk: 🚨 message-delivery: Current PR review selected no merge-risk labels.
  • remove status: 📣 needs proof: Current PR status no longer selects a status label.

Label justifications:

  • rating: 🌊 off-meta tidepool: Overall readiness is 🌊 off-meta tidepool; proof is 🌊 off-meta tidepool and patch quality is 🌊 off-meta tidepool.

Evidence reviewed

PR surface:

Source +87, Tests +129. Total +216 across 3 files.

| Area | Files | Added | Removed | Net |
|---|---|---|---|---|
| Source | 1 | 97 | 10 | +87 |
| Tests | 2 | 129 | 0 | +129 |
| Docs | 0 | 0 | 0 | 0 |
| Config | 0 | 0 | 0 | 0 |
| Generated | 0 | 0 | 0 | 0 |
| Other | 0 | 0 | 0 | 0 |
| Total | 3 | 226 | 10 | +216 |

What I checked:

  • failure reason: codex execution failed.
  • codex failure detail: Codex review failed for this PR with exit 1.
  • codex stdout: Per-item Codex failure; continuing with the rest of the shard.

Likely related people:

  • unknown: Codex failed before it could trace repository history. (role: review did not complete; confidence: low)

What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 078830765f

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/infra/heartbeat-runner.ts Outdated
Comment on lines +2203 to +2206
    return await runOnce({
      cfg: state.cfg,
      agentId: targetAgentId,
      heartbeat: resolveRequestedHeartbeat(resolveHeartbeatConfig(state.cfg, targetAgentId)),

P2 Badge Preserve exec-event cooldown when every is 0m

When a targeted exec-event wake arrives for an agent whose heartbeat.every is 0m, state.agents has no HeartbeatAgentState, so this new fallback calls runOnce directly and skips the evaluateWakeDeferral/recordRunBookkeeping path used by normal targeted exec wakes. That reopens the runaway case documented in heartbeat-cooldown.ts and the scheduler regression test: repeated background process exits for the same session can invoke a fresh heartbeat/model run every time instead of respecting the min-spacing/flood guard, specifically for users who disable interval heartbeats but still want exec-event delivery.
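
One way to keep the spacing guard in force on the zero-interval path is to track the last run per session independently of any interval-backed agent state. The sketch below illustrates that idea only. `ExecWakeCooldown`, `runTargetedExecWake`, and the result shape are hypothetical and do not reproduce the project's `evaluateWakeDeferral`, `recordRunBookkeeping`, or heartbeat-cooldown.ts.

```typescript
// Hypothetical per-session spacing guard; not the project's heartbeat-cooldown.ts.
class ExecWakeCooldown {
  private lastRunAt = new Map<string, number>();
  private minSpacingMs: number;

  constructor(minSpacingMs: number) {
    this.minSpacingMs = minSpacingMs;
  }

  // Remaining wait in ms, or 0 when a run may start now.
  remainingMs(sessionKey: string, now: number): number {
    const last = this.lastRunAt.get(sessionKey);
    if (last === undefined) return 0;
    return Math.max(0, last + this.minSpacingMs - now);
  }

  // Record a run so later wakes for the same session are spaced out.
  recordRun(sessionKey: string, now: number): void {
    this.lastRunAt.set(sessionKey, now);
  }
}

type RunResult = { status: "ran" } | { status: "skipped"; reason: "not-due" };

// Check the guard before running, and do the bookkeeping after deciding to run,
// even though no interval-backed agent state exists for this session.
function runTargetedExecWake(
  cooldown: ExecWakeCooldown,
  sessionKey: string,
  now: number,
): RunResult {
  if (cooldown.remainingMs(sessionKey, now) > 0) {
    return { status: "skipped", reason: "not-due" };
  }
  cooldown.recordRun(sessionKey, now);
  return { status: "ran" };
}
```

With a guard like this, repeated process exits for the same session within the spacing window produce one run rather than one run per exit, while other sessions are unaffected.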

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e7576ebf33

Comment on lines +2243 to +2245
    const deferral = evaluateWakeDeferral(fallbackAgent, now, reason, intent);
    if (deferral.defer) {
      return { status: "skipped", reason: deferral.reason };

P2 Badge Requeue deferred zero-interval exec events

When heartbeat.every is 0m, this new fallback state is not in state.agents, so a second targeted exec-event that arrives within the 30s cooldown returns skipped/not-due here but is never retried: heartbeat-wake.ts:229-240 only requeues retryable busy reasons, and scheduleNext() still returns early when state.agents.size === 0. In the common case of two background execs finishing close together, the later system event stays queued indefinitely unless some unrelated wake happens after the cooldown.
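
A possible shape for the missing retry is a one-shot re-arm that fires once the cooldown has elapsed and does not rely on the interval scheduler. The sketch below is only an illustration under assumed names: `Deferral.retryInMs`, `dispatchOrRequeue`, and the injected `clock` and `schedule` functions are hypothetical and are not part of heartbeat-wake.ts or the runner.

```typescript
// Hypothetical deferral result; the field names are assumptions for this sketch.
interface Deferral {
  defer: boolean;
  retryInMs: number;
}

type Schedule = (delayMs: number, task: () => void) => void;

// Run the wake now, or re-arm it for after the cooldown instead of dropping it.
function dispatchOrRequeue(
  evaluate: (now: number) => Deferral,
  run: () => void,
  clock: () => number,
  schedule: Schedule,
): "ran" | "requeued" {
  const deferral = evaluate(clock());
  if (!deferral.defer) {
    run();
    return "ran";
  }
  // One-shot retry that re-evaluates the guard when it fires, so it works
  // even when no interval-backed agent is registered.
  schedule(deferral.retryInMs, () => {
    dispatchOrRequeue(evaluate, run, clock, schedule);
  });
  return "requeued";
}
```

In a real process the `schedule` argument could be a thin wrapper such as `(ms, task) => { setTimeout(task, ms); }`; injecting it keeps the sketch testable with a fake clock.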

anagnorisis2peripeteia added a commit to anagnorisis2peripeteia/openclaw that referenced this pull request May 19, 2026
The cron `wake` MCP tool currently forwards only `{mode, text}` to the
gateway. Every wake then enqueues a system event with no sessionKey /
agentId, so the cron service falls back to the heartbeat / main
default. Wakes scheduled from a non-main session (Telegram thread,
Discord channel, multi-agent setup) silently route to the wrong
conversation lane — and on CLI runtimes the woken session burns
tokens generating output that no caller can route back.

Origin capture (closes the upstream half of openclaw#46886 and openclaw#64556):

  - `src/agents/tools/cron-tool.ts` — the `wake` case now resolves
    `opts.agentSessionKey` through `resolveInternalSessionKey` and
    `resolveSessionAgentId`, matching the existing `add` action at
    L569-583. Explicit `sessionKey` / `agentId` params on the tool
    call take precedence over the inferred values so cross-session
    wakes remain expressible.
  - `src/gateway/protocol/schema/agent.ts` — `WakeParamsSchema` now
    declares optional `sessionKey` and `agentId` (NonEmptyString)
    so the gateway-level validator types them explicitly. The
    schema's `additionalProperties: true` continues to accept
    forward-compat metadata unchanged.
  - `src/gateway/server-methods/cron.ts` — the wake handler reads
    both fields, trims them, and forwards to `context.cron.wake`.
  - `src/cron/service.ts` + `src/cron/service/ops.ts` +
    `src/cron/service/timer.ts` — `wake()` accepts optional
    `sessionKey`/`agentId` and threads them into
    `enqueueSystemEvent` and `requestHeartbeatNow`. Missing /
    whitespace-only fields fall through to the dep's default so
    pre-existing call sites with no origin keep behaving the same
    way (backwards compatible).
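
The precedence and fallback rules in this list can be sketched as a small resolver. This is a hedged illustration, not the code in `src/agents/tools/cron-tool.ts` or `src/cron/service.ts`: `WakeOrigin`, `normalizeField`, and `resolveWakeOrigin` are hypothetical names, and the real `resolveInternalSessionKey` / `resolveSessionAgentId` helpers are not reproduced here.

```typescript
// Hypothetical shape; the real WakeParamsSchema and cron service types may differ.
interface WakeOrigin {
  sessionKey?: string | undefined;
  agentId?: string | undefined;
}

// Trim a value and treat empty or whitespace-only input as omitted.
function normalizeField(value: string | undefined): string | undefined {
  const trimmed = value?.trim();
  return trimmed ? trimmed : undefined;
}

// Explicit tool-call params win over values inferred from the calling session.
// Fields that end up undefined are left for the callee's default routing.
function resolveWakeOrigin(explicit: WakeOrigin, inferred: WakeOrigin): WakeOrigin {
  return {
    sessionKey: normalizeField(explicit.sessionKey) ?? normalizeField(inferred.sessionKey),
    agentId: normalizeField(explicit.agentId) ?? normalizeField(inferred.agentId),
  };
}
```

Under these assumptions, a whitespace-only explicit `sessionKey` falls back to the inferred value, and a call with no origin at all yields undefined fields, which preserves the pre-existing default behavior.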

Tests:

  - `src/cron/service/wake-origin.test.ts` (new, 6 tests) —
    direct seams on `wake()`: origin forwarded on `mode: "now"`,
    queued on `mode: "next-heartbeat"`, default fallback when
    origin omitted, whitespace-only origin treated as omitted,
    empty text still rejected.
  - `src/gateway/protocol/index.test.ts` — extends
    `validateWakeParams` coverage with the new optional fields
    accepted + empty-string rejected.

Out of scope (deliberate split):

  - Channel/thread/topic capture on the job's `delivery` block —
    follow-up PR once this contract lands. The minimum-viable fix
    here is session routing, which unblocks the CLI runtime's
    `--resume <session>` path and the embedded session-resolution
    path without a schema rewrite of the delivery contract.
  - The 5 stalled wake-related PRs (openclaw#70268, openclaw#57199, openclaw#82767,
    openclaw#79869, openclaw#63096) each fix downstream specifics. This PR fixes
    the upstream origin-capture they all silently assume.
@openclaw-barnacle

This pull request has been automatically marked as stale due to inactivity.
Please add updates or it will be closed.

openclaw-barnacle Bot added the stale label (Marked as stale due to inactivity) on May 29, 2026

clawsweeper Bot added labels on May 29, 2026:
  • rating: 🧂 unranked krab (Not merge-ready due to missing proof or serious correctness/safety concerns.)
  • status: 📣 needs proof (The PR needs real behavior proof before ClawSweeper can clear the contributor ask.)
  • P1 (High-priority user-facing bug, regression, or broken workflow.)
  • merge-risk: 🚨 message-delivery (🚨 May drop, duplicate, misroute, suppress, or wrongly target messages.)

openclaw-barnacle Bot removed the stale label (Marked as stale due to inactivity) on Jun 1, 2026

clawsweeper Bot changed labels on Jun 1, 2026:
  • added rating: 🌊 off-meta tidepool (PR readiness rating does not apply to this item.)
  • removed rating: 🧂 unranked krab (Not merge-ready due to missing proof or serious correctness/safety concerns.)
  • removed status: 📣 needs proof (The PR needs real behavior proof before ClawSweeper can clear the contributor ask.)

Labels

  • merge-risk: 🚨 message-delivery (🚨 May drop, duplicate, misroute, suppress, or wrongly target messages.)
  • P1 (High-priority user-facing bug, regression, or broken workflow.)
  • rating: 🌊 off-meta tidepool (PR readiness rating does not apply to this item.)
  • size: M
  • triage: needs-real-behavior-proof (Candidate: external PR needs after-fix proof from a real setup.)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: Coding Agent never completes anything (worked in 2026.4.2 and earlier)

1 participant