fix(heartbeat): remap cron-run session keys for exec/ACP/CLI events (refs #52305)#50818
fix(heartbeat): remap cron-run session keys for exec/ACP/CLI events (refs #52305)#50818Kaspre wants to merge 6 commits into
Conversation
Greptile SummaryThis PR fixes a long-standing async context routing bug by propagating Key changes:
Confidence Score: 5/5
Reviews (2): Last reviewed commit: "fix: anchor exec completion regex to sta..." | Re-trigger Greptile |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c21582983d
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
Thanks for the review feedback. Pushed a commit addressing both Greptile suggestions:
Regarding @chatgpt-codex-connector's concern about |
|
To use Codex here, create an environment for this repo. |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ca9df70e07
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
Good catch on the stale
No config mutation — the original heartbeat object is untouched. The flag only affects downstream resolution for that call. |
|
Heads up — In practice this injects the heartbeat prompt into an in-flight cron session, confusing the model mid-task. We've hit this multiple times — an isolated overnight-thinking cron job running sequential The fix would be a one-line guard in export function scopedHeartbeatWakeOptions<T extends object>(
sessionKey: string,
wakeOptions: T,
): T | (T & { sessionKey: string }) {
if (!parseAgentSessionKey(sessionKey)) return wakeOptions;
if (isCronSessionKey(sessionKey)) return wakeOptions;
return { ...wakeOptions, sessionKey };
}
|
f3763a1 to
a3d2cfc
Compare
|
Good catch @briannewman — applied in a3d2cfc. Added the |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: a3d2cfce0d
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 869950add9
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 2ac7fcd7cb
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
2ac7fcd to
347ca05
Compare
|
Squashed the incremental review-driven commits into a single clean commit (347ca05) for easier review. Here's what's in it: 1. sessionKey propagation — the original fix. 2. Exec completion detection — 3. Target routing for exec wakes — when 4. Cron session isolation (from @briannewman's feedback) — 5. Test coverage — new tests for Known limitation: |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 347ca05493
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
fb22870 to
1dfa292
Compare
|
Branch rebased onto latest main, resolving merge conflicts and stale CI. Re: @greptile-apps suggestion to collapse the Requesting re-review. cc @greptile-apps @chatgpt-codex-connector |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 1dfa292575
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
Fixed the P1 issue raised by @chatgpt-codex-connector: Unscoped cron wakes — Cron session keys now resolve to the originating agent's main session key (e.g., This is consistent with how Test updated to verify the new scoping behavior. Commit: 3d854506d3. |
6a4bca4 to
d5af570
Compare
…xt loss Exec completion events enqueued under cron or channel-specific session keys were invisible to the heartbeat runner, which always checked the agent's main session key. This caused exec results to silently vanish instead of being relayed back to the originating channel. 1. Propagate sessionKey through scopedHeartbeatWakeOptions in bash exec, ACP spawn, gateway/node exec, and CLI watchdog paths 2. Add resolveEventSessionKey to remap cron session keys to the originating agent's main session key for event enqueueing Scope was previously broader. Dropped per maintainer review feedback on openclaw#50818 (issuecomment-4322767117): - The forceLastTargetWhenNone override for resolveHeartbeatDeliveryTarget, which routed exec completions to the session's last target even when heartbeat.target was explicitly "none". Current main and its regression tests intentionally keep target="none" internal-only, so this routing change is left for a separate focused PR if it is ever revisited. - The openclaw#39978 hook-cancel handling has already shipped via upstream PR openclaw#70258, so it is no longer part of this PR's scope. The PR now closes openclaw#52305 only. Closes openclaw#52305. Related: openclaw#18237. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Codex review on PR openclaw#50818 [P3] flagged that this user-facing heartbeat/ async-exec routing fix was missing the required Unreleased/Fixes entry. Add a single-line entry describing the cron-run session-key remap at the four enqueue sites (bash exec, ACP, gateway node-event, CLI watchdog) plus the global-scope branch added in the follow-up commits. Refs openclaw#50818. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
d5af570 to
239be03
Compare
239be03 to
6481b9b
Compare
…xt loss Exec completion events enqueued under cron or channel-specific session keys were invisible to the heartbeat runner, which always checked the agent's main session key. This caused exec results to silently vanish instead of being relayed back to the originating channel. 1. Propagate sessionKey through scopedHeartbeatWakeOptions in bash exec, ACP spawn, gateway/node exec, and CLI watchdog paths 2. Add resolveEventSessionKey to remap cron session keys to the originating agent's main session key for event enqueueing Scope was previously broader. Dropped per maintainer review feedback on openclaw#50818 (issuecomment-4322767117): - The forceLastTargetWhenNone override for resolveHeartbeatDeliveryTarget, which routed exec completions to the session's last target even when heartbeat.target was explicitly "none". Current main and its regression tests intentionally keep target="none" internal-only, so this routing change is left for a separate focused PR if it is ever revisited. - The openclaw#39978 hook-cancel handling has already shipped via upstream PR openclaw#70258, so it is no longer part of this PR's scope. The PR now closes openclaw#52305 only. Closes openclaw#52305. Related: openclaw#18237. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Codex review on PR openclaw#50818 [P3] flagged that this user-facing heartbeat/ async-exec routing fix was missing the required Unreleased/Fixes entry. Add a single-line entry describing the cron-run session-key remap at the four enqueue sites (bash exec, ACP, gateway node-event, CLI watchdog) plus the global-scope branch added in the follow-up commits. Refs openclaw#50818. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Rebase onto upstream/main (2026-05-09) Resolved conflict in |
…xt loss Exec completion events enqueued under cron or channel-specific session keys were invisible to the heartbeat runner, which always checked the agent's main session key. This caused exec results to silently vanish instead of being relayed back to the originating channel. 1. Propagate sessionKey through scopedHeartbeatWakeOptions in bash exec, ACP spawn, gateway/node exec, and CLI watchdog paths 2. Add resolveEventSessionKey to remap cron session keys to the originating agent's main session key for event enqueueing Scope was previously broader. Dropped per maintainer review feedback on openclaw#50818 (issuecomment-4322767117): - The forceLastTargetWhenNone override for resolveHeartbeatDeliveryTarget, which routed exec completions to the session's last target even when heartbeat.target was explicitly "none". Current main and its regression tests intentionally keep target="none" internal-only, so this routing change is left for a separate focused PR if it is ever revisited. - The openclaw#39978 hook-cancel handling has already shipped via upstream PR openclaw#70258, so it is no longer part of this PR's scope. The PR now closes openclaw#52305 only. Closes openclaw#52305. Related: openclaw#18237. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… config-boundary guard
Two related changes, both surfaced by upstream-main checks on the rebased PR:
1. Architecture guard: replace ambient loadConfig() calls in agent-runtime
code with a passed cfg/mainKey threaded through the same layer the rest
of the agent's session config already lives at. ExecToolDefaults grows
`mainKey` + `sessionScope`; runExecProcess snapshots them onto the
ProcessSession; ACP startAcpSpawnParentStreamRelay accepts them as
params; cli-runner watchdog reads from params.config; createOpenClawCodingTools
threads them from options.config so normal agent tool runs are covered;
bash-command / commands-diagnostics / commands-export-trajectory pass
them through from cfg. The four sites flagged by
scripts/check-deprecated-internal-config-api.mjs are gone.
2. Global-scope correctness: when `cfg.session.scope === "global"`,
resolveHeartbeatSession drains the literal "global" queue, not
agent-main. resolveEventSessionKey and scopedHeartbeatWakeOptions both
take an optional `scope` parameter and route accordingly:
- resolveEventSessionKey returns "global" for cron-run keys under
global scope (instead of agent:<id>:main, which would strand events).
- scopedHeartbeatWakeOptions drops the targeted sessionKey but
preserves `agentId: parsed.agentId` so multi-agent global-scope
setups still wake the originating agent's heartbeat.
Tests in routing/session-key.test.ts cover the new global-scope branches
and the per-sender pass-through. Existing fixture agent IDs in this file
were genericized at the same time (einstein → ops, captain → primary).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Codex review on PR openclaw#50818 [P3] flagged that this user-facing heartbeat/ async-exec routing fix was missing the required Unreleased/Fixes entry. Add a single-line entry describing the cron-run session-key remap at the four enqueue sites (bash exec, ACP, gateway node-event, CLI watchdog) plus the global-scope branch added in the follow-up commits. Refs openclaw#50818. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
6481b9b to
c766369
Compare
Adds an optional sessionKey to the WakeParamsSchema and threads it through the gateway wake handler, CronService.wake(), and the underlying timer.wake() ops so callers can target a specific session for async-task completion relays instead of always hitting the agent's main session. Also adds --session-key to `openclaw system event`. The schema rejects empty/non-string sessionKey at the gateway boundary; mismatched session keys (a key that does not belong to the resolving agent) fall back to the agent's main session inside resolveCronSessionKey, which is the existing safety path. Refs openclaw#52305 (companion to PR openclaw#50818, which closes the related cron-run remap slice at internal enqueue sites). Doesn't depend on openclaw#50818. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Updated this PR for current Changes since the earlier revision:
Validation:
Note: standalone |
|
Pushed follow-up commit Local verification:
Notes:
|
|
Superseded by #80214. |
Summary
Refs #52305 (cron-run remap slice). Related: #18237, #71213.
Async exec completion events enqueued from cron-run sessions silently vanish instead of being relayed to the originating channel. The root cause is a session key mismatch: events are enqueued under
agent:X:cron:Y:run:Zkeys, butheartbeat-runner.tspeeks the system-event queue under the agent-main key — so the events are never seen.Changes
Cron→agent-main remapping at enqueue sites — new
resolveEventSessionKey(sessionKey, mainKey)inrouting/session-key.tstranslates cron-run session keys to the agent's main session key before enqueueing. Wired into the four enqueue sites that emit exec/relay events from cron-context:bash-tools.exec-runtime.ts(foreground + background paths),acp-spawn-parent-stream.ts,gateway/server-node-events.ts,cli-runner/execute.ts(watchdog stall path).Custom
mainKeythreading throughscopedHeartbeatWakeOptions()— third parametermainKey?: stringensures cron wake targets align with the enqueue targets when an operator has configured a customsession.mainKey. Without this, the wake would default to the literal string"main"and miss the configured queue.Test additions —
routing/session-key.test.tsexercisesresolveEventSessionKeyand themainKeyparameter.infra/heartbeat-events-filter.test.tslocks inisExecCompletionEventregex shape (gateway/node approval path + backgrounded allowlist path; word-based slugs fromcreateSessionSlug(); case-insensitivity; common false-positive sentences).Out of scope (deferred)
WakeParamsSchematypedsessionKey— affectsgateway/protocol/schema/agent.ts,gateway/server-methods/cron.ts,cron/service/timer.ts. Contract change; warrants a focused PR.openclaw system event --session-key—cli/system-cli.tscurrently sends{mode, text}only. Belongs in the same follow-up PR as the schema change.target: "none"fallback to last delivery target) — dropped from this PR's scope per maintainer feedback in issuecomment-4322767117. Current main intentionally keepstarget="none"internal-only; revisit in a separate PR if ever.Real behavior proof
agent:<id>:cron:<job>:run:<uuid>, butheartbeat-runner.ts:861peeks the system-event queue under the agent's main key, so the events are never seen. The bug also has multi-config edges: a customsession.mainKeyshifts the agent-main queue name, andsession.scope: "global"makes the heartbeat drain the literal"global"queue instead — both stranded events too if not accounted for.tsxagainst the PR branch checkout.upstream/main, applied this PR's helper additions and call-site threading, then ran a small reproducer that imports the patched helpers and exercises the four scenarios that distinguish the fix from the bug:The new helpers are pure functions over session-key strings, so the behavior is reproducible directly from the patched source. Run from the PR branch checkout:
What this confirms against the four pre-fix failure modes I had to chase down across codex review rounds:
heartbeat-runner.ts:861peeks under agent-main → event invisibleagent:research:main✅session.mainKeyagent:<id>:mainqueue but heartbeat drainsagent:<id>:<custom>→ invisibleagent:research:primary✅session.scope: "global"agent:<id>:mainqueue but heartbeat drains"global"→ invisible"global"queue, wake carriesagentId: "research"✅Evidence after fix: in addition to the reproducer output captured above, the bug is reproducible by inspection of the unpatched
upstream/mainsource —scopedHeartbeatWakeOptionsdoes not handle cron-run keys (it just setssessionKey: parentSessionKeywithout remapping), andresolveEventSessionKeydoes not exist at all on main. Diff'd via:Observed result after fix: the four scenarios in the reproducer above each route to the correct heartbeat-target queue. Architecturally, this lines up the enqueue side with the runner's existing peek logic:
resolveHeartbeatSessionalready drains eitheragent:<id>:<mainKey>or the literal"global"queue depending onsession.scope; with this PR the enqueue side targets the same key so the events become visible to the heartbeat. The patched helpers also passnode scripts/check-deprecated-internal-config-api.mjs(the ambient-config-load architecture guard added since the original PR was filed); all four flaggedloadConfig()call sites now thread cfg throughExecToolDefaults/ProcessSession/ ACP relay params / cli-runner params instead.What was not tested: I did not stand up a live multi-channel installation with a cron-run that triggers an exec completion and verify end-to-end delivery to a real Telegram or Discord chat. The fix is on the enqueue side of the dispatch chain, where the heartbeat runner's peek logic was already correct — verifying string-level routing of the helper plus the architecture guard's pass were the load-bearing checks. End-to-end multi-channel behavior is best owned by integration coverage on the maintainer side once the helper API change is approved.
Test plan
resolveEventSessionKeyunit tests: cron-run keys remap, non-cron keys pass through, custommainKeyhonored, global-scope routes to"global"queue, explicit per-sender scope identical to defaultscopedHeartbeatWakeOptionsunit tests: cron-run keys produce remapped wake target, global-scope drops sessionKey but preservesagentId, agent-channel keys pass through unchangedisExecCompletionEventregex: matches both gateway/node-path "Exec finished" and registry-path "Exec completed/failed" with slug + hex IDs, case-insensitive; rejects free-form cron text containing the word "exec"upstream/mainpost-v2026.5.6 (preserved at git tagpreserve/50818-pre-rebase-2026-05-06)scripts/check-deprecated-internal-config-api.mjspasses locally — all 4 ambientloadConfig()calls replaced with passed cfg threaded through ExecToolDefaults / ProcessSession / ACP relay params / cli-runner params🤖 Generated with Claude Code