Skip to content

fix(heartbeat): remap cron-run session keys for exec/ACP/CLI events (refs #52305)#50818

Closed
Kaspre wants to merge 6 commits into
openclaw:mainfrom
Kaspre:fix/heartbeat-async-session-routing
Closed

fix(heartbeat): remap cron-run session keys for exec/ACP/CLI events (refs #52305)#50818
Kaspre wants to merge 6 commits into
openclaw:mainfrom
Kaspre:fix/heartbeat-async-session-routing

Conversation

@Kaspre

@Kaspre Kaspre commented Mar 20, 2026

Copy link
Copy Markdown
Contributor

Summary

Refs #52305 (cron-run remap slice). Related: #18237, #71213.

Update 2026-05-06: rebased onto current upstream/main (now post-v2026.5.6). Two production-code conflicts resolved (requestHeartbeat API surface, bootstrap-warning rename), routing/session-key.{ts,test.ts} and infra/heartbeat-events-filter.test.ts auto-merged cleanly. Codex review (gpt-5.5, base upstream/main): no actionable findings.

Scope note: clawsweeper's review on #52305 surfaced two adjacent gaps that this PR does not cover: (a) WakeParamsSchema does not type sessionKey (gateway/protocol/schema/agent.ts), and (b) openclaw system event CLI sends only {mode, text} (cli/system-cli.ts:54). Those belong to a separate wake-protocol/CLI surface PR — they're orthogonal to the enqueue-side cron-key remap fixed here. Tagging this PR Refs #52305 (not Closes) so the issue stays open until the wake-protocol gap is also closed.

Companion PR: #71213 (merged 2026-04-25) fixes a sibling bug in the prompt-construction layer — making sure exec event text is actually included in the prompt instead of phantom "system messages above" wording. That PR fixes step 3 of the pipeline (prompt content); this PR fixes step 1 (event visibility under cron-run keys). Production-code overlap with #71213: zero. Test-file overlap: heartbeat-events-filter.test.ts only.

Async exec completion events enqueued from cron-run sessions silently vanish instead of being relayed to the originating channel. The root cause is a session key mismatch: events are enqueued under agent:X:cron:Y:run:Z keys, but heartbeat-runner.ts peeks the system-event queue under the agent-main key — so the events are never seen.

Changes

  1. Cron→agent-main remapping at enqueue sites — new resolveEventSessionKey(sessionKey, mainKey) in routing/session-key.ts translates cron-run session keys to the agent's main session key before enqueueing. Wired into the four enqueue sites that emit exec/relay events from cron-context: bash-tools.exec-runtime.ts (foreground + background paths), acp-spawn-parent-stream.ts, gateway/server-node-events.ts, cli-runner/execute.ts (watchdog stall path).

  2. Custom mainKey threading through scopedHeartbeatWakeOptions() — third parameter mainKey?: string ensures cron wake targets align with the enqueue targets when an operator has configured a custom session.mainKey. Without this, the wake would default to the literal string "main" and miss the configured queue.

  3. Test additionsrouting/session-key.test.ts exercises resolveEventSessionKey and the mainKey parameter. infra/heartbeat-events-filter.test.ts locks in isExecCompletionEvent regex shape (gateway/node approval path + backgrounded allowlist path; word-based slugs from createSessionSlug(); case-insensitivity; common false-positive sentences).

Out of scope (deferred)

Real behavior proof

  • Behavior or issue addressed: Async exec / ACP / CLI-watchdog completion events emitted from cron-run sessions silently vanish instead of being relayed to the originating channel ([Bug]: async task completion reports can be lost because system event/wake is not reliably session-targeted #52305 cron-run-key remap slice). Events are enqueued under agent:<id>:cron:<job>:run:<uuid>, but heartbeat-runner.ts:861 peeks the system-event queue under the agent's main key, so the events are never seen. The bug also has multi-config edges: a custom session.mainKey shifts the agent-main queue name, and session.scope: "global" makes the heartbeat drain the literal "global" queue instead — both stranded events too if not accounted for.
  • Real environment tested: Local OpenClaw v2026.5.6 install on Linux 6.6.87.2 / WSL2 / Node v25.8.2; gateway running as systemd user service. The fix consists of pure helper functions over session-key strings, so the behavior is exercised directly via Node + tsx against the PR branch checkout.
  • Exact steps or command run after this patch: branched from current upstream/main, applied this PR's helper additions and call-site threading, then ran a small reproducer that imports the patched helpers and exercises the four scenarios that distinguish the fix from the bug:

The new helpers are pure functions over session-key strings, so the behavior is reproducible directly from the patched source. Run from the PR branch checkout:

$ cat > pr-50818-repro.ts << 'EOF'
import {
  resolveEventSessionKey,
  scopedHeartbeatWakeOptions,
} from "./src/routing/session-key.js";

const cronRunKey = "agent:research:cron:nightly-backup:run:abc-123";
const channelKey = "agent:research:telegram:dm:42";

console.log("=== resolveEventSessionKey (event-enqueue target queue) ===");
console.log("cron-run, default scope :", JSON.stringify(resolveEventSessionKey(cronRunKey)));
console.log("cron-run, custom mainKey:", JSON.stringify(resolveEventSessionKey(cronRunKey, "primary")));
console.log("cron-run, scope=global  :", JSON.stringify(resolveEventSessionKey(cronRunKey, undefined, "global")));
console.log("channel key (passthrough):", JSON.stringify(resolveEventSessionKey(channelKey)));

console.log("\n=== scopedHeartbeatWakeOptions (heartbeat wake target) ===");
console.log("cron-run, default        :", JSON.stringify(
  scopedHeartbeatWakeOptions(cronRunKey, { reason: "exec-event" })));
console.log("cron-run, custom mainKey :", JSON.stringify(
  scopedHeartbeatWakeOptions(cronRunKey, { reason: "exec-event" }, "primary")));
console.log("cron-run, scope=global   :", JSON.stringify(
  scopedHeartbeatWakeOptions(cronRunKey, { reason: "exec-event" }, undefined, "global")));
console.log("channel key (per-sender) :", JSON.stringify(
  scopedHeartbeatWakeOptions(channelKey, { reason: "exec-event" })));
EOF

$ node --import tsx pr-50818-repro.ts
=== resolveEventSessionKey (event-enqueue target queue) ===
cron-run, default scope : "agent:research:main"
cron-run, custom mainKey: "agent:research:primary"
cron-run, scope=global  : "global"
channel key (passthrough): "agent:research:telegram:dm:42"

=== scopedHeartbeatWakeOptions (heartbeat wake target) ===
cron-run, default        : {"reason":"exec-event","sessionKey":"agent:research:main"}
cron-run, custom mainKey : {"reason":"exec-event","sessionKey":"agent:research:primary"}
cron-run, scope=global   : {"reason":"exec-event","agentId":"research"}
channel key (per-sender) : {"reason":"exec-event","sessionKey":"agent:research:telegram:dm:42"}

What this confirms against the four pre-fix failure modes I had to chase down across codex review rounds:

Scenario Pre-fix behavior Post-fix output above
cron-run key, default config event enqueued under cron-run key; heartbeat-runner.ts:861 peeks under agent-main → event invisible event enqueued under agent:research:main
cron-run key, custom session.mainKey event lands in agent:<id>:main queue but heartbeat drains agent:<id>:<custom> → invisible event enqueued under agent:research:primary
cron-run key, session.scope: "global" event lands in agent:<id>:main queue but heartbeat drains "global" → invisible event enqueued under "global" queue, wake carries agentId: "research"
Non-cron channel key already worked correctly preserved unchanged ✅
  • Evidence after fix: in addition to the reproducer output captured above, the bug is reproducible by inspection of the unpatched upstream/main source — scopedHeartbeatWakeOptions does not handle cron-run keys (it just sets sessionKey: parentSessionKey without remapping), and resolveEventSessionKey does not exist at all on main. Diff'd via:

    $ git show upstream/main:src/routing/session-key.ts | grep -A 5 "scopedHeartbeatWakeOptions"
    export function scopedHeartbeatWakeOptions<T extends object>(
      sessionKey: string,
      wakeOptions: T,
    ): T | (T & { sessionKey: string }) {
      return parseAgentSessionKey(sessionKey) ? { ...wakeOptions, sessionKey } : wakeOptions;
    }
    # (resolveEventSessionKey is not exported on main — added by this PR.)
  • Observed result after fix: the four scenarios in the reproducer above each route to the correct heartbeat-target queue. Architecturally, this lines up the enqueue side with the runner's existing peek logic: resolveHeartbeatSession already drains either agent:<id>:<mainKey> or the literal "global" queue depending on session.scope; with this PR the enqueue side targets the same key so the events become visible to the heartbeat. The patched helpers also pass node scripts/check-deprecated-internal-config-api.mjs (the ambient-config-load architecture guard added since the original PR was filed); all four flagged loadConfig() call sites now thread cfg through ExecToolDefaults / ProcessSession / ACP relay params / cli-runner params instead.

  • What was not tested: I did not stand up a live multi-channel installation with a cron-run that triggers an exec completion and verify end-to-end delivery to a real Telegram or Discord chat. The fix is on the enqueue side of the dispatch chain, where the heartbeat runner's peek logic was already correct — verifying string-level routing of the helper plus the architecture guard's pass were the load-bearing checks. End-to-end multi-channel behavior is best owned by integration coverage on the maintainer side once the helper API change is approved.

Test plan

  • resolveEventSessionKey unit tests: cron-run keys remap, non-cron keys pass through, custom mainKey honored, global-scope routes to "global" queue, explicit per-sender scope identical to default
  • scopedHeartbeatWakeOptions unit tests: cron-run keys produce remapped wake target, global-scope drops sessionKey but preserves agentId, agent-channel keys pass through unchanged
  • isExecCompletionEvent regex: matches both gateway/node-path "Exec finished" and registry-path "Exec completed/failed" with slug + hex IDs, case-insensitive; rejects free-form cron text containing the word "exec"
  • Rebased onto current upstream/main post-v2026.5.6 (preserved at git tag preserve/50818-pre-rebase-2026-05-06)
  • Architecture guard scripts/check-deprecated-internal-config-api.mjs passes locally — all 4 ambient loadConfig() calls replaced with passed cfg threaded through ExecToolDefaults / ProcessSession / ACP relay params / cli-runner params
  • Codex review clean across 5 rounds (each round caught a distinct global-scope / multi-agent edge case; final round: no actionable findings)
  • Real behavior proof attached above
  • CI green

🤖 Generated with Claude Code

@greptile-apps

greptile-apps Bot commented Mar 20, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR fixes a long-standing async context routing bug by propagating sessionKey through the exec/hooks heartbeat wake paths, ensuring exec completion responses reliably route back to the originating session and channel without requiring any user configuration changes.

Key changes:

  • resolveEventSessionKey — new helper that remaps cron session keys to the agent's main session key before event enqueue, so the heartbeat runner can locate events stored under the correct key.
  • scopedHeartbeatWakeOptions — now also strips sessionKey for cron sessions (in addition to non-agent keys), maintaining the legacy unscoped wake behavior for those paths.
  • forceLastTargetWhenNone — new flag on resolveHeartbeatDeliveryTarget that forces exec completion events to the session's last delivery channel even when heartbeat.target = none, bypassing the silent-discard path that caused the original regression.
  • isExecCompletionEvent — expanded with a start-anchored regex (/^exec (?:completed|failed|killed) \(/) to cover the maybeNotifyOnExit backgrounded command path; addresses the false-positive concern raised in earlier review by anchoring patterns instead of doing substring matches.
  • resolveHeartbeatReasonKind — maps exec:<id>:exit reasons to the "exec-event" kind, feeding correctly into the isExecEventReason preflight flag.
  • The JSDoc on resolveEventSessionKey clearly documents the known limitation: custom session.mainKey setups are not supported for cron event remapping due to lack of config access in that call path. No action required, but worth noting for future config-aware refactors.

Confidence Score: 5/5

  • Safe to merge — targeted bug fix with comprehensive test coverage and no regressions identified.
  • All four change pillars (session key propagation, wake scoping, delivery target override, event classification) are implemented correctly and independently tested. Previous review concerns about false-positive substring matching have been fully addressed with start-anchored patterns. The one theoretical edge case (invalid rawTarget string with a configured heartbeat.to) is benign in practice. Test suite covers both the happy path and the false-positive guard cases added based on prior feedback.
  • No files require special attention.

Reviews (2): Last reviewed commit: "fix: anchor exec completion regex to sta..." | Re-trigger Greptile

Comment thread src/infra/outbound/targets.ts Outdated
Comment thread src/infra/heartbeat-events-filter.ts

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c21582983d

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/infra/heartbeat-events-filter.ts Outdated
Comment thread src/infra/heartbeat-runner.ts Outdated
@Kaspre

Kaspre commented Mar 20, 2026

Copy link
Copy Markdown
Contributor Author

Thanks for the review feedback. Pushed a commit addressing both Greptile suggestions:

  • isExecCompletionEvent: Anchored to exec (?:completed|failed|killed) \( regex to avoid false positives in free-form cron text
  • targets.ts: Collapsed redundant double-check into single if/else

Regarding @chatgpt-codex-connector's concern about forceLastTargetWhenNone firing for non-completion exec events: in practice, requestHeartbeatNow({ reason: "exec-event" }) is only called from completion paths (server-node-events.ts for exec.finished, and maybeNotifyOnExit for backgrounded commands). The exec.denied/exec.running events use enqueueSystemEvent without triggering a heartbeat wake, so isExecEventReason won't be true for those. The current gating is safe, but open to tightening it if maintainers prefer belt-and-suspenders.

@chatgpt-codex-connector

Copy link
Copy Markdown

To use Codex here, create an environment for this repo.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ca9df70e07

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/infra/outbound/targets.ts Outdated
@Kaspre

Kaspre commented Mar 22, 2026

Copy link
Copy Markdown
Contributor Author

Good catch on the stale heartbeat.to/heartbeat.accountId edge case. Pushed a fix:

  • Added a forcedToLast flag that's true when forceLastTargetWhenNone activates on a target: "none" config
  • When active, explicitTo passes undefined to resolveSessionDeliveryTarget instead of heartbeat.to
  • heartbeatAccountId is also set to undefined under the same condition
  • sessionChatTypeHint now correctly normalizes chat type when forced (treats it as if heartbeat.to is absent)

No config mutation — the original heartbeat object is untouched. The flag only affects downstream resolution for that call.

@briannewman

Copy link
Copy Markdown
Contributor

Heads up — scopedHeartbeatWakeOptions in src/routing/session-key.ts passes through sessionKey for any valid agent session key, including cron run sessions. This means exec completions inside cron jobs (e.g. a gws CLI call in an isolated cron task) fire a heartbeat wake scoped to the cron session instead of falling back to main.

In practice this injects the heartbeat prompt into an in-flight cron session, confusing the model mid-task. We've hit this multiple times — an isolated overnight-thinking cron job running sequential gws gmail exec calls gets heartbeat wakes injected after each completion, causing the model to re-run the task or narrate its internal steps into the delivery channel.

The fix would be a one-line guard in scopedHeartbeatWakeOptions:

export function scopedHeartbeatWakeOptions<T extends object>(
  sessionKey: string,
  wakeOptions: T,
): T | (T & { sessionKey: string }) {
  if (!parseAgentSessionKey(sessionKey)) return wakeOptions;
  if (isCronSessionKey(sessionKey)) return wakeOptions;
  return { ...wakeOptions, sessionKey };
}

isCronSessionKey is already imported in that file.

@Kaspre

Kaspre commented Mar 22, 2026

Copy link
Copy Markdown
Contributor Author

Good catch @briannewman — applied in a3d2cfc. Added the isCronSessionKey guard to scopedHeartbeatWakeOptions so cron sessions skip wake scoping entirely.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a3d2cfce0d

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/infra/outbound/targets.ts Outdated
Comment thread src/routing/session-key.ts Outdated
@openclaw-barnacle openclaw-barnacle Bot added the agents Agent runtime and tooling label Mar 22, 2026

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 869950add9

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/routing/session-key.ts Outdated
Comment thread src/agents/bash-tools.exec-runtime.ts Outdated

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2ac7fcd7cb

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/gateway/server/hooks.ts Outdated
Comment thread src/routing/session-key.ts Outdated
@Kaspre Kaspre force-pushed the fix/heartbeat-async-session-routing branch from 2ac7fcd to 347ca05 Compare March 22, 2026 16:33
@Kaspre

Kaspre commented Mar 22, 2026

Copy link
Copy Markdown
Contributor Author

Squashed the incremental review-driven commits into a single clean commit (347ca05) for easier review. Here's what's in it:

1. sessionKey propagation — the original fix. requestHeartbeatNow calls in hooks and node-events now carry the originating session key via scopedHeartbeatWakeOptions.

2. Exec completion detectionisExecCompletionEvent anchored to the parenthesised format (Exec completed/failed/killed () to avoid false positives in free-form cron text. Regex pre-compiled.

3. Target routing for exec wakes — when heartbeat.target is "none" or omitted, exec completion wakes force fallback to the session's last delivery target. Stale heartbeat.to and heartbeat.accountId are cleared when forced (forcedToLast covers both explicit "none" and undefined).

4. Cron session isolation (from @briannewman's feedback) — scopedHeartbeatWakeOptions strips sessionKey for cron sessions to prevent heartbeat prompt injection. To avoid orphaning exec completion events, resolveEventSessionKey remaps system events from cron sessions to the agent's main session key. Applied to all 4 callers: bash-tools.exec-runtime.ts, server-node-events.ts, acp-spawn-parent-stream.ts, cli-runner.ts.

5. Test coverage — new tests for scopedHeartbeatWakeOptions (cron/agent/non-agent keys), resolveEventSessionKey (remapping + passthrough), and exec completion false-positive resistance on free-form cron text. 136 tests passing.

Known limitation: resolveEventSessionKey uses the default main key ("main"). Setups with a custom session.mainKey are not supported for cron event remapping — callers in the exec path don't have config access. Documented in code.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 347ca05493

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/infra/heartbeat-events-filter.ts Outdated
@Kaspre

Kaspre commented Mar 25, 2026

Copy link
Copy Markdown
Contributor Author

Branch rebased onto latest main, resolving merge conflicts and stale CI.

Re: @greptile-apps suggestion to collapse the target === "none" guards into a single if/else — the current early-return pattern is intentional and reads more clearly than the nested alternative. The two guards handle distinct cases (no-force → early return with no-delivery target; force → fall through to "last") and the early return makes the control flow explicit. No functional change needed here.

Requesting re-review. cc @greptile-apps @chatgpt-codex-connector

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1dfa292575

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/routing/session-key.ts Outdated
@Kaspre

Kaspre commented Mar 25, 2026

Copy link
Copy Markdown
Contributor Author

Fixed the P1 issue raised by @chatgpt-codex-connector:

Unscoped cron wakes — Cron session keys now resolve to the originating agent's main session key (e.g., agent:main:cron:backup:run:abcagent:main:main). This prevents wake requests from fanning out to all agents while avoiding reference to the ephemeral cron session key.

This is consistent with how resolveEventSessionKey already handles cron remapping (same file, line 58-65).

Test updated to verify the new scoping behavior. Commit: 3d854506d3.

@Kaspre Kaspre force-pushed the fix/heartbeat-async-session-routing branch from 6a4bca4 to d5af570 Compare May 7, 2026 18:08
@openclaw-barnacle openclaw-barnacle Bot removed the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 7, 2026
@clawsweeper clawsweeper Bot added the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 7, 2026
Kaspre added a commit to Kaspre/openclaw that referenced this pull request May 8, 2026
…xt loss

Exec completion events enqueued under cron or channel-specific session
keys were invisible to the heartbeat runner, which always checked the
agent's main session key. This caused exec results to silently vanish
instead of being relayed back to the originating channel.

1. Propagate sessionKey through scopedHeartbeatWakeOptions in bash
   exec, ACP spawn, gateway/node exec, and CLI watchdog paths
2. Add resolveEventSessionKey to remap cron session keys to the
   originating agent's main session key for event enqueueing

Scope was previously broader. Dropped per maintainer review feedback
on openclaw#50818 (issuecomment-4322767117):

- The forceLastTargetWhenNone override for resolveHeartbeatDeliveryTarget,
  which routed exec completions to the session's last target even when
  heartbeat.target was explicitly "none". Current main and its regression
  tests intentionally keep target="none" internal-only, so this routing
  change is left for a separate focused PR if it is ever revisited.
- The openclaw#39978 hook-cancel handling has already shipped via upstream PR
  openclaw#70258, so it is no longer part of this PR's scope.

The PR now closes openclaw#52305 only.

Closes openclaw#52305. Related: openclaw#18237.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Kaspre added a commit to Kaspre/openclaw that referenced this pull request May 8, 2026
Codex review on PR openclaw#50818 [P3] flagged that this user-facing heartbeat/
async-exec routing fix was missing the required Unreleased/Fixes entry.
Add a single-line entry describing the cron-run session-key remap at the
four enqueue sites (bash exec, ACP, gateway node-event, CLI watchdog)
plus the global-scope branch added in the follow-up commits.

Refs openclaw#50818.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@Kaspre Kaspre force-pushed the fix/heartbeat-async-session-routing branch from d5af570 to 239be03 Compare May 8, 2026 00:49
@openclaw-barnacle openclaw-barnacle Bot removed the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 8, 2026
@clawsweeper clawsweeper Bot added the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 8, 2026
@Kaspre Kaspre force-pushed the fix/heartbeat-async-session-routing branch from 239be03 to 6481b9b Compare May 9, 2026 11:19
Kaspre added a commit to Kaspre/openclaw that referenced this pull request May 9, 2026
…xt loss

Exec completion events enqueued under cron or channel-specific session
keys were invisible to the heartbeat runner, which always checked the
agent's main session key. This caused exec results to silently vanish
instead of being relayed back to the originating channel.

1. Propagate sessionKey through scopedHeartbeatWakeOptions in bash
   exec, ACP spawn, gateway/node exec, and CLI watchdog paths
2. Add resolveEventSessionKey to remap cron session keys to the
   originating agent's main session key for event enqueueing

Scope was previously broader. Dropped per maintainer review feedback
on openclaw#50818 (issuecomment-4322767117):

- The forceLastTargetWhenNone override for resolveHeartbeatDeliveryTarget,
  which routed exec completions to the session's last target even when
  heartbeat.target was explicitly "none". Current main and its regression
  tests intentionally keep target="none" internal-only, so this routing
  change is left for a separate focused PR if it is ever revisited.
- The openclaw#39978 hook-cancel handling has already shipped via upstream PR
  openclaw#70258, so it is no longer part of this PR's scope.

The PR now closes openclaw#52305 only.

Closes openclaw#52305. Related: openclaw#18237.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Kaspre added a commit to Kaspre/openclaw that referenced this pull request May 9, 2026
Codex review on PR openclaw#50818 [P3] flagged that this user-facing heartbeat/
async-exec routing fix was missing the required Unreleased/Fixes entry.
Add a single-line entry describing the cron-run session-key remap at the
four enqueue sites (bash exec, ACP, gateway node-event, CLI watchdog)
plus the global-scope branch added in the follow-up commits.

Refs openclaw#50818.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@Kaspre

Kaspre commented May 9, 2026

Copy link
Copy Markdown
Contributor Author

Rebase onto upstream/main (2026-05-09)

Resolved conflict in src/routing/session-key.ts: upstream cleaned up unused top-level imports (type ParsedAgentSessionKey, DEFAULT_ACCOUNT_ID); kept isCronRunSessionKey which the PR's new function bodies require.

@openclaw-barnacle openclaw-barnacle Bot removed the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 9, 2026
@clawsweeper clawsweeper Bot added the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 9, 2026
Kaspre and others added 5 commits May 10, 2026 00:16
…xt loss

Exec completion events enqueued under cron or channel-specific session
keys were invisible to the heartbeat runner, which always checked the
agent's main session key. This caused exec results to silently vanish
instead of being relayed back to the originating channel.

1. Propagate sessionKey through scopedHeartbeatWakeOptions in bash
   exec, ACP spawn, gateway/node exec, and CLI watchdog paths
2. Add resolveEventSessionKey to remap cron session keys to the
   originating agent's main session key for event enqueueing

Scope was previously broader. Dropped per maintainer review feedback
on openclaw#50818 (issuecomment-4322767117):

- The forceLastTargetWhenNone override for resolveHeartbeatDeliveryTarget,
  which routed exec completions to the session's last target even when
  heartbeat.target was explicitly "none". Current main and its regression
  tests intentionally keep target="none" internal-only, so this routing
  change is left for a separate focused PR if it is ever revisited.
- The openclaw#39978 hook-cancel handling has already shipped via upstream PR
  openclaw#70258, so it is no longer part of this PR's scope.

The PR now closes openclaw#52305 only.

Closes openclaw#52305. Related: openclaw#18237.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… config-boundary guard

Two related changes, both surfaced by upstream-main checks on the rebased PR:

1. Architecture guard: replace ambient loadConfig() calls in agent-runtime
   code with a passed cfg/mainKey threaded through the same layer the rest
   of the agent's session config already lives at. ExecToolDefaults grows
   `mainKey` + `sessionScope`; runExecProcess snapshots them onto the
   ProcessSession; ACP startAcpSpawnParentStreamRelay accepts them as
   params; cli-runner watchdog reads from params.config; createOpenClawCodingTools
   threads them from options.config so normal agent tool runs are covered;
   bash-command / commands-diagnostics / commands-export-trajectory pass
   them through from cfg. The four sites flagged by
   scripts/check-deprecated-internal-config-api.mjs are gone.

2. Global-scope correctness: when `cfg.session.scope === "global"`,
   resolveHeartbeatSession drains the literal "global" queue, not
   agent-main. resolveEventSessionKey and scopedHeartbeatWakeOptions both
   take an optional `scope` parameter and route accordingly:
   - resolveEventSessionKey returns "global" for cron-run keys under
     global scope (instead of agent:<id>:main, which would strand events).
   - scopedHeartbeatWakeOptions drops the targeted sessionKey but
     preserves `agentId: parsed.agentId` so multi-agent global-scope
     setups still wake the originating agent's heartbeat.

Tests in routing/session-key.test.ts cover the new global-scope branches
and the per-sender pass-through. Existing fixture agent IDs in this file
were genericized at the same time (einstein → ops, captain → primary).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Codex review on PR openclaw#50818 [P3] flagged that this user-facing heartbeat/
async-exec routing fix was missing the required Unreleased/Fixes entry.
Add a single-line entry describing the cron-run session-key remap at the
four enqueue sites (bash exec, ACP, gateway node-event, CLI watchdog)
plus the global-scope branch added in the follow-up commits.

Refs openclaw#50818.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@Kaspre Kaspre force-pushed the fix/heartbeat-async-session-routing branch from 6481b9b to c766369 Compare May 10, 2026 04:21
Kaspre added a commit to Kaspre/openclaw that referenced this pull request May 10, 2026
Adds an optional sessionKey to the WakeParamsSchema and threads it through
the gateway wake handler, CronService.wake(), and the underlying timer.wake()
ops so callers can target a specific session for async-task completion
relays instead of always hitting the agent's main session.

Also adds --session-key to `openclaw system event`.

The schema rejects empty/non-string sessionKey at the gateway boundary;
mismatched session keys (a key that does not belong to the resolving agent)
fall back to the agent's main session inside resolveCronSessionKey, which
is the existing safety path.

Refs openclaw#52305 (companion to PR openclaw#50818, which closes the related cron-run
remap slice at internal enqueue sites). Doesn't depend on openclaw#50818.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@openclaw-barnacle openclaw-barnacle Bot removed the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 10, 2026

Kaspre commented May 10, 2026

Copy link
Copy Markdown
Contributor Author

Updated this PR for current upstream/main and pushed c766369ff0.

Changes since the earlier revision:

  • Rebased onto latest main.
  • No functional code changes were needed after the rebase.
  • Applied the current formatter to src/routing/session-key.ts.

Validation:

  • pnpm test src/routing/session-key.test.ts src/infra/heartbeat-events-filter.test.ts passed.
  • node scripts/check-deprecated-api-usage.mjs passed.
  • node scripts/check-no-runtime-action-load-config.mjs passed.
  • pnpm exec oxfmt --check --threads=1 ... on the changed files passed.
  • git diff --check passed.

Note: standalone oxlint over this PR's changed files still reports the existing __testing export naming issue in src/agents/pi-tools.ts and src/agents/bash-tools.exec.ts; those exports are already present on current upstream/main, so I left them untouched.

@clawsweeper clawsweeper Bot added the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 10, 2026
@openclaw-barnacle openclaw-barnacle Bot removed the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 10, 2026

Kaspre commented May 10, 2026

Copy link
Copy Markdown
Contributor Author

Pushed follow-up commit 735f2c3c7c (fix(heartbeat): strengthen cron-run routing proof).

Local verification:

  • pnpm test src/routing/session-key.test.ts src/infra/heartbeat-events-filter.test.ts src/agents/bash-tools.exec-runtime.test.ts src/cron/session-reaper.test.ts src/agents/acp-spawn-parent-stream.test.ts passed.
  • node scripts/check-deprecated-api-usage.mjs passed.
  • node scripts/check-no-runtime-action-load-config.mjs passed.
  • pnpm exec oxfmt --check --threads=1 ... over the changed files passed.
  • git diff --check passed.
  • Required local Claude review completed after the final follow-up; it found no regressions and marked the cron-run descendant pruning and ACP relay routing asks addressed.

Notes:

  • Standalone oxlint over the touched files still reports only the existing upstream export const __testing no-underscore-dangle errors in src/agents/pi-tools.ts and src/agents/bash-tools.exec.ts.
  • I did not do live gateway/channel heartbeat proof from this environment, so that remains a runtime validation gap.

@Kaspre

Kaspre commented May 10, 2026

Copy link
Copy Markdown
Contributor Author

Closing in favor of a clean replacement PR from current main: #80214.\n\nThe replacement keeps the current narrowed fix for #52305, drops the stale review history, and presents the change as one clean commit with current validation.

@Kaspre

Kaspre commented May 10, 2026

Copy link
Copy Markdown
Contributor Author

Superseded by #80214.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

agents Agent runtime and tooling gateway Gateway runtime proof: sufficient ClawSweeper judged the real behavior proof convincing. proof: supplied External PR includes structured after-fix real behavior proof. size: M

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants