
feat(wake): expose typed sessionKey on wake protocol + system event CLI (refs #52305)#78687

Merged
steipete merged 10 commits into openclaw:main from Kaspre:fix/wake-protocol-session-key
May 11, 2026

@Kaspre

@Kaspre Kaspre commented May 7, 2026

Contributor

Summary

Refs #52305. Companion to #80214 (cron-run remap at internal enqueue sites) and #71213 (prompt content layer, MERGED). Together with #80214, this PR is intended to close #52305: #78687 covers the external wake/system-event sessionKey surface, while #80214 covers internal cron-run exec/ACP/node-event/watchdog completion routing. The two PRs touch disjoint surfaces and can be reviewed independently.

Adds an optional sessionKey to the wake protocol so callers can target a specific session for system events / async-task completion relays instead of always hitting the agent's main session. Without this, the CLI surface for openclaw system event was main-session-only, and WakeParamsSchema left sessionKey opaque, which was the remaining external-callback gap clawsweeper called out on #52305.
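A minimal TypeScript sketch of the wake params contract this PR targets: mode and text as before, plus an optional sessionKey that must be a non-empty string when present. It is a hand-written stand-in, not the TypeBox WakeParamsSchema itself. WakeParams, isWakeMode, and validateWakeParams are hypothetical names, and the sketch assumes NonEmptyString means "any string of length >= 1", so a whitespace-only key passes this check and is dropped later by the cron service.

```typescript
// Hypothetical stand-in for the wake params contract; the real schema is the
// TypeBox WakeParamsSchema in src/gateway/protocol/schema/agent.ts.
type WakeMode = "now" | "next-heartbeat";

interface WakeParams {
  mode: WakeMode;
  text: string;
  // Optional target session; omitting it keeps the pre-PR behaviour.
  sessionKey?: string;
}

type WakeResult = { ok: true; params: WakeParams } | { ok: false; error: string };

function isWakeMode(value: unknown): value is WakeMode {
  return value === "now" || value === "next-heartbeat";
}

function validateWakeParams(input: unknown): WakeResult {
  if (typeof input !== "object" || input === null) {
    return { ok: false, error: "params must be an object" };
  }
  const { mode, text, sessionKey } = input as Record<string, unknown>;
  if (!isWakeMode(mode)) return { ok: false, error: "invalid mode" };
  if (typeof text !== "string") return { ok: false, error: "text must be a string" };
  if (sessionKey === undefined) return { ok: true, params: { mode, text } };
  // Assumed NonEmptyString semantics: any string of length >= 1 is accepted.
  if (typeof sessionKey !== "string" || sessionKey.length === 0) {
    return { ok: false, error: "sessionKey must be a non-empty string" };
  }
  return { ok: true, params: { mode, text, sessionKey } };
}
```

Existing callers that send only { mode, text } still validate; the new field only narrows what a present sessionKey may look like.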

Changes

  1. Schema: WakeParamsSchema (src/gateway/protocol/schema/agent.ts): add sessionKey: Type.Optional(NonEmptyString). Empty / non-string values are rejected at the validator boundary.
  2. Wake handler (src/gateway/server-methods/cron.ts:179): forward sessionKey to context.cron.wake({...}) when provided.
  3. CronService contract + impl (src/cron/service-contract.ts, src/cron/service.ts, src/cron/service/ops.ts, src/cron/service/timer.ts): thread optional sessionKey through wake() → wakeNow() → the underlying wake(state, opts) ops. Whitespace-only keys are treated as omitted.
  4. Gateway cron adapter (src/gateway/server-cron.ts): when only sessionKey is supplied (no agentId), enqueueSystemEvent now derives agentId from the session key — mirroring what requestHeartbeat's resolveCronWakeTarget already does. Without this, resolveCronSessionKey would treat a non-default agent's key as foreign and silently reroute to the default agent's main session, while heartbeat woke the correct session — an asymmetric bug. Caught by codex in P2 review on this branch and fixed before opening.
  5. CLI (src/cli/system-cli.ts): add --session-key <sessionKey> option to openclaw system event. Whitespace-only values are treated as omitted to match server-side semantics.
  6. Docs (docs/cli/system.md): describe the new flag, the safety fall-back (foreign-agent keys → agent's main session), and a callout for the next-heartbeat + --session-key timing exception (collapses to immediate targeted wake; addresses [P3] from clawsweeper review).
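Items 3 and 5 both treat a whitespace-only key as if the flag had been omitted. A small sketch of that rule on the CLI side, assuming a hypothetical buildWakePayload helper; the real option handling in src/cli/system-cli.ts may be structured differently, and forwarding the trimmed value (rather than the raw one) is an assumption of this sketch.

```typescript
interface SystemEventOptions {
  text: string;
  mode: "now" | "next-heartbeat";
  sessionKey?: string; // raw value of --session-key, if given
}

// Missing or whitespace-only keys become undefined, so they behave exactly
// like an omitted flag. Trimming the forwarded value is an assumption here.
function normalizeSessionKey(raw: string | undefined): string | undefined {
  const trimmed = raw?.trim();
  return trimmed ? trimmed : undefined;
}

// Hypothetical helper that builds the JSON-RPC params for the wake call.
function buildWakePayload(opts: SystemEventOptions): Record<string, string> {
  const payload: Record<string, string> = { mode: opts.mode, text: opts.text };
  const sessionKey = normalizeSessionKey(opts.sessionKey);
  // Only add the key when it survives normalization; without --session-key
  // the payload keeps the pre-PR { mode, text } shape.
  if (sessionKey !== undefined) payload.sessionKey = sessionKey;
  return payload;
}
```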

Out of scope

  • The internal enqueue-site cron-run-key remap (cron:run:uuid → agent-main) is handled by #80214 (fix(heartbeat): remap cron-run exec events to session keys). This PR is only about the external contract surface; it doesn't touch exec-runtime / ACP / CLI watchdog enqueue sites.
  • Adding sessionKey to other RPC surfaces. Only wake is in scope.
  • Deeper schema-level shape validation (parseAgentSessionKey-style). Defer until there's evidence of misuse — resolveCronSessionKey's existing cross-agent fall-back covers the security concern.

Real behavior proof

  • Behavior or issue addressed: external callers (mobile clients, the openclaw system event CLI, plugin SDKs) need a way to target a specific agent session for system-event/wake delivery. The pre-PR WakeParamsSchema typed only {mode, text} with additionalProperties: true, so sessionKey could be smuggled in but was opaque and unvalidated. The CLI's system event had no way at all to set it. Closing this gap means async-task completion relays from external systems can land in the originating agent's session instead of always hitting the agent's main session.

  • Real environment tested: Local OpenClaw checkout on Linux 6.6.87.2 / WSL2 / Node v25.8.2, branched from current upstream/main at 95a1c91531 (post-v2026.5.6, includes the canvas-plugin refactor that consolidated apps/macos/Sources/OpenClawProtocol/GatewayModels.swift into apps/shared/OpenClawKit/). Gateway runs as systemd user service.

  • Exact steps or command run after this patch: branched from current upstream/main, applied this PR's schema + CLI + wake-handler + cron-adapter changes, then ran the routing helpers directly via node --import tsx against the patched source to confirm session-key handling in the wake-routing layer for the three key shapes the cron adapter has to deal with (agent-prefixed channel, agent-prefixed cron-run, relative).

  • Evidence after fix:

    $ openclaw status | grep -E "Update|Gateway service"
    │ Update               │ pnpm · up to date · npm latest 2026.5.6                                  │
    │ Gateway service      │ systemd user installed · enabled · running (pid 270001, state active)    │
    
    $ cat > /tmp/wake-routing-repro.ts << 'EOF'
    // Reproduces wake-routing decisions on the patched source using only
    // the helpers actually exported from src/routing/session-key.ts.
    import {
      scopedHeartbeatWakeOptions,
      resolveAgentIdFromSessionKey,
      classifySessionKeyShape,
    } from "/home/captain/openclaw-source/src/routing/session-key.js";
    
    const cases = [
      { label: "agent-prefixed channel  ", key: "agent:research:telegram:dm:42" },
      { label: "agent-prefixed cron-run ", key: "agent:research:cron:nightly:run:abc-123" },
      { label: "relative key            ", key: "discord:channel:ops" },
    ];
    for (const { label, key } of cases) {
      console.log(label, "shape=" + classifySessionKeyShape(key),
        "agentId=" + JSON.stringify(resolveAgentIdFromSessionKey(key)));
      console.log(label, "wake-options:",
        JSON.stringify(scopedHeartbeatWakeOptions(key, { reason: "exec-event" })));
    }
    EOF
    
    $ node --import tsx /tmp/wake-routing-repro.ts
    agent-prefixed channel   shape=agent agentId="research"
    agent-prefixed channel   wake-options: {"reason":"exec-event","sessionKey":"agent:research:telegram:dm:42"}
    agent-prefixed cron-run  shape=agent agentId="research"
    agent-prefixed cron-run  wake-options: {"reason":"exec-event","sessionKey":"agent:research:cron:nightly:run:abc-123"}
    relative key             shape=legacy_or_alias agentId="main"
    relative key             wake-options: {"reason":"exec-event"}

    Confirms (a) classifySessionKeyShape correctly tags agent-prefixed vs relative keys; (b) resolveAgentIdFromSessionKey extracts the agent id (research) for agent-prefixed shapes and falls back to the literal "main" for relative keys; (c) scopedHeartbeatWakeOptions passes agent-prefixed keys through to the heartbeat target, but drops the sessionKey (returns {reason} only) for relative keys — because the routing helper alone can't disambiguate which configured agent owns a relative key.

    Observation (c) is the asymmetry this PR's cron-adapter change in src/gateway/server-cron.ts corrects. resolveCronWakeTarget already called resolveAgentIdFromSessionKey and routed wakes to the resolving agent's queue, but enqueueSystemEvent in the same adapter went through resolveCronSessionKey. On a multi-agent deployment where main is not the configured default, that path treated a foreign agent's key as off-target and silently re-routed the event to the default agent's main session: the wake fired on agent A while the event landed in agent B. After this PR, both enqueueSystemEvent and resolveCronWakeTarget derive agentId the same way (via parseAgentSessionKey + resolveAgentIdFromSessionKey, then passed to the existing resolveCronAgent helper), so both sides agree on the target.

  • Integration test (cbded2f): src/gateway/server-cron.test.ts now includes a test (derives agentId symmetrically for enqueue and wake when only an agent-prefixed sessionKey is supplied) that constructs the cron adapter against a multi-agent config (primary default + ops non-default), drives cronDeps.enqueueSystemEvent and cronDeps.requestHeartbeat with the same foreign-agent session key (agent:ops:cron:nightly:run:abc-123), and asserts that both observed outputs target agent:ops:... rather than falling back to primary. Without this PR's adapter change, the enqueue-side assert would fail, because the pre-PR enqueueSystemEvent adapter ran resolveCronAgent(undefined) and rerouted to primary's main queue.

    $ pnpm vitest run src/gateway/server-cron.test.ts -t "symmetric"
     Test Files  1 passed (1)
          Tests  1 passed | 12 skipped (13)
       Start at  19:19:41
       Duration  28.26s (transform 20.09s, setup 1.89s, import 25.98s, tests 26ms, environment 0ms)
    
  • Live gateway smoke: built the gateway from the rebased branch and ran it on a separate test instance (OPENCLAW_HOME=/tmp/oc-test-home, port 18790, OPENCLAW_SKIP_CHANNELS=1) with a multi-agent config — primary (default) + ops (non-default), shared ollama-cloud/kimi-k2.5 model, agents.defaults.heartbeat.every: "1h" so heartbeats are enabled for every agent (otherwise isHeartbeatEnabledForAgent only returns true for the configured-default agent). Then ran two system event calls back-to-back via the patched CLI:

    $ openclaw system event --url ws://127.0.0.1:18790 --token <redacted> \
        --session-key agent:ops:cron:demo:run:CASE_A \
        --text "case-A-marker" --mode now --json
    { "ok": true }
    
    $ openclaw system event --url ws://127.0.0.1:18790 --token <redacted> \
        --session-key agent:primary:cron:demo:run:CASE_B \
        --text "case-B-marker" --mode now --json
    { "ok": true }
    

    Gateway dispatch log on the test instance:

    20:33:39 [diagnostic]      lane enqueue: lane=session:agent:ops:cron:demo:run:case_a queueSize=1
    20:33:39 [diagnostic]      lane dequeue: lane=session:agent:ops:cron:demo:run:case_a waitMs=3 queueSize=0
    20:33:51 [agent/embedded]  pre-prompt: sessionKey=agent:ops:cron:demo:run:case_a … provider=ollama-cloud/kimi-k2.5
                               sessionFile=/tmp/oc-test-home/.openclaw/agents/ops/sessions/92947670-…jsonl
    20:34:09 [diagnostic]      lane task done: lane=session:agent:ops:cron:demo:run:case_a durationMs=30027
    
    20:34:18 [diagnostic]      lane enqueue: lane=session:agent:primary:cron:demo:run:case_b queueSize=1
    20:34:18 [diagnostic]      lane dequeue: lane=session:agent:primary:cron:demo:run:case_b waitMs=3 queueSize=0
    20:34:24 [agent/embedded]  pre-prompt: sessionKey=agent:primary:cron:demo:run:case_b … provider=ollama-cloud/kimi-k2.5
                               sessionFile=/tmp/oc-test-home/.openclaw/agents/primary/sessions/9689b924-…jsonl
    20:34:38 [diagnostic]      lane task done: lane=session:agent:primary:cron:demo:run:case_b durationMs=19782
    

    Two pieces of evidence in each block:

    1. Lane name — heartbeat-runner's command-queue lane is keyed by the resolved agent's session: session:agent:ops:… for the foreign-agent call, session:agent:primary:… for the default-agent call. Pre-PR, CASE_A's lane would have read session:agent:primary:cron:demo:run:case_a because the enqueue side fell back to resolveCronAgent(undefined) and rerouted to the configured default; the symmetric agentId derivation this PR adds is what makes CASE_A land at ops.
    2. Session file path — independent filesystem confirmation: agents/ops/sessions/…jsonl for CASE_A vs agents/primary/sessions/…jsonl for CASE_B. The event was actually filed under the originating agent's on-disk session store, not the configured-default's.

    Both runs completed successfully (lane task done after ~20–30s) against the live ollama-cloud/kimi-k2.5 provider, so this is end-to-end behavior including model dispatch, not just routing.

  • Observed result after fix:

    • WakeParamsSchema now types sessionKey: NonEmptyString (optional). Empty string / non-string is rejected at the gateway boundary before reaching the cron service.
    • openclaw system event --session-key <key> --text foo forwards the key in the JSON-RPC payload (verified in src/cli/system-cli.test.ts); whitespace-only values are treated as omitted.
    • wake({mode: "now", text, sessionKey}) and wake({mode: "next-heartbeat", text, sessionKey}) both fire a targeted immediate heartbeat when sessionKey is supplied — codex review surfaced (round 5 of this PR's iteration) that an event-intent wake gets deferred as not-due by the heartbeat-runner and isn't retried, so an immediate wake is the only reliable path. Documented in the wake-fn comment block AND in docs/cli/system.md as a public timing-exception callout (addresses clawsweeper [P3]).
    • The gateway cron adapter routes both agent-prefixed and relative session keys through the same resolveCronAgent resolution for enqueue and wake, so they land in/wake the same target on multi-agent deployments where main exists but isn't the configured default. Pre-PR, resolveCronWakeTarget always derived agentId from resolveAgentIdFromSessionKey while enqueueSystemEvent did not, so wake could target agent:<resolving>:... while enqueue went to agent:<configured-default>:.... That asymmetry is now fixed.
    • Generated WakeParams Swift model (apps/shared/OpenClawKit/Sources/OpenClawProtocol/GatewayModels.swift) updated to expose the new field. The standalone apps/macos/Sources/OpenClawProtocol/GatewayModels.swift was deleted by upstream's canvas-plugin refactor (commit 330ba1fa31); after rebase only the shared one survives. pnpm protocol:check runs protocol:gen + protocol:gen:swift + git diff --exit-code — verified locally clean (exit 0) on the rebased branch with @openclaw/fs-safe@c7ccb99d installed.
  • What was not tested:

    • Callers other than the CLI (mobile clients, plugin SDKs, channel webhooks, native chat) were not exercised live. The change is channel-agnostic: sessionKey is a typed field on the JSON-RPC schema, so those callers go through the same gateway/cron-adapter code path that the CLI smoke above exercises.
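The enqueue/wake asymmetry explained in the proof above fits in a few lines. This is a simplified model rather than the adapter code: AgentsConfig, parseAgentPrefix, and resolveAgent are hypothetical stand-ins for parseAgentSessionKey / resolveAgentIdFromSessionKey and resolveCronAgent, and the config mirrors the test setup (primary as default plus a non-default ops).

```typescript
interface AgentsConfig {
  defaultAgentId: string;
  agentIds: string[];
}

// Hypothetical parser: "agent:<id>:<rest>" yields <id>; other keys are relative.
function parseAgentPrefix(sessionKey: string): string | undefined {
  return /^agent:([^:]+):/.exec(sessionKey)?.[1];
}

// Stand-in for resolveCronAgent: a missing or unknown id falls back to the default.
function resolveAgent(cfg: AgentsConfig, agentId: string | undefined): string {
  return agentId !== undefined && cfg.agentIds.includes(agentId) ? agentId : cfg.defaultAgentId;
}

// Pre-PR: the wake side derived the agent from the key; the enqueue side did not.
function preWakeAgent(cfg: AgentsConfig, sessionKey: string): string {
  return resolveAgent(cfg, parseAgentPrefix(sessionKey));
}
function preEnqueueAgent(cfg: AgentsConfig, _sessionKey: string): string {
  return resolveAgent(cfg, undefined); // key ignored, so the default agent wins
}

// Post-PR: enqueue and wake share one derivation, so they always agree.
function postAgent(cfg: AgentsConfig, sessionKey: string): string {
  return resolveAgent(cfg, parseAgentPrefix(sessionKey));
}
```

With the foreign-agent key from the integration test, the pre-PR pair disagrees (ops for the wake, primary for the enqueue), while the post-PR derivation gives ops on both sides.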

Test plan

  • CLI test (src/cli/system-cli.test.ts): --session-key is forwarded in the JSON-RPC payload; omitted / whitespace-only values do not appear in the payload.
  • Wake handler test (src/gateway/server-methods/cron.validation.test.ts): forwards sessionKey to context.cron.wake when present, omits when absent, schema rejects empty / non-string values.
  • Timer wake test (src/cron/service/wake.test.ts): threads sessionKey to both enqueueSystemEvent and requestHeartbeat on both now and next-heartbeat modes (collapses to targeted-immediate when sessionKey is supplied so the runner doesn't drop the wake as not-due); whitespace-only keys treated as omitted.
  • Cron-adapter integration test (src/gateway/server-cron.test.ts, commit cbded2f68f): with multi-agent config (default primary + non-default ops) and a foreign-agent session key, both enqueueSystemEvent and requestHeartbeat adapter call sites resolve agentId to ops symmetrically. Pre-PR this assert would fail on the enqueue side (rerouted to primary's main).
  • Live gateway smoke from the rebased branch (multi-agent config, ollama-cloud/kimi-k2.5): --session-key flag advertised in CLI help, system event --session-key accepted by the wake RPC ({"ok": true}); foreign-agent call lands at session:agent:ops:… lane with sessionFile under agents/ops/sessions/…, default-agent call at session:agent:primary:… lane with sessionFile under agents/primary/sessions/…; both agent runs completed against the live provider.
  • Real-behavior reproducer above documents observed routing for the three key shapes (agent-prefixed channel, agent-prefixed cron-run, relative).
  • Existing wake({mode, text}) callers (no sessionKey) continue to default to agent-main session — verified by the omitted-when-absent tests above.
  • pnpm protocol:check exit 0 locally on the rebased branch — Swift model regen matches the gen output.
  • Codex review on the rebased branch addressed: [P3] timing-exception now documented in docs/cli/system.md; the proof block above uses only helpers actually exported from src/routing/session-key.ts.
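The timer-side wake rule covered by the wake.test.ts item can be summarized as a pure function. A sketch under stated assumptions: planWake and WakePlan are hypothetical, the subagent-target rejection mentioned in the review is left out, and "immediate: false" stands for the pre-PR behaviour of waiting for the next scheduled heartbeat tick.

```typescript
type WakeMode = "now" | "next-heartbeat";

interface WakePlan {
  enqueue: { text: string; sessionKey?: string };
  heartbeat: { immediate: boolean; sessionKey?: string };
}

// Hypothetical summary of the wake decision; not the code in src/cron/service/timer.ts.
function planWake(mode: WakeMode, text: string, rawSessionKey?: string): WakePlan {
  const sessionKey = rawSessionKey?.trim() || undefined; // whitespace-only == omitted
  if (sessionKey !== undefined) {
    // Targeted wakes fire immediately in both modes: a deferred event-intent
    // wake would be dropped as not-due by the heartbeat runner and not retried.
    return { enqueue: { text, sessionKey }, heartbeat: { immediate: true, sessionKey } };
  }
  // Untargeted wakes keep the pre-PR semantics.
  return { enqueue: { text }, heartbeat: { immediate: mode === "now" } };
}
```

Callers who want delayed delivery therefore have to omit the session key, which is the trade-off the docs callout describes.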

Security Impact

  • New permissions/capabilities? No.
  • Secrets/tokens handling changed? No.
  • New/changed network calls? No.
  • Command/tool execution surface changed? No.
  • Data access scope changed? Slightly — a wake caller can now target a specific session within an agent. The cross-agent safety check in resolveCronSessionKey is preserved: keys that don't belong to the resolving agent fall back to that agent's main session, so a caller cannot inject events into a different agent's queue.

@openclaw-barnacle openclaw-barnacle Bot added docs Improvements or additions to documentation app: web-ui App: web-ui gateway Gateway runtime cli CLI command changes size: M labels May 7, 2026
@openclaw-barnacle openclaw-barnacle Bot added the triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup. label May 7, 2026
@clawsweeper

clawsweeper Bot commented May 7, 2026

Contributor

Codex review: needs changes before merge.

Summary
The branch adds an optional wake sessionKey schema/Swift/CLI surface, threads it through CronService and gateway cron routing, updates docs/changelog, and adds wake/cron/CLI regression tests plus test type cleanups.

Reproducibility: yes. Source inspection on current main shows WakeParamsSchema has only mode/text and openclaw system event calls wake with only { mode, text }; the PR body also supplies after-fix live gateway proof.

Real behavior proof
Sufficient (logs): The PR body contains after-fix real gateway logs and command output showing targeted session-key routing for default and non-default agents.

Next step before merge
One narrow docs/contract alignment issue is concrete enough for an automated repair if maintainers want ClawSweeper to help; otherwise this remains normal contributor PR review.

Security
Cleared: The diff adds an authenticated wake parameter and generated/protocol model updates without new dependencies, workflows, secrets handling, or external code execution paths.

Review findings

  • [P3] Align the session-key fallback docs with routing — docs/cli/system.md:53-55
Review details

Best possible solution:

Land the focused external wake/sessionKey surface after aligning the fallback wording or behavior and coordinating with the companion internal cron-run routing PR.

Do we have a high-confidence way to reproduce the issue?

Yes. Source inspection on current main shows WakeParamsSchema has only mode/text and openclaw system event calls wake with only { mode, text }; the PR body also supplies after-fix live gateway proof.

Is this the best way to solve the issue?

Mostly yes. The additive protocol/CLI field and existing CronService/gateway adapter threading are the narrow maintainable path, but the docs or unknown-agent fallback behavior should be made consistent before merge.

Full review comments:

  • [P3] Align the session-key fallback docs with routing — docs/cli/system.md:53-55
    The new docs say keys that do not belong to the resolved agent fall back to that agent's main session, but the new adapter strips an unknown agent:<id>: prefix and preserves the suffix under the configured default agent. For example, an unknown agent:ghost:slack:... key routes like a default-agent Slack session, not main. Please either update this safety wording or change the adapter/tests to keep the documented main-session fallback.
    Confidence: 0.87
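To make the finding concrete, here is a sketch contrasting the documented fallback with the behaviour the reviewer describes for an unknown agent prefix. Everything in it is hypothetical: the agent set, splitAgentKey, the example ghost key, and especially the agent:<id>:main format used for the main-session key, which is an assumption rather than something this thread states. Relative keys are out of scope for the comparison and pass through unchanged.

```typescript
// Hypothetical deployment mirroring the test config: primary (default) + ops.
const KNOWN_AGENTS = new Set(["primary", "ops"]);
const DEFAULT_AGENT = "primary";

function splitAgentKey(key: string): { agentId: string; rest: string } | undefined {
  const match = /^agent:([^:]+):(.+)$/.exec(key);
  const agentId = match?.[1];
  const rest = match?.[2];
  return agentId !== undefined && rest !== undefined ? { agentId, rest } : undefined;
}

// Documented wording: a key outside the resolved agent falls back to its main session.
// The "agent:<id>:main" key format is an assumption of this sketch.
function documentedRoute(key: string): string {
  const parts = splitAgentKey(key);
  if (parts === undefined || KNOWN_AGENTS.has(parts.agentId)) return key;
  return `agent:${DEFAULT_AGENT}:main`;
}

// Behaviour the reviewer reports: the unknown prefix is stripped and the
// suffix is kept under the configured default agent.
function reportedAdapterRoute(key: string): string {
  const parts = splitAgentKey(key);
  if (parts === undefined || KNOWN_AGENTS.has(parts.agentId)) return key;
  return `agent:${DEFAULT_AGENT}:${parts.rest}`;
}
```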

Overall correctness: patch is correct
Overall confidence: 0.84

Acceptance criteria:

  • pnpm test src/gateway/server-cron.test.ts src/cron/service/wake.test.ts src/gateway/server-methods/cron.validation.test.ts src/cli/system-cli.test.ts
  • pnpm exec oxfmt --check --threads=1 docs/cli/system.md src/gateway/server-cron.ts src/gateway/server-cron.test.ts
  • git diff --check

What I checked:

  • Current main lacks typed wake sessionKey: Current WakeParamsSchema defines mode and text only, with extra properties left opaque via additionalProperties; sessionKey is not a typed protocol field on current main. (src/gateway/protocol/schema/agent.ts:217, e432c3270114)
  • Current main CLI is main-session-only: Current openclaw system event exposes --text, --mode, and --json, then calls wake with only { mode, text }, so it cannot send a session key. (src/cli/system-cli.ts:56, e432c3270114)
  • PR adds the CLI/protocol contract surface: The PR head adds --session-key, forwards it in the CLI wake payload when non-empty, and adds sessionKey: Type.Optional(NonEmptyString) to WakeParamsSchema. (src/cli/system-cli.ts:61, fa93db7d8793)
  • PR threads targeted wake behavior: The PR head trims the optional session key, rejects subagent session targets, enqueues the event with the key, and uses a targeted immediate heartbeat for both now and next-heartbeat when a session key is supplied. (src/cron/service/timer.ts:1790, fa93db7d8793)
  • PR aligns enqueue and wake target derivation: The PR head derives the cron agent target from agent-prefixed session keys for both resolveCronWakeTarget and enqueueSystemEvent, while preserving untargeted fanout when no target is supplied. (src/gateway/server-cron.ts:180, fa93db7d8793)
  • Docs fallback mismatch: The docs say keys outside the resolved agent fall back to the agent's main session, but the new adapter code and test preserve the suffix under the configured default agent for unknown agent-prefixed keys. Public docs: docs/cli/system.md. (docs/cli/system.md:53, fa93db7d8793)

Likely related people:

  • steipete: Recent commit history shows repeated work on src/gateway/server-cron.ts, src/cron/service/timer.ts, wake scheduling intent, and gateway/protocol-adjacent behavior. (role: recent area contributor; confidence: high; commits: 5b3e2497bd55, 5a80be35e982, c06739d773da; files: src/gateway/server-cron.ts, src/cron/service/timer.ts, src/gateway/protocol/schema/agent.ts)
  • kevinslin: Recent merged cron work touched diagnostics, stale next-run repair, and cron runtime behavior in the same service area this PR changes. (role: recent cron contributor; confidence: medium; commits: 89db1e5440f5, 5b9672b4bbfb, 7175b1b5c634; files: src/gateway/server-cron.ts, src/cron/service/timer.ts)
  • amknight: Recent cron hook and plugin SDK work touched src/gateway/server-cron.ts and the public cron/plugin boundary adjacent to this PR's adapter changes. (role: adjacent cron/plugin SDK contributor; confidence: medium; commits: cd24da031b96, f155a5f95593; files: src/gateway/server-cron.ts, src/plugins/hook-types.ts)

Remaining risk / open question:

  • The new docs describe foreign session keys as falling back to main, while the PR code/test preserves the requested suffix under the default agent for unknown agent-prefixed keys.
  • This touches broad cron/wake routing, so maintainer CI and coordination with the companion PR #80214 (fix(heartbeat): remap cron-run exec events to session keys) should gate merge.
  • I did not run local tests because this review was read-only; validation is from source/diff inspection and the contributor's supplied proof.

Codex review notes: model gpt-5.5, reasoning high; reviewed against e432c3270114.

@Kaspre Kaspre force-pushed the fix/wake-protocol-session-key branch from 6454d57 to e539325 May 7, 2026 04:23
@openclaw-barnacle openclaw-barnacle Bot added app: macos App: macos proof: supplied External PR includes structured after-fix real behavior proof. and removed triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup. labels May 7, 2026
@Kaspre Kaspre force-pushed the fix/wake-protocol-session-key branch from 25db9de to 02e4561 May 7, 2026 16:36
@openclaw-barnacle openclaw-barnacle Bot removed the app: macos App: macos label May 7, 2026
Kaspre added a commit to Kaspre/openclaw that referenced this pull request May 7, 2026
Codex review on PR openclaw#78687 [P3] flagged that the docs say next-heartbeat
"waits for the next scheduled tick" while the patched timer collapses
next-heartbeat+sessionKey to an immediate targeted wake. Add a callout
describing the exception and pointing callers who want delayed delivery
back at the no-session-key path.

Refs openclaw#78687.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@Kaspre

Kaspre commented May 7, 2026

Contributor Author

2026-05-07 — addressed clawsweeper review feedback

Both findings from the prior review are addressed:

  • [P3] Document the session-key timing exception — added a callout in docs/cli/system.md explaining that next-heartbeat + --session-key collapses to an immediate targeted wake (because the runner drops event-intent wakes as not-due) and pointing callers who want delayed delivery back at the no-session-key path. Commit 108b66c560.
  • Proof script mismatch — rewrote the Real-behavior-proof block to use only helpers actually exported from src/routing/session-key.ts (scopedHeartbeatWakeOptions, resolveAgentIdFromSessionKey, classifySessionKeyShape). The rewritten block also corrects an over-claim in the prior version: it shows that the routing helper alone passes agent-prefixed keys through but drops the sessionKey for relative keys — and explicitly explains how this PR's cron-adapter change closes the asymmetry with enqueueSystemEvent going through resolveCronSessionKey. PR body updated.

Locally: pnpm protocol:check exits 0 on the rebased branch (Swift regen matches gen output), and pnpm exec oxlint --tsconfig config/tsconfig/oxlint.core.json src/cli/system-cli.test.ts is clean after the lint-fix commit 02e456116a.

CI is currently re-running the broad shards on the rebased branch.

Kaspre added a commit to Kaspre/openclaw that referenced this pull request May 7, 2026
…e and wake

When `cron.wake` is called with only an agent-prefixed `sessionKey` (no
explicit `agentId`), the gateway cron adapter must derive the same agentId
on both `enqueueSystemEvent` and `requestHeartbeat` so events land in (and
heartbeats fire on) the same agent target. Pre-PR, only `requestHeartbeat`
derived agentId from the key; `enqueueSystemEvent` ran through
`resolveCronSessionKey` with the configured-default agent and was rerouted
to that agent's main session under multi-agent deployments where `main`
exists but is not the default.

The new test exercises the cron-adapter directly via `state.cron.state.deps`
with a multi-agent config (`primary` default + `ops` non-default) and a
`agent:ops:cron:nightly:run:abc-123` foreign-agent session key, asserting
that both call sites resolve the agent target to "ops" rather than falling
back to "primary".

Refs openclaw#78687.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@Kaspre

Kaspre commented May 7, 2026

Contributor Author

2026-05-07 — added cron-adapter integration test + live gateway smoke

Followed up on Bossman's request to add stronger evidence than the helper-script proof:

  1. Cron-adapter integration test (commit cbded2f68f): added to src/gateway/server-cron.test.ts. Constructs the cron adapter against a multi-agent config (primary default + ops non-default) and asserts that enqueueSystemEvent and requestHeartbeat BOTH derive agentId="ops" (not primary) when given a foreign-agent session key — the exact symmetry property this PR's server-cron.ts change establishes. Locally pnpm vitest run src/gateway/server-cron.test.ts -t "symmetric" is green (1 passed | 12 skipped, 26ms test time). Sensitivity check: pre-PR the assert on the enqueue side would fail because resolveCronAgent(undefined) rerouted to the configured default agent.

  2. Live gateway smoke from the rebased branch: built the gateway via node scripts/run-node.mjs gateway against OPENCLAW_HOME=/tmp/oc-test-home on port 18790 with a two-agent config. Confirmed (a) --session-key flag is advertised in openclaw system event --help, (b) the wake RPC accepts the typed sessionKey and returns {"ok": true}, (c) system event --session-key agent:primary:cron:demo:run:case_b2 --mode now produces command-queue lane session:agent:primary:cron:demo:run:case_b2 on the test gateway (consistent with the cron-adapter routing this PR fixes).

The corresponding foreign-agent live observation didn't surface — heartbeat-runner is gated on per-agent model/heartbeat config, and the test config's ops agent has neither — but the integration test asserts the cron-adapter routing decision directly at the call sites, closing that observability gap.

Body updated; "What was not tested" now reflects the dispatch-layer gating limitation honestly rather than the prior blanket "live E2E not run" caveat.

@clawsweeper clawsweeper Bot added the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 8, 2026
@Kaspre

Kaspre commented May 8, 2026

Contributor Author

2026-05-07 — completed live-gateway proof of foreign-agent routing

Following Bossman's suggestion, I added a real model and heartbeat config to the test setup so that the foreign-agent dispatch lane surfaces.

Root cause of the prior gap: isHeartbeatEnabledForAgent (src/infra/heartbeat-summary.ts:32) returns true for an agent only when (a) some agent has explicit heartbeat config, (b) agents.defaults.heartbeat is set, or (c) the agent is the configured default. My earlier minimal config had none of those for ops, a non-default agent, so heartbeat-runner short-circuited for ops and the lane never fired.
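The three conditions can be written as one predicate. This is a restatement of the comment, not the code at src/infra/heartbeat-summary.ts:32: the Config shape (in particular agents.list and defaultAgentId) and the function name are hypothetical, and condition (a) is modelled literally as "any agent has explicit heartbeat config", which the real function may scope more narrowly.

```typescript
interface HeartbeatConfig {
  every?: string;
}

// Hypothetical config shape; only agents.defaults.heartbeat.every appears in this thread.
interface Config {
  defaultAgentId: string;
  agents: {
    defaults?: { heartbeat?: HeartbeatConfig };
    list: Array<{ id: string; heartbeat?: HeartbeatConfig }>;
  };
}

function isHeartbeatEnabledSketch(cfg: Config, agentId: string): boolean {
  const anyExplicit = cfg.agents.list.some((agent) => agent.heartbeat !== undefined); // (a)
  const hasDefaults = cfg.agents.defaults?.heartbeat !== undefined; // (b)
  const isDefault = agentId === cfg.defaultAgentId; // (c)
  return anyExplicit || hasDefaults || isDefault;
}
```

Under this reading, adding agents.defaults.heartbeat flips (b) for every agent, which is why the one-line config change below was enough to enable ops.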

Fixed by adding agents.defaults.heartbeat: { every: "1h" } to the test config. Restarted the patched gateway and re-ran two system event calls back-to-back. Captured gateway log:

20:18:49 [diagnostic]  lane enqueue: lane=session:agent:ops:cron:demo:run:case_a3 queueSize=1
20:19:07 [diagnostic]  lane enqueue: lane=session:agent:primary:cron:demo:run:case_b3 queueSize=1

The lane name is the routing-decision proof. Foreign-agent call (CASE_A3) routed to ops's lane; default-agent call (CASE_B3) routed to primary's lane. Pre-PR, CASE_A3 would have routed to primary's lane (default-agent fallback on the enqueue side); the symmetric agentId derivation this PR adds is what makes both calls land where the session key says.
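Reading the proof only requires pulling the agent segment out of a lane name. A hypothetical helper pair: laneForSessionKey reproduces the lane names seen in the earlier gateway log (the session: prefix plus the lower-cased key, e.g. CASE_A becoming case_a), but the real lane naming may normalize keys in other ways too.

```typescript
// Hypothetical: matches the lane names in the gateway log ("session:" + lower-cased key).
function laneForSessionKey(sessionKey: string): string {
  return `session:${sessionKey.toLowerCase()}`;
}

// The routing decision is the agent segment of the lane name.
function agentFromLane(lane: string): string | undefined {
  return /^session:agent:([^:]+):/.exec(lane)?.[1];
}
```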

PR body updated (Live-gateway smoke section + Test plan checklist) with the captured logs. The routing-decision signal is the lane name, captured upstream of model invocation.

@openclaw-barnacle openclaw-barnacle Bot removed the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 8, 2026
@clawsweeper clawsweeper Bot added the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 8, 2026
@openclaw-barnacle openclaw-barnacle Bot removed the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 8, 2026
@clawsweeper clawsweeper Bot added the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 8, 2026
@Kaspre Kaspre force-pushed the fix/wake-protocol-session-key branch from cbded2f to fdb5530 May 10, 2026 04:21
Kaspre added a commit to Kaspre/openclaw that referenced this pull request May 10, 2026
Codex review on PR openclaw#78687 [P3] flagged that the docs say next-heartbeat
"waits for the next scheduled tick" while the patched timer collapses
next-heartbeat+sessionKey to an immediate targeted wake. Add a callout
describing the exception and pointing callers who want delayed delivery
back at the no-session-key path.

Refs openclaw#78687.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Kaspre added a commit to Kaspre/openclaw that referenced this pull request May 10, 2026
…e and wake

When `cron.wake` is called with only an agent-prefixed `sessionKey` (no
explicit `agentId`), the gateway cron adapter must derive the same agentId
on both `enqueueSystemEvent` and `requestHeartbeat` so events land in (and
heartbeats fire on) the same agent target. Pre-PR, only `requestHeartbeat`
derived agentId from the key; `enqueueSystemEvent` ran through
`resolveCronSessionKey` with the configured-default agent and was rerouted
to that agent's main session under multi-agent deployments where `main`
exists but is not the default.

The new test exercises the cron-adapter directly via `state.cron.state.deps`
with a multi-agent config (`primary` default + `ops` non-default) and a
`agent:ops:cron:nightly:run:abc-123` foreign-agent session key, asserting
that both call sites resolve the agent target to "ops" rather than falling
back to "primary".

Refs openclaw#78687.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
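The asymmetry this commit removes can be sketched as two versions of the adapter's target resolution. The names below are hypothetical and the "before" function is a simplification of the described behavior (the enqueue side ignored the key's agent and fell back to the default); the key format `agent:<agentId>:<rest>` is an assumption.

```typescript
interface WakeTargetOpts {
  agentId?: string;
  sessionKey?: string;
}

// Hypothetical; assumes agent-scoped keys look like "agent:<agentId>:<rest>".
function agentPrefix(sessionKey: string): string | undefined {
  const match = /^agent:([^:]+):/.exec(sessionKey);
  return match ? match[1] : undefined;
}

// Shared resolution: explicit agentId wins, then the key's own agent,
// then the configured default agent.
function resolveTargetAgent(defaultAgentId: string, opts: WakeTargetOpts): string {
  const fromKey = opts.sessionKey ? agentPrefix(opts.sessionKey) : undefined;
  return opts.agentId ?? fromKey ?? defaultAgentId;
}

// Before (simplified): only the heartbeat side derived the agent from the key.
function targetsBefore(defaultAgentId: string, opts: WakeTargetOpts) {
  return {
    enqueue: opts.agentId ?? defaultAgentId,
    heartbeat: resolveTargetAgent(defaultAgentId, opts),
  };
}

// After: both call sites use one resolver, so they cannot diverge.
function targetsAfter(defaultAgentId: string, opts: WakeTargetOpts) {
  const agentId = resolveTargetAgent(defaultAgentId, opts);
  return { enqueue: agentId, heartbeat: agentId };
}
```

With the test's config (`primary` default, `ops` non-default) and the key `agent:ops:cron:nightly:run:abc-123`, the "before" version splits the targets between `primary` and `ops`, while the "after" version sends both to `ops`.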
@openclaw-barnacle openclaw-barnacle Bot removed the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 10, 2026

Kaspre commented May 10, 2026

Contributor Author

Updated this PR for current upstream/main and pushed fdb55303c8.

Changes since the earlier revision:

  • Rebased onto latest main.
  • Preserved untargeted cron wake requests so heartbeat fanout/broadcast behavior remains unchanged when neither agentId nor sessionKey is provided.
  • Kept default-agent resolution only for relative, non-empty session keys.
  • Added a multi-agent regression test covering untargeted heartbeat fanout.
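The distinction between untargeted fanout and targeted wakes listed above can be sketched as a small planning function. Everything here is hypothetical: the plan type, the function names, and the `agent:<agentId>:<rest>` key format are assumptions used only to illustrate the described behavior.

```typescript
type HeartbeatPlan =
  | { kind: "fanout" } // no target: wake every agent, as before this PR
  | { kind: "targeted"; agentId: string; sessionKey?: string };

// Hypothetical; assumes agent-scoped keys look like "agent:<agentId>:<rest>".
function agentPrefix(sessionKey: string): string | undefined {
  const match = /^agent:([^:]+):/.exec(sessionKey);
  return match ? match[1] : undefined;
}

function planHeartbeat(
  defaultAgentId: string,
  opts: { agentId?: string; sessionKey?: string },
): HeartbeatPlan {
  // Blank or whitespace-only keys are treated as omitted.
  const sessionKey = opts.sessionKey?.trim() || undefined;
  if (!opts.agentId && !sessionKey) {
    return { kind: "fanout" };
  }
  const fromKey = sessionKey ? agentPrefix(sessionKey) : undefined;
  // Relative, non-empty keys resolve under the configured default agent.
  return { kind: "targeted", agentId: opts.agentId ?? fromKey ?? defaultAgentId, sessionKey };
}
```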

Validation:

  • pnpm test src/gateway/server-cron.test.ts src/cron/service/wake.test.ts src/gateway/server-methods/cron.validation.test.ts src/cli/system-cli.test.ts passed.
  • pnpm exec oxfmt --check --threads=1 src/gateway/server-cron.ts src/gateway/server-cron.test.ts src/cron/service/timer.ts src/cron/service/wake.test.ts src/gateway/server-methods/cron.validation.test.ts src/cli/system-cli.ts src/cli/system-cli.test.ts docs/cli/system.md CHANGELOG.md passed.
  • pnpm exec oxlint --tsconfig config/tsconfig/oxlint.core.json src/gateway/server-cron.ts src/gateway/server-cron.test.ts src/cron/service/timer.ts src/cron/service/wake.test.ts src/gateway/server-methods/cron.validation.test.ts src/cli/system-cli.ts src/cli/system-cli.test.ts passed.
  • pnpm protocol:check passed.

@clawsweeper clawsweeper Bot added the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 10, 2026
Kaspre and others added 9 commits May 11, 2026 12:09
Adds an optional sessionKey to the WakeParamsSchema and threads it through
the gateway wake handler, CronService.wake(), and the underlying timer.wake()
ops so callers can target a specific session for async-task completion
relays instead of always hitting the agent's main session.

Also adds --session-key to `openclaw system event`.

The schema rejects empty/non-string sessionKey at the gateway boundary;
mismatched session keys (a key that does not belong to the resolving agent)
fall back to the agent's main session inside resolveCronSessionKey, which
is the existing safety path.

Refs openclaw#52305 (companion to PR openclaw#50818, which closes the related cron-run
remap slice at internal enqueue sites). Doesn't depend on openclaw#50818.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
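The boundary rule in this commit (reject empty or non-string keys at the gateway, treat whitespace-only keys as omitted further down) can be sketched in plain TypeScript. This assumes `NonEmptyString` amounts to a string with minimum length 1; the function names are hypothetical and do not reproduce the real TypeBox schema or service code.

```typescript
type SessionKeyCheck =
  | { ok: true; sessionKey?: string }
  | { ok: false; error: string };

// Hypothetical boundary check with the intended effect of
// `sessionKey: Type.Optional(NonEmptyString)`: absent is allowed,
// an empty string or a non-string value is rejected.
function checkWakeSessionKey(value: unknown): SessionKeyCheck {
  if (value === undefined) return { ok: true };
  if (typeof value !== "string" || value.length === 0) {
    return { ok: false, error: "sessionKey must be a non-empty string" };
  }
  return { ok: true, sessionKey: value };
}

// Hypothetical service-layer step: a key that passes the schema but is
// whitespace-only is treated as if it had been omitted.
function normalizeWakeSessionKey(sessionKey?: string): string | undefined {
  return sessionKey !== undefined && sessionKey.trim() !== "" ? sessionKey : undefined;
}
```

Splitting the check this way keeps the schema simple while still letting `"   "` fall through to the untargeted path.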
… in cron adapter

Address review findings from successive codex rounds:

1. next-heartbeat + sessionKey now fires a targeted immediate wake.
   The regularly-scheduled heartbeat fires for the agent's main session,
   not the supplied sessionKey, so an event queued for a non-main session
   would sit stranded indefinitely; an "event"-intent wake is also
   deferred as not-due by the heartbeat runner and not retried, so
   neither path delivers without an explicit immediate wake.

2. resolveCronWakeTarget now always runs through resolveCronAgent, both
   for agent-prefixed session keys (so non-default agents are honored)
   and relative keys (so the configured default agent is used instead
   of the hardcoded "main" returned by resolveAgentIdFromSessionKey).
   Mirrors the matching fix in the enqueueSystemEvent adapter so wake
   and enqueue resolve to the same target.

3. Generated Swift `WakeParams` models now expose the new optional
   `sessionkey` field (codingKey "sessionKey") in both the macOS and
   shared OpenClawKit copies. Regenerating from agent.ts via
   protocol:gen + protocol:gen:swift would have produced the same diff,
   but the local environment couldn't run the generators (transitive
   fs-safe typecheck errors), so the change was applied by hand to match
   what pnpm protocol:check expects.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
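Item 2's relative-key rule can be sketched by contrasting a config-free lookup with a config-aware one. The names are hypothetical stand-ins, not the real `resolveCronWakeTarget` / `resolveCronAgent` / `resolveAgentIdFromSessionKey`, and the `agent:<agentId>:<rest>` key format is an assumption.

```typescript
// Hypothetical; assumes agent-scoped keys look like "agent:<agentId>:<rest>".
function agentPrefix(sessionKey: string): string | undefined {
  const match = /^agent:([^:]+):/.exec(sessionKey);
  return match ? match[1] : undefined;
}

// Without access to config, a relative key can only map to a hardcoded "main".
function agentFromKeyWithoutConfig(sessionKey: string): string {
  return agentPrefix(sessionKey) ?? "main";
}

// Config-aware resolution: prefixed keys keep their own agent, and relative
// keys resolve under the configured default agent.
function resolveWakeAgent(sessionKey: string, defaultAgentId: string): string {
  return agentPrefix(sessionKey) ?? defaultAgentId;
}
```

In a deployment whose default agent is `primary`, the relative key `cron:demo` would map to `main` under the first function but to `primary` under the second, which is the mismatch the commit fixes.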
Caught by oxlint typescript-eslint(no-unnecessary-type-assertion) in CI.
mock.calls is typed as any[][], so the trailing `!` adds nothing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@Kaspre Kaspre force-pushed the fix/wake-protocol-session-key branch from fa93db7 to 8a5065c Compare May 11, 2026 16:11
@openclaw-barnacle openclaw-barnacle Bot removed channel: msteams Channel integration: msteams channel: slack Channel integration: slack proof: sufficient ClawSweeper judged the real behavior proof convincing. labels May 11, 2026
@steipete steipete merged commit 15fa1e5 into openclaw:main May 11, 2026
85 of 87 checks passed
steipete pushed a commit that referenced this pull request May 11, 2026
steipete pushed a commit that referenced this pull request May 11, 2026
@steipete

Contributor

Landed via rebase onto main.

Proof used before merge:

  • pnpm test src/gateway/server-cron.test.ts src/cron/service/wake.test.ts src/gateway/server-methods/cron.validation.test.ts src/cli/system-cli.test.ts
  • pnpm exec oxlint --tsconfig config/tsconfig/oxlint.core.json src/gateway/server-cron.ts src/gateway/server-cron.test.ts src/cron/service/timer.ts src/cron/service/wake.test.ts src/gateway/server-methods/cron.ts src/gateway/server-methods/cron.validation.test.ts src/cli/system-cli.ts src/cli/system-cli.test.ts
  • pnpm exec oxfmt --check --threads=1 src/gateway/server-cron.ts src/gateway/server-cron.test.ts docs/cli/system.md
  • git diff --check

Maintainer fixup source SHA: 4d75876
Landed commit: 15fa1e5

Thanks @Kaspre!

steipete pushed a commit that referenced this pull request May 12, 2026
steipete pushed a commit that referenced this pull request May 12, 2026
@Kaspre Kaspre deleted the fix/wake-protocol-session-key branch May 15, 2026 12:51
github-actions Bot pushed a commit to Desicool/openclaw that referenced this pull request May 24, 2026
github-actions Bot pushed a commit to Desicool/openclaw that referenced this pull request May 24, 2026
jameslcowan pushed a commit to jameslcowan/openclaw that referenced this pull request Jun 2, 2026
jameslcowan pushed a commit to jameslcowan/openclaw that referenced this pull request Jun 2, 2026
sablehead pushed a commit to sablehead/openclaw that referenced this pull request Jun 10, 2026
sablehead pushed a commit to sablehead/openclaw that referenced this pull request Jun 10, 2026

Labels

app: web-ui App: web-ui channel: zalouser Channel integration: zalouser cli CLI command changes docs Improvements or additions to documentation gateway Gateway runtime proof: supplied External PR includes structured after-fix real behavior proof. size: L


Development

Successfully merging this pull request may close these issues.

[Bug]: async task completion reports can be lost because system event/wake is not reliably session-targeted

3 participants