fix(cron): capture originating session/agent on the cron wake tool call#83738
Conversation
|
Codex review: needs maintainer review before merge. Reviewed June 10, 2026, 3:49 PM ET / 19:49 UTC. Summary PR surface: Source +187, Tests +529, Other +146. Total +862 across 24 files. Reproducibility: yes. The inspected real Telegram before recording reproduces the missing wake reply on current main, and the after recordings show a freshly executed wake turn returning to the same topic. Review metrics: 3 noteworthy metrics.
Merge readiness Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch. Rank-up moves:
Risk before merge
Maintainer options:
Next step before merge
Security Review detailsBest possible solution: Keep the origin-aware wake routing and stored delivery-context design, but merge only after maintainers explicitly approve the shared origin-first behavior for absent heartbeat targets and the stricter contradictory-identity RPC contract. Do we have a high-confidence way to reproduce the issue? Yes. The inspected real Telegram before recording reproduces the missing wake reply on current main, and the after recordings show a freshly executed wake turn returning to the same topic. Is this the best way to solve the issue? Yes for the core bug: capturing the origin at the cron tool boundary and resolving transport context from the session store is the narrowest maintainable design. Maintainer judgment is still required for the shared absent-target fallback and stricter direct-RPC validation policy. AGENTS.md: found and applied where relevant. Codex review notes: reasoning high; reviewed against e6b0a22f36cd. Label changesLabel changes:
Label justifications:
Evidence reviewedPR surface: Source +187, Tests +529, Other +146. Total +862 across 24 files. View PR surface stats
What I checked:
Likely related people:
What the crustacean ranks mean
Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics. How this review workflow works
|
fbca52b to
7d13307
Compare
|
@clawsweeper re-review |
|
🦞🧹 I asked ClawSweeper to review this item again. Re-review progress:
|
|
@clawsweeper re-review |
|
🦞🧹 I asked ClawSweeper to review this item again. Re-review progress:
|
…outing Drives the patched gateway wake handler (validateWakeParams + isSubagentSessionKey guard) through cron.wake() / timer.ts with logging deps. Captures stdout demonstrating that a wake fired from a non-main session targets the originating sessionKey + agentId on both enqueueSystemEvent and the immediate-heartbeat nudge, instead of falling through to the heartbeat/main default. Five scenarios: 1. non-main session + non-default agent (the bug-fix case) 2. no origin -> backwards-compatible default routing 3. next-heartbeat + sessionKey -> targeted-immediate collapse 4. subagent sessionKey rejected by gateway handler guard 5. whitespace-only origin -> defence-in-depth default fallthrough All ids in the script are synthetic — no real chat identifiers are referenced. Run with: pnpm exec tsx scripts/proof-cron-wake-origin.mts
martingarramon
left a comment
There was a problem hiding this comment.
The threading is complete and consistent. The wake case at cron-tool.ts:837 uses resolveMainSessionAlias + resolveInternalSessionKey with the same resolution pattern already present at :464 (add) and :674 (list). Explicit params on the tool call (sessionKey, agentId) take precedence over inferred values from opts.agentSessionKey — that's the right priority.
The backward-compat guard in timer.ts is important and correct: when neither sessionKey nor agentId is set, the function passes no opts at all, preserving the pre-fix default-session binding in enqueueSystemEvent. This means existing wakes that don't supply session context continue to work without change.
Pipeline trace confirmed on origin/main:
cron-tool.tsresolves and passes{ sessionKey, agentId }to the gatewaygateway/server-methods/cron.tsthreads both through tocron.wakeservice-contract.tsWakeParamsSchemaaccepts both as optionalservice/ops.tswakeNowacceptsagentId(was missing before this PR)service/timer.tsforwards toenqueueSystemEventwith the backward-compat guard
bd6705f to
fdf835d
Compare
…outing Drives the patched gateway wake handler (validateWakeParams + isSubagentSessionKey guard) through cron.wake() / timer.ts with logging deps. Captures stdout demonstrating that a wake fired from a non-main session targets the originating sessionKey + agentId on both enqueueSystemEvent and the immediate-heartbeat nudge, instead of falling through to the heartbeat/main default. Five scenarios: 1. non-main session + non-default agent (the bug-fix case) 2. no origin -> backwards-compatible default routing 3. next-heartbeat + sessionKey -> targeted-immediate collapse 4. subagent sessionKey rejected by gateway handler guard 5. whitespace-only origin -> defence-in-depth default fallthrough All ids in the script are synthetic — no real chat identifiers are referenced. Run with: pnpm exec tsx scripts/proof-cron-wake-origin.mts
|
@clawsweeper re-review |
|
🦞🧹 I asked ClawSweeper to review this item again. |
|
@clawsweeper re-review |
|
🦞👀 Command router queued. I will update this comment with the next step. Re-review progress:
|
|
@clawsweeper re-review Pushed |
|
🦞🧹 I asked ClawSweeper to review this item again. |
|
@clawsweeper re-review |
|
🦞👀 Command router queued. I will update this comment with the next step. Re-review progress:
|
|
@clawsweeper re-review Pushed Fix: when no heartbeat target is configured and the drained event carries a deliverable origin turn source, deliver to that origin. Explicit Re-verified in the Mantis-equivalent local harness (real gateway + Telegram bot + tdlib driver, wake driven via the same |
|
🦞🧹 I asked ClawSweeper to review this item again. |
|
@clawsweeper re-review |
|
🦞👀 Command router queued. I will update this comment with the next step. Re-review progress:
|
|
@clawsweeper re-review |
|
🦞👀 Command router queued. I will update this comment with the next step. Re-review progress:
|
|
@clawsweeper re-review |
|
🦞👀 Command router queued. I will update this comment with the next step. Re-review progress:
|
|
@clawsweeper re-review |
|
🦞👀 Command router queued. I will update this comment with the next step. Re-review progress:
|
|
@clawsweeper re-review Added the current-head proof you asked for (the [P1] proof-guidance item): the no-heartbeat-target fallback, captured on the exact PR head with no
This is in addition to the earlier |
|
🦞👀 Command router queued. I will update this comment with the next step. Re-review progress:
|
|
@clawsweeper re-review |
|
🦞👀 Command router queued. I will update this comment with the next step. Re-review progress:
|
The cron wake tool forwarded only {mode, text} to the gateway, so every
wake enqueued against the heartbeat/main default. Wakes scheduled from a
non-main session (Telegram topic, Discord thread, multi-agent setup)
silently routed to the wrong conversation lane.
- cron-tool wake action resolves the calling agent's sessionKey/agentId
(mirrors the add action); explicit tool params take precedence, and an
explicit cross-agent sessionKey derives its agentId from the key itself
so the caller's agentId is never paired with a foreign session.
- WakeParamsSchema, gateway wake handler, cron service contract and
wake() thread the optional sessionKey/agentId through to
enqueueSystemEvent and the heartbeat request.
- wake() also carries the originating session's stored channel delivery
context (e.g. the bound Telegram message_thread_id) onto the system
event via an optional resolveOriginDeliveryContext dep implemented in
server-cron from the session store - a wake-now heartbeat replies in
the originating topic, not the chat root. No-origin wakes keep the
exact pre-fix enqueueSystemEvent(text, undefined) shape.
- Swift gateway protocol client regenerated for the new WakeParams
field.
Scheduled main-session cron jobs are untouched: current main already
routes them through the dedicated cron run lane with borrowed delivery
context, and this change layers on top of that design.
Closes the tool-path half of openclaw#46886 / openclaw#64556.
…outing Drives the real gateway wake handler into the real cron service wake() with logging deps, demonstrating origin routing, backwards-compatible default routing, the next-heartbeat targeted-immediate collapse, the subagent guard, and whitespace fall-through. All identifiers synthetic. Run: pnpm exec tsx scripts/proof-cron-wake-origin.mts
Focused Stryker mutation on the PR-touched code surfaced gaps the example-based tests missed: agentId-only wakes (enqueue opts + heartbeat threading), sessionKey-only wakes (origin delivery-context resolution), wake without text (required-param guard), context-less wake callers (optional chaining), and an explicit next-heartbeat mode. All pinned; wake.ts now scores 51/51 killed mutants, the cron-tool wake arm 26/31 (remaining survivors are test-harness equivalents).
The wake action's tool description gained the sessionKey/agentId override guidance, which flows into the codex dynamic-tools prompt fixtures.
An explicit agentId paired with an explicit sessionKey owned by a different agent is ambiguous: the gateway target resolver treats agentId as authoritative and would silently canonicalize the wake onto a session under that agent which the caller never named. Reject the contradictory pair instead of guessing a canonical owner; matching explicit pairs and single-field overrides are unchanged.
…ndary Direct gateway callers and generated clients bypass the cron tool, so mirror its guard in the wake handler: an explicit agentId that disagrees with the agent owning an agent-prefixed sessionKey is rejected as INVALID_REQUEST before reaching the cron service, instead of letting the target resolver silently canonicalize the wake onto an unnamed lane.
…et is configured The wake path carries the originating session's delivery context onto the system event, but resolveHeartbeatDeliveryTarget defaulted to target "none" whenever agents.defaults.heartbeat was absent — the woken turn ran and its reply was silently dropped. When no heartbeat target is configured and the drained event explicitly carried a deliverable origin turn source, resolve delivery to that origin; an explicit target "none" still suppresses, and runs without an origin turn source keep the previous default.
|
Land-ready proof for head Verification run:
Squash-merging now. |
Summary
wakeMCP tool forwards only{mode, text}to the gateway, so every wake enqueues against the heartbeat/main default. Wakes scheduled from a non-main session (Telegram topic, Discord thread, multi-agent setup) silently route to the wrong conversation lane — the event text never returns to the conversation that scheduled it. Closes the tool-path half of cron wake action does not support agentId — always routes to default agent #46886 / [Bug]: hooks.mappings[].agentId and sessionKey silently ignored for action="wake" #64556.cron-tool.tswake action resolves the calling agent'ssessionKey/agentIdvia the existingresolveInternalSessionKey/resolveSessionAgentIdpattern (mirrors theaddaction). Explicit tool params take precedence; an explicit cross-agentsessionKeyderives itsagentIdfrom the key itself (parseAgentSessionKey) so the caller's agentId is never paired with a foreign session.WakeParamsSchema(+ regenerated Swift protocol client), gateway wake handler, cron service contract, andwake()thread the optionalsessionKey/agentIdthrough toenqueueSystemEventand the heartbeat request.wake()carries the originating session's stored channel delivery context (e.g. the bound Telegrammessage_thread_id) onto the system event via an optionalresolveOriginDeliveryContextdep, implemented inserver-cron.tsfrom the session store (not by string-splitting the session key) — a wake-now heartbeat replies in the originating topic, not the chat root. No-origin wakes keep the exact pre-fixenqueueSystemEvent(text, undefined)shape.Rebase onto current main (this push)
main. Main has since absorbed the scheduled-job half of this design natively:sessionTarget:"main"cron jobs now run through the dedicatedagent:<id>:cron:<job>:run:<ts>lane and borrow the target session's delivery context (resolveMainSessionCronDeliveryContext). This PR therefore no longer touchesexecuteMainSessionCronJob/the run lane at all — it layers the same origin-capture idea onto the manualwakepath only, which main still lacks. (This resolves the earlier ClawSweeper [P1] "Preserve main cron run lanes" by construction.)wake()moved tosrc/cron/service/wake.tsupstream; the change follows it there.pnpm protocol:gen && pnpm protocol:gen:swift) for the newWakeParams.agentIdfield — resolves the earlier [P2].Real behavior proof
wakescheduled from a non-main Telegram forum-topic session must enqueue against the originating session, and the wake-triggered heartbeat turn must deliver its reply back into that same topic instead of the heartbeat/main default (tool-path half of cron wake action does not support agentId — always routes to default agent #46886 / [Bug]: hooks.mappings[].agentId and sessionKey silently ignored for action="wake" #64556).pnpm build:docker), real Telegram Bot API (dedicated test bot) and a real Telegram Desktop client driven by the tdlib user driver, running in a Linux container;claude-cli(claude-sonnet-4-6) agent backend; BEFORE (57633c42b, current main) and AFTER (this branch) builds captured with identical config includingagents.defaults.heartbeat: {"every":"1h","target":"last"}. This recreates the Mantis telegram-desktop-proof flow (telegram-desktop + user-driver + ffmpeg screen recording).outbound send ok … threadId=74 messageId=76); the wake fires an immediate heartbeat turn ON the topic session (trigger=heartbeat … historyPrompt=present); that second agent turn runsdate -uvia the exec tool and its reply — "PROOF-83738 wake turn ran at Wed Jun 10 04:43:21 UTC 2026" — is delivered into the SAME topic (threadId=74 messageId=77). Live command output in the reply demonstrates a real freshly-executed agent turn, not a delayed canned message.historyPrompt=none) and its reply is never delivered to the topic or the chat root.wake-origin-delivery.test.ts); live cross-agent wakes (the agentId-derivation contract is pinned by the cron-tool wake-routing unit tests).Current-head proof — no heartbeat target configured (the
a53fddb9f59fallback)Driven on the exact PR head with no
agents.defaults.heartbeatblock at all, via the samewakeRPC Mantis uses (system event --mode now --session-key <topic>). The wake-triggered turn runsdate -uand its reply lands back in the originating forum topic — proving the new no-heartbeat-target fallback:Reply captured live: "PROOF-83738 no-target wake ran at Wed Jun 10 13:14:33 UTC 2026" (full still · mp4). Gateway debug log (gw-notarget.log) shows the chain with no heartbeat target set:
target: "none"still suppresses (gw-tnone.log): withheartbeat.target: "none", the same wake fires the heartbeat turn (trigger=heartbeat,outBytes=12) but produces zerooutbound send— operator opt-out preserved.Proof detail — Mantis-flow recreation
This is a recreation of the Mantis telegram-desktop-proof flow using the patched local crabbox (same mechanism: real Telegram Desktop client + tdlib user driver + ffmpeg screen capture in a Linux container; both builds, same config incl.
agents.defaults.heartbeat: {"every":"1h","target":"last"}, bot + group ids are test-only). Scenario: a user in a forum topic asks the agent to schedulecron wake (mode "now"); the wake text instructs the heartbeat to reply in-thread.57633c42b)880d10a34)The wake starts a real second agent TURN, not a delayed canned reply — in this capture the wake text instructs the woken turn to run
date -uvia the exec tool, and the in-topic reply embeds the live command output ("PROOF-83738 wake turn ran at Wed Jun 10 04:43:21 UTC 2026"), which only a freshly-executed agent turn can produce:(full still · mp4 · gateway debug log)
Full-window stills: before / after · raw mp4s: before / after · redacted gateway logs: before / after
Key gateway-log lines (AFTER build, debug level):
Same scenario on BEFORE: the heartbeat fires with
historyPrompt=noneon a detached lane and no outbound send ever follows — the wake is lost.A deterministic harness driving the real gateway handler + cron service with logging deps is included (
pnpm exec tsx scripts/proof-cron-wake-origin.mts): origin routing, backwards-compatible default routing, the next-heartbeat targeted-immediate collapse, the subagent guard, and whitespace fall-through.Mutation testing (focused Stryker on the PR-touched code)
src/cron/service/wake.ts: 51/51 mutants killed (100%). Two initial survivors exposed real test gaps (agentId-only wakes; sessionKey-only delivery-context resolution) — both now pinned by dedicated tests.src/agents/tools/cron-tool.tswake arm (L840–893): 26/31 killed (83.9%). All 5 survivors are test-harness equivalents, not behavior gaps: the suitevi.mocksresolveSessionAgentIdto a constant (arg mutations invisible), themodefallback equals the compared literal, and{expectFinal:false}is invisible to the gateway mock. The killable gaps found on the way (wake-without-text guard, context-less callers, explicit next-heartbeat mode) got tests.Tests / CI
wake-origin.test.ts,wake-origin-delivery.test.ts, cron-tool wake-routing suite (10 cases),validateWakeParamsschema cases. All wake/cron/protocol suites green on Windows and Linux.extensions/bonjour/advertiser.test.tsflaking under parallel load (passes in isolation) andsrc/agents/compaction-planning-worker.test.ts(packaged-worker test, fails identically on unmodified upstream main in the same container — pre-existing environmental).src/gateway/server-cron.test.ts46/46 green.pnpm check(lint, typecheck prod),build:strict-smoke, protocol-gen mirror, plugins assets all green; the 11 extension test files that fail locally on Windows are pre-existing path-separator environment failures (all 191 tests in those files pass on Linux on this head, unchanged by this PR).