
fix(gateway): stabilize inter-session completion wake prompts #63096

Closed
afurm wants to merge 4 commits into openclaw:main from afurm:af/63030-completion-wake-context

Conversation

@afurm

@afurm afurm commented Apr 8, 2026

Contributor

Summary


  • Problem: inter-session task_completion wakes sent through the gateway agent path did not rebuild the session-backed Inbound Context / Group Chat Context suffix that normal chat turns include.
  • Why it matters: the volatile system-prompt suffix changed on ACP/subagent completion notifications, which busted Anthropic prompt cache reuse and caused expensive cache rewrites.
  • What changed: src/gateway/server-methods/agent.ts now synthesizes persisted session context for inter-session completion wakes when no explicit extraSystemPrompt is provided, and preserves explicit caller-provided prompt context when it is present.
  • What did NOT change (scope boundary): no gateway protocol/schema changes, no changes to the normal inbound chat prompt path, and no changes to internal event formatting beyond using the existing session metadata to rebuild prompt context.
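A minimal sketch of the branching described in the summary, under simplified assumptions. `resolveCompletionWakeExtraSystemPrompt`, `SessionEntryLike`, and the two section builders are hypothetical stand-ins, not the code in src/gateway/server-methods/agent.ts; the real helpers named in review (buildInboundMetaSystemPrompt, buildGroupChatContext) have signatures not shown here, and the section text is invented for illustration.

```typescript
// Hypothetical, simplified shapes; not the real openclaw types.
interface InputProvenance {
  kind: string; // e.g. "inter_session"
}
interface InternalEvent {
  type: string; // e.g. "task_completion"
}
interface SessionEntryLike {
  channel?: string;
  chatType?: "direct" | "group";
  subject?: string;
}

// Trim an optional string; empty or whitespace-only input becomes undefined.
function normalizeOptionalString(value: string | undefined): string | undefined {
  const trimmed = value?.trim();
  return trimmed ? trimmed : undefined;
}

// Stand-ins for the session-derived section builders used by normal chat turns.
function inboundContextSection(entry: SessionEntryLike): string {
  return `## Inbound Context\nchannel: ${entry.channel ?? "unknown"}`;
}
function groupChatContextSection(entry: SessionEntryLike): string | undefined {
  return entry.chatType === "group"
    ? `## Group Chat Context\nsubject: ${entry.subject ?? ""}`
    : undefined;
}

function resolveCompletionWakeExtraSystemPrompt(params: {
  extraSystemPrompt?: string;
  provenance?: InputProvenance;
  internalEvents?: InternalEvent[];
  sessionEntry?: SessionEntryLike;
}): string | undefined {
  // 1. An explicit, non-blank caller prompt is returned (trimmed) and never rebuilt.
  const explicit = normalizeOptionalString(params.extraSystemPrompt);
  if (explicit !== undefined) return explicit;

  // 2. Only inter-session wakes carrying a task_completion event are rebuilt.
  const isCompletionWake =
    params.provenance?.kind === "inter_session" &&
    (params.internalEvents ?? []).some((event) => event.type === "task_completion");
  if (!isCompletionWake || params.sessionEntry === undefined) return undefined;

  // 3. Rebuild the sections a normal chat turn would carry, from persisted state only.
  const sections = [
    inboundContextSection(params.sessionEntry),
    groupChatContextSection(params.sessionEntry),
  ].filter((section): section is string => section !== undefined);
  return sections.join("\n\n");
}
```

Because the rebuild reads only persisted session fields, repeated wakes for an unchanged session yield an identical suffix, which is the property prompt-cache reuse depends on.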

Change Type (select all)

  • Bug fix
  • Feature
  • Refactor required for the fix
  • Docs
  • Security hardening
  • Chore/infra

Scope (select all touched areas)

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

Root Cause (if applicable)

  • Root cause: the completion-wake path entered through gateway agent with inputProvenance.kind="inter_session" and task_completion internal events, but unlike the normal chat path it did not reconstruct the session-derived extraSystemPrompt sections that contain Inbound Context and Group Chat Context.
  • Missing detection / guardrail: there was no gateway-seam regression test asserting that inter-session completion wakes preserve the same session-derived prompt context as ordinary chat turns.
  • Contributing context (if known): these sections live below the cache boundary and above ## Runtime, so missing them changes the system prompt digest even when session state and transcript history are otherwise unchanged.
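To illustrate why dropping the session-derived sections alone changes the digest, here is a minimal sketch assuming the digest is a SHA-256 over the fully assembled system prompt. `assembleSystemPrompt`, `systemDigest`, and the section text are hypothetical; they only mirror the ordering described above (session-derived sections placed before `## Runtime`).

```typescript
import { createHash } from "node:crypto";

// Hypothetical layout: stable base, session-derived sections, then ## Runtime.
function assembleSystemPrompt(base: string, sessionSections: string[], runtime: string): string {
  return [base, ...sessionSections, `## Runtime\n${runtime}`].join("\n\n");
}

// Hypothetical digest: SHA-256 over the whole prompt text.
function systemDigest(prompt: string): string {
  return createHash("sha256").update(prompt, "utf8").digest("hex");
}

const base = "You are a helpful assistant.";
const runtime = "host=gateway";
const sessionSections = [
  "## Inbound Context\nchannel: telegram",
  "## Group Chat Context\nsubject: Ops",
];

// Normal chat turn: session sections present.
const chatDigest = systemDigest(assembleSystemPrompt(base, sessionSections, runtime));
// Completion wake before the fix: same session, sections missing.
const wakeDigestBefore = systemDigest(assembleSystemPrompt(base, [], runtime));
// Completion wake after the fix: sections rebuilt from the session entry.
const wakeDigestAfter = systemDigest(assembleSystemPrompt(base, sessionSections, runtime));

console.log(wakeDigestBefore === chatDigest); // false: digest differs between paths
console.log(wakeDigestAfter === chatDigest); // true: digest identical across paths
```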

Regression Test Plan (if applicable)

  • Coverage level that should have caught this:
    • Unit test
    • Seam / integration test
    • End-to-end test
    • Existing coverage already sufficient
  • Target test or file: src/gateway/server-methods/agent.test.ts
  • Scenario the test should lock in: an inter-session task_completion wake for an existing session should rebuild persisted inbound/group prompt context when extraSystemPrompt is omitted, and should preserve explicit extraSystemPrompt when one is provided.
  • Why this is the smallest reliable guardrail: the bug lives at the gateway agent ingress seam, where session metadata, provenance, and internal events are combined before the run is dispatched.
  • Existing test that already covers this (if any): the test "does not create task rows for inter-session completion wakes" covers adjacent routing behavior, but not prompt-context reconstruction.
  • If no new test is added, why not: N/A

User-visible / Behavior Changes

  • ACP/subagent completion notifications routed through the gateway now reuse the same persisted session context shaping as normal chat turns.
  • For affected Anthropic sessions, completion wakes should stop causing unnecessary prompt-cache invalidation from missing session-derived suffix sections.
  • No config, CLI, or protocol behavior changed.

Diagram (if applicable)

Before:
[child task completes]
  -> [gateway agent wake with inter_session provenance]
  -> [internal events only]
  -> [system prompt misses inbound/group context]
  -> [systemDigest changes] -> [prompt cache bust]

After:
[child task completes]
  -> [gateway agent wake with inter_session provenance]
  -> [rebuild persisted inbound/group context from session entry]
  -> [system prompt matches normal session shaping]
  -> [stable systemDigest] -> [prompt cache reused]

@openclaw-barnacle openclaw-barnacle Bot added gateway Gateway runtime size: S labels Apr 8, 2026
@greptile-apps

greptile-apps Bot commented Apr 8, 2026

Contributor

Greptile Summary

This PR fixes a prompt-cache stability regression in the gateway agent path: inter-session task_completion wakes were dispatched without the Inbound Context / Group Chat Context system-prompt suffix that normal chat turns include, causing the system-prompt digest to change on every completion notification and busting Anthropic prompt-cache reuse.

The fix adds buildPersistedCompletionWakeExtraSystemPrompt — a small pure helper that, for inter_session + task_completion calls with no explicit extraSystemPrompt, rebuilds those sections from the persisted SessionEntry using the same buildInboundMetaSystemPrompt / buildGroupChatContext helpers used by the normal chat path. Two new seam tests validate both the rebuild and the explicit-override paths.

Confidence Score: 5/5

Safe to merge — the fix is targeted, the logic is correct, and the two new seam tests lock in the intended behavior.

No P0 or P1 issues found. The new buildPersistedCompletionWakeExtraSystemPrompt helper correctly uses the spread-merge semantics of mergeSessionEntry (which preserves chatType, origin, subject from the stored entry), handles both the explicit-pass-through and rebuild branches, and guards the rebuild path behind both inter_session provenance and a task_completion event check. Tests cover both new code paths.
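The spread-merge property this review relies on can be shown with plain object spread. This is a hypothetical stand-in, not the real mergeSessionEntry, and `StoredSession` is an assumed shape. Fields the patch omits survive from the stored entry; a key present with the value `undefined` still overwrites, which is standard JavaScript spread behavior.

```typescript
// Assumed shape for illustration; not the real SessionEntry type.
interface StoredSession {
  sessionId: string;
  chatType?: "direct" | "group";
  origin?: string;
  subject?: string;
  updatedAt: number;
}

// Hypothetical stand-in for mergeSessionEntry: the later spread wins key by key.
function mergeSessionEntrySketch(
  stored: StoredSession,
  patch: Partial<StoredSession>,
): StoredSession {
  return { ...stored, ...patch };
}

const stored: StoredSession = {
  sessionId: "s1",
  chatType: "group",
  origin: "telegram",
  subject: "Ops",
  updatedAt: 1,
};

// The patch omits chatType/origin/subject, so they are preserved.
const merged = mergeSessionEntrySketch(stored, { updatedAt: 2 });
console.log(merged.subject, merged.chatType, merged.updatedAt); // Ops group 2

// An explicit undefined is still an own key, so it overwrites the stored value.
const clobbered = mergeSessionEntrySketch(stored, { subject: undefined });
console.log(clobbered.subject); // undefined
```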

No files require special attention.

Vulnerabilities

No security concerns identified. The new helper reads only from already-persisted SessionEntry fields (channel identifiers, chat type, group subject) that are already included in normal chat system prompts; no new data surfaces are exposed. The explicit extraSystemPrompt pass-through correctly uses normalizeOptionalString before returning, consistent with the rest of the codebase.

Reviews (1): Last reviewed commit: "Gateway: stabilize completion wake promp..."

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b9be0e8bc6

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/gateway/server-methods/agent.ts Outdated
@afurm afurm force-pushed the af/63030-completion-wake-context branch from b9be0e8 to 7343d9c April 8, 2026 13:08

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7343d9c09e


Comment thread src/gateway/server-methods/agent.ts Outdated

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 993ab77153


Comment thread src/gateway/server-methods/agent.ts
@clawsweeper

clawsweeper Bot commented Apr 27, 2026

Contributor

Thanks for the context here. I swept through the related work, and this is now a duplicate or superseded.

Keep open: current main does not implement the completion-wake prompt reconstruction and the linked cache-stability issue remains open, but this PR is not merge-ready because it is dirty against main, only partially mirrors the current prompt contract, and lacks real behavior proof.

Canonical path: Close this stale PR. The latest review rated it F, the branch still lacks merge-ready proof, and there has been no human follow-up after the durable review.

So I’m closing this here because the remaining work is already tracked in the canonical issue.

Review details

Best possible solution:

Close this stale PR. The latest review rated it F, the branch still lacks merge-ready proof, and there has been no human follow-up after the durable review.

Do we have a high-confidence way to reproduce the issue?

Yes at source level: current gateway completion wakes can enter the agent path with inter-session task-completion events while normal turns assemble richer session-derived prompt context. I did not run a live Anthropic cache trace in this read-only review.

Is this the best way to solve the issue?

No, not as currently patched. The maintainable fix should reuse or exactly mirror current normal-turn prompt assembly instead of adding a partial parallel reconstruction, then prove provider-visible cache stability.
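One way to read "reuse or exactly mirror current normal-turn prompt assembly" is a single builder that both entry points call, so the two paths cannot drift. A minimal sketch under that assumption: `buildSessionDerivedSuffix` and the two wrappers are hypothetical names, and the real normal-turn assembly (for example in src/auto-reply/reply/get-reply-run.ts) may carry more sections than shown.

```typescript
// Assumed minimal input; the real session entry has many more fields.
interface SessionContextSource {
  channel: string;
  chatType: "direct" | "group";
  subject?: string;
}

// Hypothetical single source of truth for the session-derived prompt suffix.
function buildSessionDerivedSuffix(src: SessionContextSource): string {
  const sections = [`## Inbound Context\nchannel: ${src.channel}`];
  if (src.chatType === "group") {
    sections.push(`## Group Chat Context\nsubject: ${src.subject ?? ""}`);
  }
  return sections.join("\n\n");
}

// Normal chat turn: delegates to the shared builder.
function suffixForChatTurn(src: SessionContextSource): string {
  return buildSessionDerivedSuffix(src);
}

// Completion wake: same builder, unless the caller supplied an explicit prompt.
function suffixForCompletionWake(src: SessionContextSource, explicitExtra?: string): string {
  const explicit = explicitExtra?.trim();
  return explicit ? explicit : buildSessionDerivedSuffix(src);
}

const direct: SessionContextSource = { channel: "discord", chatType: "direct" };
const group: SessionContextSource = { channel: "telegram", chatType: "group", subject: "Ops" };
console.log(suffixForCompletionWake(direct) === suffixForChatTurn(direct)); // true
console.log(suffixForCompletionWake(group) === suffixForChatTurn(group)); // true
```

With this structure, any section added to the normal-turn builder reaches completion wakes automatically, whereas a parallel reconstruction has to be kept in sync by hand.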

Security review:

Security review cleared: The diff is limited to gateway TypeScript prompt construction and tests, with no dependency, workflow, secret, package-resolution, install, or code-download changes.

AGENTS.md: found and applied where relevant.

What I checked:

  • stale F-rated PR: PR was opened 2026-04-08T10:27:25Z, is older than 30 days, and the latest review rated it F.
  • proof blocker: real behavior proof is missing and proof tier is F, so this branch is not merge-ready without contributor follow-up.
  • no human follow-up: live comments and timeline hydrated by apply contain no non-automation activity after the ClawSweeper review.

Likely related people:

  • steipete: Local blame and -S history point to Peter Steinberger on current gateway dispatch, prompt assembly, inbound metadata, and subagent announce delivery paths relevant to this PR. (role: recent area contributor; confidence: medium; commits: 506c2ee18186, 53273b490b3a, b75be0914491; files: src/gateway/server-methods/agent.ts, src/auto-reply/reply/get-reply-run.ts, src/auto-reply/reply/inbound-meta.ts)
  • vincentkoc: Local -S history shows recent inbound metadata release/baseline and scrub work adjacent to the prompt contract under review. (role: recent adjacent contributor; confidence: low; commits: 2e08f0f4221f, 6437aa8532f6; files: src/auto-reply/reply/inbound-meta.ts, src/auto-reply/reply/get-reply-run.ts)

Codex review notes: model gpt-5.5, reasoning high; reviewed against e0018382eb00.

anagnorisis2peripeteia added a commit to anagnorisis2peripeteia/openclaw that referenced this pull request May 19, 2026
The cron `wake` MCP tool currently forwards only `{mode, text}` to the
gateway. Every wake then enqueues a system event with no sessionKey /
agentId, so the cron service falls back to the heartbeat / main
default. Wakes scheduled from a non-main session (Telegram thread,
Discord channel, multi-agent setup) silently route to the wrong
conversation lane — and on CLI runtimes the woken session burns
tokens generating output that no caller can route back.

Origin capture (closes the upstream half of openclaw#46886 and openclaw#64556):

  - `src/agents/tools/cron-tool.ts` — the `wake` case now resolves
    `opts.agentSessionKey` through `resolveInternalSessionKey` and
    `resolveSessionAgentId`, matching the existing `add` action at
    L569-583. Explicit `sessionKey` / `agentId` params on the tool
    call take precedence over the inferred values so cross-session
    wakes remain expressible.
  - `src/gateway/protocol/schema/agent.ts` — `WakeParamsSchema` now
    declares optional `sessionKey` and `agentId` (NonEmptyString)
    so the gateway-level validator types them explicitly. The
    schema's `additionalProperties: true` continues to accept
    forward-compat metadata unchanged.
  - `src/gateway/server-methods/cron.ts` — the wake handler reads
    both fields, trims them, and forwards to `context.cron.wake`.
  - `src/cron/service.ts` + `src/cron/service/ops.ts` +
    `src/cron/service/timer.ts` — `wake()` accepts optional
    `sessionKey`/`agentId` and threads them into
    `enqueueSystemEvent` and `requestHeartbeatNow`. Missing /
    whitespace-only fields fall through to the dep's default so
    pre-existing call sites with no origin keep behaving the same
    way (backwards compatible).

Tests:

  - `src/cron/service/wake-origin.test.ts` (new, 6 tests) —
    direct seams on `wake()`: origin forwarded on `mode: "now"`,
    queued on `mode: "next-heartbeat"`, default fallback when
    origin omitted, whitespace-only origin treated as omitted,
    empty text still rejected.
  - `src/gateway/protocol/index.test.ts` — extends
    `validateWakeParams` coverage with the new optional fields
    accepted + empty-string rejected.

Out of scope (deliberate split):

  - Channel/thread/topic capture on the job's `delivery` block —
    follow-up PR once this contract lands. The minimum-viable fix
    here is session routing, which unblocks the CLI runtime's
    `--resume <session>` path and the embedded session-resolution
    path without a schema rewrite of the delivery contract.
  - The 5 stalled wake-related PRs (openclaw#70268, openclaw#57199, openclaw#82767,
    openclaw#79869, openclaw#63096) each fix downstream specifics. This PR fixes
    the upstream origin-capture they all silently assume.
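The precedence and fallback rules in this commit message can be summarized in a small sketch. `WakeOrigin`, `resolveWakeOrigin`, and `resolveTargetSessionKey` are hypothetical; the real code goes through `resolveInternalSessionKey` / `resolveSessionAgentId` and the cron service's own defaults, which are not reproduced here. The sketch assumes precedence is decided per field, and the example session keys are made up.

```typescript
// Assumed shape of the optional wake origin fields.
interface WakeOrigin {
  sessionKey?: string;
  agentId?: string;
}

// Missing or whitespace-only values are treated as omitted.
function trimToUndefined(value?: string): string | undefined {
  const trimmed = value?.trim();
  return trimmed ? trimmed : undefined;
}

// Hypothetical: explicit tool-call params win over values inferred from the calling session.
function resolveWakeOrigin(explicit: WakeOrigin, inferred: WakeOrigin): WakeOrigin {
  return {
    sessionKey: trimToUndefined(explicit.sessionKey) ?? trimToUndefined(inferred.sessionKey),
    agentId: trimToUndefined(explicit.agentId) ?? trimToUndefined(inferred.agentId),
  };
}

// Hypothetical service-side fallback: no usable origin means the previous default target.
function resolveTargetSessionKey(origin: WakeOrigin, defaultSessionKey: string): string {
  return trimToUndefined(origin.sessionKey) ?? defaultSessionKey;
}

// Example keys are made up for illustration.
const inferred: WakeOrigin = { sessionKey: "agent:ops:telegram:thread-42", agentId: "ops" };

// A wake from a non-main session with no explicit params keeps its own lane.
console.log(resolveWakeOrigin({}, inferred).sessionKey); // agent:ops:telegram:thread-42
// Explicit params still allow cross-session wakes.
console.log(resolveWakeOrigin({ sessionKey: "agent:main:main" }, inferred).sessionKey); // agent:main:main
// Legacy callers with no usable origin fall through to the default.
console.log(resolveTargetSessionKey({ sessionKey: "   " }, "main")); // main
```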
@clawsweeper clawsweeper Bot added rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. P2 Normal backlog priority with limited blast radius. merge-risk: 🚨 compatibility 🚨 May break existing users, config, migrations, defaults, or upgrade paths. merge-risk: 🚨 message-delivery 🚨 May drop, duplicate, misroute, suppress, or wrongly target messages. labels May 19, 2026
@openclaw-barnacle openclaw-barnacle Bot added the triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup. label May 19, 2026
@clawsweeper

clawsweeper Bot commented May 20, 2026

Contributor

ClawSweeper PR egg

🎁 Pass real behavior proof to wake the egg and unlock a hatchable treat.

Where did the egg go?
  • The egg game starts only after the PR passes the real-behavior proof check.
  • Before that, no creature or rarity is rolled. The treat waits for real proof.
  • This is still just collectible flavor: proof affects review readiness, not creature quality.

@openclaw-barnacle


This pull request has been automatically marked as stale due to inactivity.
Please add updates or it will be closed.

@openclaw-barnacle openclaw-barnacle Bot added the stale Marked as stale due to inactivity label Jun 5, 2026
@openclaw-barnacle


Closing due to inactivity.
If you believe this PR should be revived, post in #clawtributors on Discord to talk to a maintainer.
That channel is the escape hatch for high-quality PRs that get auto-closed.


Labels

gateway Gateway runtime merge-risk: 🚨 compatibility 🚨 May break existing users, config, migrations, defaults, or upgrade paths. merge-risk: 🚨 message-delivery 🚨 May drop, duplicate, misroute, suppress, or wrongly target messages. P2 Normal backlog priority with limited blast radius. rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. size: M stale Marked as stale due to inactivity status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup.


Development

Successfully merging this pull request may close these issues.

System prompt assembled differently across code paths (chat/heartbeat/announce), causing continuous Anthropic cache invalidation

1 participant