Skip to content

fix(gateway): seed subagent session deliveryContext from spawn request params#54479

Closed
mariosousa-finn wants to merge 3 commits into
openclaw:mainfrom
mariosousa-finn:fix/subagent-spawn-delivery-context
Closed

fix(gateway): seed subagent session deliveryContext from spawn request params#54479
mariosousa-finn wants to merge 3 commits into
openclaw:mainfrom
mariosousa-finn:fix/subagent-spawn-delivery-context

Conversation

@mariosousa-finn

@mariosousa-finn mariosousa-finn commented Mar 25, 2026

Copy link
Copy Markdown
Contributor

Summary

  • Problem: Subagent sessions are created with an incomplete deliveryContext — only {"channel": "slack"} with no to or threadId. When the subagent completes and the announce mechanism tries to deliver results back to the requester, it must reconstruct the delivery target from the parent session's mutable lastTo/lastThreadId fields, which may have drifted if the parent handled other messages in the meantime.
  • Why it matters: On a production Slack gateway (Edgar, OpenClaw 2026.3.23-2), this causes 3 distinct failure modes from the same root cause:
    1. ✅ Correct delivery (parent entry unchanged — most common)
    2. Wrong-channel delivery — a DM answer appears in a channel, or a thread reply lands in the wrong thread (e.g., Feb 16 incident: DM answer leaked to #temp_edgar_shelter)
    3. Total delivery failure"Outbound not configured for channel: slack" when the channel plugin registry is unavailable at announce time (3 lost messages on 2026-03-25 alone)
  • Root cause: In server-methods/agent.ts, the nextEntryPatch builds deliveryContext exclusively from normalizeSessionDeliveryFields(entry), which reads from the existing session entry. For a freshly-spawned subagent (entry is null/empty), this returns undefined for to/threadId. The request's channel/to/threadId params — passed by spawnSubagentDirect to indicate where announce results should go — are not persisted to the session store because deliver: false skips the updateLastRoute outbound path.
  • What changed: After computing deliveryFields from the existing entry, we now merge in a requestDeliveryHint built from the request's channel/to/accountId/threadId params using the existing mergeDeliveryContext helper. The existing entry's fields take priority (primary wins), so this only fills in missing values — it never overwrites an established delivery route.
  • What did NOT change (scope boundary): No changes to spawnSubagentDirect, registerSubagentRun, the announce delivery pipeline, resolveAnnounceOrigin, mergeDeliveryContext, or deliveryContextFromSession. The fix is entirely in the gateway's agent request handler session entry initialization.
  • Relationship to fix(gateway): pin channel registry at startup to survive registry swaps #53944: PR fix(gateway): pin channel registry at startup to survive registry swaps #53944 fixed failure mode 3 (channel plugin registry eviction) by pinning the registry at startup. This PR fixes the upstream cause — the incomplete deliveryContext that forces the announce mechanism to rely on mutable parent session state in the first place. Both fixes are complementary.

🤖 AI-assisted

  • Marked as AI-assisted
  • Degree of testing: fully tested (3 new tests + 133 existing gateway/agent/session/channel-pin tests pass)
  • I understand what the code does
  • Bot review conversations will be resolved

Change Type (select all)

  • Bug fix
  • Feature
  • Refactor
  • Docs
  • Security hardening
  • Chore/infra

Scope (select all touched areas)

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

User-visible / Behavior Changes

Subagent sessions spawned via sessions_spawn now have a complete deliveryContext (including to and threadId) frozen at spawn time. The announce delivery mechanism no longer needs to fall back to the parent session's mutable lastTo/lastThreadId fields to determine where to route results.

Security Impact (required)

  • New permissions/capabilities? No
  • Secrets/tokens handling changed? No
  • New/changed network calls? No
  • Command/tool execution surface changed? No
  • Data access scope changed? No

Repro + Verification

Environment

  • OS: Linux (GCE, Debian 12) / macOS
  • Runtime: Node 22
  • OpenClaw: 2026.3.23-2
  • Channel: Slack (Socket Mode)

Steps

  1. Start a gateway with Slack enabled
  2. In a Slack channel thread, send a message that triggers a subagent spawn
  3. While the subagent is running, send another message in a different thread in the same channel
  4. Wait for the subagent to complete and announce

Expected (after fix)

  • The subagent's result is delivered to the original thread where the question was asked
  • The subagent session entry has deliveryContext: {channel: "slack", to: "channel:C...", threadId: "original.thread.ts"}

Actual (before fix)

  • The subagent session entry has deliveryContext: {channel: "slack"} — missing to and threadId
  • The announce mechanism reads the parent's lastTo/lastThreadId which now point to the different thread
  • The result is delivered to the wrong thread (or fails entirely if the channel plugin registry is unavailable)

Evidence from production (2026-03-25)

Three subagent announces failed with "Outbound not configured for channel: slack" at 09:17, 10:47, and 10:51 UTC. All three subagents had deliveryContext: {"channel": "slack"} with no to or threadId. The parent sessions had correct, complete delivery contexts.

Test Plan

  • server.agent.subagent-delivery-context.test.ts — 3 new tests:
    1. New subagent session inherits deliveryContext from request params — verifies to/threadId/accountId are persisted
    2. Existing session deliveryContext is NOT overwritten — verifies mergeDeliveryContext primary-wins semantics
    3. Request without to/threadId does not inject empty values — verifies no regression for internal-only sessions
  • All 30 existing server.agent.gateway-server-agent-{a,b} tests pass
  • All 9 channel-pin tests from fix(gateway): pin channel registry at startup to survive registry swaps #53944 pass
  • All 84 gateway session/chat tests pass
  • All 10 subagent spawn tests pass

…t params

When a subagent is spawned via callGateway({method: 'agent', deliver: false}),
the gateway creates/updates the child session entry but only seeds
deliveryContext from the existing entry (which is null for new sessions).
The request's channel/to/threadId params — passed by the spawner to indicate
where announce results should be delivered — are not persisted.

This causes all subagent sessions to end up with:
  deliveryContext: {channel: 'slack'}
missing the critical 'to' and 'threadId' fields.

When the subagent completes and the announce mechanism fires, it falls back to
resolveAnnounceOrigin → mergeDeliveryContext with the parent session's
lastTo/lastThreadId, which may have drifted if the parent handled other
messages. This produces three failure modes:
1. Correct delivery (parent entry unchanged — most common)
2. Wrong-channel delivery (parent lastTo/lastThreadId drifted)
3. Total delivery failure ('Outbound not configured for channel: slack')

Fix: merge the request's channel/to/accountId/threadId as a fallback into the
session entry's deliveryContext using the existing mergeDeliveryContext helper.
The existing entry's fields take priority (primary wins in merge), so this only
fills in missing values — it never overwrites an established delivery route.
@openclaw-barnacle openclaw-barnacle Bot added gateway Gateway runtime size: M labels Mar 25, 2026
@greptile-apps

greptile-apps Bot commented Mar 25, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR fixes a real production regression on the Slack gateway where subagent sessions were spawned with an incomplete deliveryContext — only {"channel": "slack"}, missing to and threadId. The announce mechanism was then forced to read the parent session's mutable lastTo/lastThreadId at completion time, which could have drifted, causing wrong-channel delivery or total delivery failure.

The fix is well-scoped: after computing deliveryFields from the existing session entry, it now builds a requestDeliveryHint from the request's channel/to/accountId/threadId params and merges it as a fallback via the existing mergeDeliveryContext helper (primary-wins, so no established delivery route is ever overwritten). The effectiveDeliveryFields then replace the former deliveryFields in nextEntryPatch.

Key observations:

  • Logic is correct and the primary-wins merge semantics are exactly right — existing sessions are never overwritten, and fresh subagent sessions now get a complete deliveryContext frozen at spawn time.
  • Three new integration tests cover all three meaningful cases (new session, existing session not clobbered, no to/threadId no-op).
  • Minor: the threadId pre-processing guard typeof request.threadId === "string" before passing to normalizeDeliveryContext silently drops numeric threadIds — normalizeDeliveryContext already handles number threadIds natively. This is consistent with an existing pattern at line 578 but is slightly more restrictive than necessary for non-Slack channels.

Confidence Score: 5/5

  • Safe to merge — the fix is targeted, well-tested, and uses existing idempotent helpers with correct primary-wins semantics.
  • The root cause is clearly identified, the implementation correctly addresses it without side effects on existing sessions, and the three new tests cover the critical branches. The one P2 nit (numeric threadId filtering before normalizeDeliveryContext) is pre-existing behavior consistent with line 578 and does not affect Slack or any currently known production use case.
  • No files require special attention.
Prompt To Fix All With AI
This is a comment left during a code review.
Path: src/gateway/server-methods/agent.ts
Line: 467-471

Comment:
**Numeric `threadId` silently dropped before `normalizeDeliveryContext`**

The guard `typeof request.threadId === "string" && request.threadId.trim()` causes any numeric `threadId` from request params to be replaced with `undefined` before reaching `normalizeDeliveryContext`. However, `normalizeDeliveryContext` already knows how to handle `number` threadIds — it truncates finite numbers via `Math.trunc`. Passing `request.threadId` directly would be both simpler and correct for all channel types (e.g., Matrix uses integer thread IDs).

Note: the same pre-existing pattern appears further down in this file at line 578, so this is consistent with existing code, but both sites could benefit from the simplification.

```suggestion
        threadId: request.threadId,
```

How can I resolve this? If you propose a fix, please make it concise.

Reviews (1): Last reviewed commit: "fix(gateway): seed subagent session deli..." | Re-trigger Greptile

Comment thread src/gateway/server-methods/agent.ts Outdated
Address review feedback: normalizeDeliveryContext already handles both
string and numeric threadIds (Math.trunc for finite numbers), so the
typeof string guard was unnecessarily restrictive for channels that use
integer thread IDs (e.g., Matrix).
@mariosousa-finn

Copy link
Copy Markdown
Contributor Author

Re: Greptile P2 — Numeric threadId silently dropped

Good catch — fixed in 34da8ff. Passing request.threadId directly now, since normalizeDeliveryContext already handles both string and numeric values via Math.trunc for finite numbers. This is correct for channels like Matrix that use integer thread IDs.

@mariosousa-finn

Copy link
Copy Markdown
Contributor Author

@codex review

@chatgpt-codex-connector

Copy link
Copy Markdown

Codex Review: Didn't find any major issues. Breezy!

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

@TARS67

TARS67 commented Mar 26, 2026

Copy link
Copy Markdown

Confirming this fixes a real-world issue for us. We're on v2026.3.23-2 (upgraded from v2026.3.13 where everything worked) using openclaw agent --agent --deliver --channel telegram --reply-to --reply-account triggered from external scripts.

Impact: This effectively breaks our multi-agent setup. We run 4 agents (main + 3 specialized agents with separate Telegram bots), all relying on --deliver for Telegram delivery. The primary --deliver path fails consistently since the update. The embedded fallback recovers intermittently, making delivery unreliable.

Symptoms:

Delivery failed (telegram to ): Error: Unknown channel: telegram / Outbound not configured for channel: telegram
Regression: worked on v2026.3.13, broke on v2026.3.23-2
Affects all agents
Related: #48790, #53944, #14188

@affsantos

Copy link
Copy Markdown
Contributor

We have been having this error for a while in Slack. I can confirm that after this patch the error is gone. Thank you @mariosousa-finn

…livery-context

# Conflicts:
#	src/gateway/server-methods/agent.ts
@steipete

Copy link
Copy Markdown
Contributor

Closing this as implemented after Codex review.

Current main already seeds subagent session deliveryContext from the spawn request and persists the merged routing fields, with focused regression coverage present in the tree and already shipped in v2026.4.22.

What I checked:

  • Gateway handler seeds missing delivery context from request params: agent request handling builds requestDeliveryHint from channel/to/accountId/threadId, merges it with the existing session delivery context, and uses the merged result for the session entry patch. (src/gateway/server-methods/agent.ts:647, 8f64cd3e4d83)
  • Merged delivery fields are persisted on the session entry: The patched session entry stores deliveryContext plus lastChannel/lastTo/lastAccountId/lastThreadId from the merged delivery fields, which is the behavior this PR requests. (src/gateway/server-methods/agent.ts:679, 8f64cd3e4d83)
  • Regression tests cover the reported cases: Current tests verify new subagent sessions inherit request delivery context, existing session delivery context is not overwritten, pre-patched subagent sessions are seeded correctly, and empty routing fields are not injected. (src/gateway/server.agent.subagent-delivery-context.test.ts:78, 8f64cd3e4d83)
  • Latest release already contains the implementation: The shipped v2026.4.22 release contains the same agent.ts logic and the same dedicated regression test file, so this is not just on unreleased main. (src/gateway/server-methods/agent.ts:644, 00bd2cf7a376)

So I’m closing this as already implemented rather than keeping a duplicate issue open.

Review notes: reviewed against 8f64cd3e4d83; fix evidence: release v2026.4.22, commit 00bd2cf7a376.

@steipete steipete closed this Apr 25, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

gateway Gateway runtime size: M

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants