
Commit 89db7a0

fix(agents,gateway): three subagent announce delivery failures
Fixes three related bugs that break subagent completion delivery in loopback token-auth setups (e.g. Google Chat with sessions_spawn). All three were found and validated on a production Linux/GCP deployment (openclaw 2026.5.20, Node 22, systemd user service).

---

## Fix 1 - gateway: keep device identity for BACKEND calls with explicit scopes

File: src/gateway/call.ts
Fixes: #77807

`shouldOmitDeviceIdentityForGatewayCall` unconditionally omitted device identity for all BACKEND + GATEWAY_CLIENT + loopback calls. Internal calls such as `callSubagentGateway` carry explicit operator scopes like `["operator.write"]`. Without device identity the gateway cannot verify those scopes against the paired device token and rejects with:

    Subagent completion direct announce failed: missing scope: operator.write

Fix: return false (keep device identity) when `params.scopes` is non-empty. Also threads `scopes` through `resolveDeviceIdentityForGatewayCall` so the helper can inspect them.

---

## Fix 2 - sessions_yield: await settle promise after timeout

File: src/agents/pi-embedded-runner/run/attempt.sessions-yield.ts

`waitForSessionsYieldAbortSettle` raced the settle promise against a 2 s timeout and returned immediately on timeout, leaving the session transcript file lock held. The next turn on the same persistent session (e.g. a Google Chat DM) then failed with "file lock stale", triggering a model fallback and surfacing an internal error message to end users.

Fix: after logging the timeout warning, await `settlePromise.catch(() => {})` so the file lock is always released before the function returns.

---

## Fix 3 - sessions_yield: strip context message before completion announce

File: src/agents/pi-embedded-runner/run/attempt.sessions-yield.ts

When a new incoming message aborts an active `sessions_yield`, the `openclaw.sessions_yield` context message (containing "[Context: The previous turn ended intentionally via sessions_yield...]") remains in the session transcript.

When a subagent completion announce subsequently re-runs the agent to deliver the result, the agent sees this context message and responds via `sessions_yield` again (a `custom_message`). The announce system does not recognise a `custom_message` as a visible reply and emits:

    Subagent announce give up: completion agent did not produce a visible reply

`stripSessionsYieldArtifacts` already strips the interrupt custom type (`openclaw.sessions_yield_interrupt`) but not the context custom type (`openclaw.sessions_yield`).

Fix: strip both types in the in-memory messages loop and in the fileEntries loop.

---

Tested on production (Google Chat + Pipedrive sub-agent workflow):

- Single sub-agent turn: delivers correctly
- Two rapid consecutive messages (second arrives while sub-agent runs): both deliver, no fallback, no stale lock, no scope errors

Reported-by: jailbirt <jailbirt@theeye.io>
1 parent 33df3be commit 89db7a0

2 files changed

Lines changed: 34 additions & 2 deletions


src/agents/pi-embedded-runner/run/attempt.sessions-yield.ts

Lines changed: 22 additions & 1 deletion
@@ -40,6 +40,11 @@ export async function waitForSessionsYieldAbortSettle(params: {
     log.warn(
       `sessions_yield abort settle timed out: runId=${params.runId} sessionId=${params.sessionId} timeoutMs=${SESSIONS_YIELD_ABORT_SETTLE_TIMEOUT_MS}`,
     );
+    // Continue waiting for the settle to complete so the session file lock is
+    // released before the next turn starts. Without this, the lock remains
+    // held and the next turn fails with "file lock stale", causing model
+    // fallback and visible error messages in the delivery channel.
+    await params.settlePromise.catch(() => {});
   }
 }
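
The hunk above only shows the tail of `waitForSessionsYieldAbortSettle`, so the following is a minimal standalone sketch of the pattern it describes: race a settle promise against a timeout, warn on timeout, then keep waiting so cleanup finishes before returning. The function name `waitForSettleThenRelease`, its parameters, and the `SettleOutcome` type are hypothetical and are not taken from the repository.

```typescript
// Hypothetical outcome type; the real function's return value is not shown in the diff.
type SettleOutcome = "settled" | "timeout";

// Sketch only: races settlePromise against a timer, and on timeout
// still awaits the settle so whatever it releases (e.g. a file lock)
// is released before this function returns.
async function waitForSettleThenRelease(
  settlePromise: Promise<unknown>,
  timeoutMs: number,
  onTimeout: () => void,
): Promise<SettleOutcome> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<"timeout">((resolve) => {
    timer = setTimeout(() => resolve("timeout"), timeoutMs);
  });
  // Map both fulfilment and rejection to "settled" so the race never throws.
  const settled = settlePromise.then(
    () => "settled" as const,
    () => "settled" as const,
  );
  const outcome = await Promise.race([settled, timeout]);
  if (timer !== undefined) {
    clearTimeout(timer);
  }
  if (outcome === "timeout") {
    onTimeout();
    // The step added by this commit: continue waiting and swallow any
    // rejection, instead of returning while the settle is still in flight.
    await settlePromise.catch(() => {});
  }
  return outcome;
}
```

One consequence of this design is that the timeout no longer bounds how long the caller waits; it only controls when the warning is logged. A settle promise that never resolves would block the caller indefinitely.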

@@ -167,6 +172,20 @@ export function stripSessionsYieldArtifacts(activeSession: {
       strippedMessages.pop();
       continue;
     }
+    // Also strip the sessions_yield context message. When a new incoming
+    // message aborts an active sessions_yield, this context marker remains in
+    // the session. If a subagent completion announce then re-runs the agent to
+    // deliver the result, the agent sees the context message, responds via
+    // sessions_yield again (a custom_message), and the announce system
+    // rejects it as "completion agent did not produce a visible reply".
+    if (
+      last?.role === "custom" &&
+      "customType" in last &&
+      last.customType === SESSIONS_YIELD_CONTEXT_CUSTOM_TYPE
+    ) {
+      strippedMessages.pop();
+      continue;
+    }
     break;
   }
   if (strippedMessages.length !== activeSession.messages.length) {
@@ -205,7 +224,9 @@ export function stripSessionsYieldArtifacts(activeSession: {
         last.message?.stopReason === "aborted";
       const isYieldInterruptMessage =
         last.type === "custom_message" && last.customType === SESSIONS_YIELD_INTERRUPT_CUSTOM_TYPE;
-      if (!isYieldAbortAssistant && !isYieldInterruptMessage) {
+      const isYieldContextMessage =
+        last.type === "custom_message" && last.customType === SESSIONS_YIELD_CONTEXT_CUSTOM_TYPE;
+      if (!isYieldAbortAssistant && !isYieldInterruptMessage && !isYieldContextMessage) {
         break;
       }
       fileEntries.pop();
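
To illustrate the trailing-strip loop these hunks extend, here is a simplified sketch that pops yield artifacts off the end of a message list until it reaches a normal message. The `Message` shape and the function name `stripTrailingYieldArtifacts` are hypothetical simplifications. The two custom type strings come from the commit message. The real `stripSessionsYieldArtifacts` also handles aborted assistant messages and a separate `fileEntries` list, both of which are omitted here.

```typescript
// Custom type strings as quoted in the commit message.
const SESSIONS_YIELD_INTERRUPT_CUSTOM_TYPE = "openclaw.sessions_yield_interrupt";
const SESSIONS_YIELD_CONTEXT_CUSTOM_TYPE = "openclaw.sessions_yield";

// Hypothetical, simplified message shape for illustration only.
type Message =
  | { role: "user" | "assistant"; text: string }
  | { role: "custom"; customType: string; text: string };

// Sketch: remove yield artifacts from the tail of the list, stopping at
// the first message that is not an interrupt or context marker.
function stripTrailingYieldArtifacts(messages: readonly Message[]): Message[] {
  const stripped = [...messages];
  while (stripped.length > 0) {
    const last = stripped[stripped.length - 1];
    if (last === undefined) {
      break;
    }
    if (
      last.role === "custom" &&
      (last.customType === SESSIONS_YIELD_INTERRUPT_CUSTOM_TYPE ||
        last.customType === SESSIONS_YIELD_CONTEXT_CUSTOM_TYPE)
    ) {
      stripped.pop();
      continue;
    }
    break;
  }
  return stripped;
}
```

Because the loop stops at the first non-artifact message, a context marker that sits earlier in the transcript, with real messages after it, is left in place. Only markers at the tail are removed.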

src/gateway/call.ts

Lines changed: 12 additions & 1 deletion
@@ -328,10 +328,20 @@ function shouldOmitDeviceIdentityForGatewayCall(params: {
   url: string;
   token?: string;
   password?: string;
+  scopes?: string[];
 }): boolean {
   const mode = params.opts.mode ?? GATEWAY_CLIENT_MODES.CLI;
   const clientName = params.opts.clientName ?? GATEWAY_CLIENT_NAMES.CLI;
   const hasSharedAuth = Boolean(params.token || params.password);
+  // When the call carries explicit operator scopes (e.g. ["operator.write"]),
+  // keep device identity so the gateway can verify them against the paired
+  // device token. Without this, loopback backend calls — such as subagent
+  // completion announce — fail with "missing scope: operator.write" even when
+  // the device has full scopes. Fixes #77807.
+  const requestedScopes = Array.isArray(params.scopes) ? params.scopes : [];
+  if (requestedScopes.length > 0) {
+    return false;
+  }
   return (
     mode === GATEWAY_CLIENT_MODES.BACKEND &&
     clientName === GATEWAY_CLIENT_NAMES.GATEWAY_CLIENT &&
@@ -345,6 +355,7 @@ function resolveDeviceIdentityForGatewayCall(params: {
   url: string;
   token?: string;
   password?: string;
+  scopes?: string[];
 }): ReturnType<typeof loadOrCreateDeviceIdentity> | null {
   if (shouldOmitDeviceIdentityForGatewayCall(params)) {
     return null;
@@ -773,7 +784,7 @@ async function executeGatewayRequestWithScopes<T>(params: {
     scopes,
     deviceIdentity:
       opts.deviceIdentity === undefined
-        ? resolveDeviceIdentityForGatewayCall({ opts, url, token, password })
+        ? resolveDeviceIdentityForGatewayCall({ opts, url, token, password, scopes })
         : opts.deviceIdentity,
     minProtocol: opts.minProtocol ?? MIN_CLIENT_PROTOCOL_VERSION,
     maxProtocol: opts.maxProtocol ?? PROTOCOL_VERSION,
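
The decision order introduced in `shouldOmitDeviceIdentityForGatewayCall` can be sketched in isolation as follows. This is an illustration under stated assumptions, not the repository's code. The constant values, the flattened `CallParams` shape, and the `isLoopback` flag are hypothetical. The diff cuts off the rest of the return expression, so the loopback and shared-auth conditions here are inferred from the commit message's description ("BACKEND + GATEWAY_CLIENT + loopback calls").

```typescript
// Hypothetical constant values; only the key names appear in the diff.
const GATEWAY_CLIENT_MODES = { CLI: "cli", BACKEND: "backend" } as const;
const GATEWAY_CLIENT_NAMES = { CLI: "cli", GATEWAY_CLIENT: "gateway-client" } as const;

type Mode = (typeof GATEWAY_CLIENT_MODES)[keyof typeof GATEWAY_CLIENT_MODES];
type ClientName = (typeof GATEWAY_CLIENT_NAMES)[keyof typeof GATEWAY_CLIENT_NAMES];

// Hypothetical flattened parameter shape; the real function takes opts and url.
interface CallParams {
  mode?: Mode;
  clientName?: ClientName;
  isLoopback: boolean; // assumption: stands in for the URL-based loopback check
  token?: string;
  password?: string;
  scopes?: string[];
}

// Sketch: explicit scopes short-circuit the omit decision, so device
// identity is kept whenever the call requests at least one scope.
function shouldOmitDeviceIdentity(params: CallParams): boolean {
  const requestedScopes = Array.isArray(params.scopes) ? params.scopes : [];
  if (requestedScopes.length > 0) {
    return false;
  }
  const mode = params.mode ?? GATEWAY_CLIENT_MODES.CLI;
  const clientName = params.clientName ?? GATEWAY_CLIENT_NAMES.CLI;
  const hasSharedAuth = Boolean(params.token || params.password);
  return (
    mode === GATEWAY_CLIENT_MODES.BACKEND &&
    clientName === GATEWAY_CLIENT_NAMES.GATEWAY_CLIENT &&
    hasSharedAuth &&
    params.isLoopback
  );
}
```

In this sketch an empty `scopes` array is treated the same as an omitted one, matching the `requestedScopes.length > 0` guard in the diff, so backend loopback calls without explicit scopes keep their previous behaviour.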
