
Gateway completeness check false-negative on pure-relay agents #72541

@alfredneuworth


This issue is one of a batch of related upstream submissions discovered during an architectural review of an OpenClaw deployment. Filed locally 2026-04-22; submitted upstream 2026-04-27. See the "Related issues (batch)" section at the end for batch context.

Suggested labels (Mike to choose from labels actually available in target repo at submission time):
bug, severity:medium, gateway, completeness-check, pure-relay-agents

Severity: MEDIUM — produces false-failure log noise; previously caused retry cascades until local AGENTS.md workaround applied 2026-04-24
Upstream project: OpenClaw gateway (module: agent completeness check)

Summary

The OpenClaw gateway's agent completeness check treats a zero-text-payload turn that ends with a successful sessions_spawn tool call as an incomplete turn and surfaces "⚠️ Agent couldn't generate a response" to the caller. For a pure-relay dispatch agent whose entire correct behaviour is "call sessions_spawn runtime=acp and stop," this is a false-negative: the turn completed correctly, but the check reports it as incomplete. The gateway's heuristic was designed for user-facing assistants, where a silent stop genuinely indicates a failure; applied to a pure-dispatch relay, it misclassifies correct SOP as an error.

The motivating agent is Claudette (agents/claudette/AGENTS.md), whose sole job is to forward every message to Claude Code via ACP. Seven observed false-failure instances on 2026-04-22 produced retry cascades through projects/arch-audit-dispatch/dispatch-arch-audit.sh, creating duplicate Claude Code sessions and at least one misleading fallback audit filename. A tactical local workaround (AGENTS.md edit requiring Claudette to emit a one-word ack after the spawn) was deployed 2026-04-24 04:27 AEST to decouple dispatcher retry-logic from the completeness-check output, and stopped the retry cascades. The underlying gateway false-negative still fires on every Claudette dispatch (observed through 2026-04-25 03:10 AEST), but no longer triggers cascading failures downstream.

Environment

  • Host: DESKTOP-CB8PBCM (Ubuntu 24.04, WSL2 on Windows 11 Pro)
  • OpenClaw version: 2026.4.x (gateway bound to loopback, standard deployment)
  • Agent runtime: Sonnet 4.6 for Claudette; ACP plugin spawns Claude Code as the downstream worker
  • Claudette AGENTS.md invariants:
    • Pure relay: every received message → one sessions_spawn call → turn ends.
    • No file I/O. No other tool calls.
    • Receives Sonnet-tier advisory budget (cost ≈$0.001 per dispatch).
  • Observed at: 2026-04-22 14:00-19:30 AEST window (seven instances), and continuously since on every arch-audit and heimdall-pull-audit dispatch.

Observed behaviour

Claudette receives a dispatch message. The Sonnet session then:

  1. Emits a thinking block (internal reasoning about the spawn).
  2. Calls sessions_spawn with runtime: "acp", agentId: "claude", mode: "run", task: <relayed brief>.
  3. Receives {status: "accepted", childSessionKey: "agent:claude:subagent:<uuid>", runId: "<uuid>"}.
  4. Ends turn. No text payload emitted.

Gateway logs:

[agent/embedded] incomplete turn detected: runId=<uuid> sessionId=<uuid> stopReason=stop payloads=0 — surfacing error to user
⚠️ Agent couldn't generate a response. Note: some tool actions may have already been executed — please verify before retrying.
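
For illustration, the log fields above (stopReason=stop, payloads=0) suggest a check of roughly the following shape. This is a minimal Python sketch inferred only from that log line, not the gateway's actual source: the Turn class, its field names, and is_incomplete_current are hypothetical, and the real check may count payloads differently.

```python
from dataclasses import dataclass, field


@dataclass
class Turn:
    """Hypothetical, simplified view of a finished agent turn."""
    stop_reason: str                                   # e.g. "stop", as in the log line
    payloads: list = field(default_factory=list)       # text emitted to the caller
    tool_results: list = field(default_factory=list)   # results of tool calls in the turn


def is_incomplete_current(turn: Turn) -> bool:
    """Assumed shape of today's check: a clean stop with no text payload."""
    return turn.stop_reason == "stop" and len(turn.payloads) == 0


# Claudette's turn as described in steps 1-4: one accepted spawn, no text.
claudette_turn = Turn(
    stop_reason="stop",
    payloads=[],
    tool_results=[{
        "tool": "sessions_spawn",
        "status": "accepted",
        "childSessionKey": "agent:claude:subagent:<uuid>",
        "runId": "<uuid>",
    }],
)

# Flagged as incomplete because tool_results are never consulted.
print(is_incomplete_current(claudette_turn))  # True
```

Under this assumed shape the accepted spawn never enters the decision, which would explain why a correct relay turn is reported as a failure.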

This warning propagates into the dispatch script's stdout stream:

2026-04-24T20:15:51+10:00 —   [audit-FILE-CHANGE-dispatch-briefing-sh] ⚠️ Agent couldn't generate a response.
Note: some tool actions may have already been executed — please verify before retrying.

Evidence — original seven instances (2026-04-22)

Retry-cascade evidence from friday-reaudit-input/2026-04-22-claudette-completeness-check-coupling.md:

| Time (AEST) | Brief | Gateway says | Reality (Claude Code session + audit outcome) |
|---|---|---|---|
| 16:00 | FILE-CHANGE-dispatch-arch-audit-sh | couldn't generate | Claude Code session 2d7d2d03 wrote 10,721B audit at 16:04 |
| 16:05 | RETROACTIVE-HIGH-IMPACT attempt 1 | couldn't generate | Claude Code session a4b9fd4f wrote 20,322B audit at 16:10 |
| 16:15 | RETROACTIVE-HIGH-IMPACT attempt 2 (duplicate work) | couldn't generate | Claude Code session d156eef8 wrote 22,900B audit at 16:21 |
| 18:51 | PHASE-2-AUDIT-2026-04-22 attempt 1 | couldn't generate | Claude Code session 596eb3ec wrote 33,201B audit at 18:57 |
| 19:01 | PHASE-2-AUDIT-2026-04-22 attempt 2 (duplicate) | couldn't generate | Claude Code session 3ea94610 wrote 32,399B audit at 19:09 |
| 19:11 | PHASE-2-AUDIT-2026-04-22 attempt 3 (duplicate) | couldn't generate | Claude Code session b123ff71 wrote 31,208B audit at 19:15 |
| 19:21 | PHASE-2-AUDIT-2026-04-22 retry (duplicate) | couldn't generate | Claude Code session a7877865 wrote 34,153B audit at 19:26 |

Every dispatch succeeded. Every Claude Code session executed real work. The gateway reported failure on every one. The _attempts counter burned up to 3 retries per audit, producing 4 Claude Code runs instead of 1 for the PHASE-2 audit alone.
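
The duplicate-run count can be checked directly from the seven rows above. The following is a small Python sketch of that arithmetic; the brief names and times are copied from the table, with the "attempt N" / "retry" suffixes dropped so that runs of the same brief group together.

```python
from collections import Counter

# (time, brief) pairs from the evidence table; one entry per Claude Code run.
runs = [
    ("16:00", "FILE-CHANGE-dispatch-arch-audit-sh"),
    ("16:05", "RETROACTIVE-HIGH-IMPACT"),
    ("16:15", "RETROACTIVE-HIGH-IMPACT"),
    ("18:51", "PHASE-2-AUDIT-2026-04-22"),
    ("19:01", "PHASE-2-AUDIT-2026-04-22"),
    ("19:11", "PHASE-2-AUDIT-2026-04-22"),
    ("19:21", "PHASE-2-AUDIT-2026-04-22"),
]

runs_per_brief = Counter(brief for _, brief in runs)
# Every run beyond the first for a given brief is duplicate work.
duplicate_runs = sum(n - 1 for n in runs_per_brief.values())

for brief, n in runs_per_brief.items():
    print(f"{brief}: {n} run(s), {n - 1} duplicate(s)")
print(f"total: {len(runs)} runs for {len(runs_per_brief)} briefs, "
      f"{duplicate_runs} duplicates")
```

This gives 7 runs for 3 distinct briefs, 4 of which were duplicates (1 for RETROACTIVE-HIGH-IMPACT, 3 for PHASE-2-AUDIT-2026-04-22).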

Evidence — post-workaround continuation (2026-04-24/25)

Tactical workaround applied 2026-04-24 04:27 AEST: agents/claudette/AGENTS.md updated to require Claudette to emit the single word Dispatched. after sessions_spawn returns, before ending the turn. The intent was to satisfy the completeness check's payloads > 0 condition.

Empirical check via projects/arch-audit-dispatch/arch-audit.log (sample of 10 recent dispatches post-workaround):

2026-04-24T20:15:51+10:00 —   [audit-FILE-CHANGE-dispatch-briefing-sh] ⚠️ Agent couldn't generate a response.
2026-04-24T20:20:52+10:00 —   [audit-FILE-CHANGE-dispatch-tasks-sh] ⚠️ Agent couldn't generate a response.
2026-04-24T20:25:36+10:00 —   [audit-FILE-CHANGE-config-state-ownership-json] ⚠️ Agent couldn't generate a response.
2026-04-24T20:28:19+10:00 —   [audit-FILE-CHANGE-dispatch-arch-audit-sh] ⚠️ Agent couldn't generate a response.
2026-04-25T02:16:06+10:00 —   [audit-FILE-CHANGE-config-state-ownership-json] ⚠️ Agent couldn't generate a response.
2026-04-25T02:21:06+10:00 —   [audit-FILE-CHANGE-scripts-lib-state-ownership-guard-test-js] ⚠️ Agent couldn't generate a response.
2026-04-25T02:23:58+10:00 —   [audit-FILE-CHANGE-skills-inbox-triage-SKILL-md] ⚠️ Agent couldn't generate a response.
2026-04-25T02:40:51+10:00 —   [audit-FILE-CHANGE-config-state-ownership-json] ⚠️ Agent couldn't generate a response.
2026-04-25T02:45:48+10:00 —   [audit-FILE-CHANGE-docs-infrastructure-registry-md] ⚠️ Agent couldn't generate a response.
2026-04-25T03:10:51+10:00 —   [audit-FILE-CHANGE-dispatch-briefing-sh] ⚠️ Agent couldn't generate a response.

All 10 dispatches log the warning. All 10 completed successfully at the dispatcher layer:

2026-04-24T20:15:51+10:00 —   Audit dispatch for FILE-CHANGE-dispatch-briefing-sh completed successfully (47s)
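
One way to cross-check warnings against successes in arch-audit.log is sketched below in Python. The regular expressions are written against the two line shapes quoted in this issue only; pairing lines by identical timestamp and brief name is an assumption based on the single pair shown here, and the function name tally is hypothetical.

```python
import re

# Line shapes taken from the log samples in this issue. The separator after
# the timestamp is matched as "any single non-space character".
WARNING = re.compile(
    r"^(?P<ts>\S+)\s+\S\s+\[audit-(?P<brief>[^\]]+)\].*generate a response"
)
SUCCESS = re.compile(
    r"^(?P<ts>\S+)\s+\S\s+Audit dispatch for (?P<brief>\S+) completed successfully"
)


def tally(lines):
    """Return (warned, succeeded) sets of (timestamp, brief) pairs."""
    warned, succeeded = set(), set()
    for line in lines:
        if m := WARNING.match(line):
            warned.add((m["ts"], m["brief"]))
        elif m := SUCCESS.match(line):
            succeeded.add((m["ts"], m["brief"]))
    return warned, succeeded


sample = [
    "2026-04-24T20:15:51+10:00 —   [audit-FILE-CHANGE-dispatch-briefing-sh] "
    "⚠️ Agent couldn't generate a response.",
    "2026-04-24T20:15:51+10:00 —   Audit dispatch for "
    "FILE-CHANGE-dispatch-briefing-sh completed successfully (47s)",
]

warned, succeeded = tally(sample)
# Dispatches that logged the warning and also completed successfully.
false_alarms = warned & succeeded
print(len(false_alarms))  # 1
```

Run over the full log, the size of the intersection relative to the warned set would show how many warnings were attached to dispatches that in fact succeeded.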

All 10 have _attempts: 1 in their processed brief files (verified via node -e inspection of state/audit/pending-arch-audits/processed/FILE-CHANGE-*.json). No retries.
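
The same _attempts check can be scripted. The Python sketch below is an equivalent of the node -e inspection described above, not the command that was actually run; it assumes each processed brief is a JSON object with a top-level _attempts field, and the function name and demo file names are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def briefs_with_retries(processed_dir):
    """Return {filename: _attempts} for processed briefs whose _attempts is not 1."""
    retried = {}
    for path in sorted(Path(processed_dir).glob("FILE-CHANGE-*.json")):
        brief = json.loads(path.read_text(encoding="utf-8"))
        attempts = brief.get("_attempts")
        if attempts != 1:
            retried[path.name] = attempts
    return retried


# Demo against a throwaway directory with made-up briefs. In this deployment
# the real directory is state/audit/pending-arch-audits/processed/.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "FILE-CHANGE-a.json").write_text(json.dumps({"_attempts": 1}))
    (Path(tmp) / "FILE-CHANGE-b.json").write_text(json.dumps({"_attempts": 3}))
    print(briefs_with_retries(tmp))  # {'FILE-CHANGE-b.json': 3}
```

An empty result for the real directory corresponds to the "no retries" observation above.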

Interpretation: the ack workaround suppressed the retry cascade (the dispatcher sees the ack text and treats the turn as succeeded despite the warning), but the gateway completeness check still fires the false-negative warning on every Claudette dispatch. The ack doesn't prevent the warning; it decouples the dispatcher logic from it. The gateway is still misclassifying Claudette's turn.

Why this is a gateway defect (not a Claudette defect)

Claudette's SOP is correct and minimal by design:

  1. She is a pure-relay agent — her entire contract is "receive message, spawn worker, stop." Any closing text violates her minimal-SOP principle.
  2. sessions_spawn returning {status: "accepted"} is a correctly-formed, successful tool result per OpenClaw's own sessions_spawn contract. The spawn succeeded; the child session is running; no caller action is needed.
  3. Claude Code, running under the ACP runtime, is executing the brief in the background. The spawned work is real; the parent's turn completion is orthogonal to it.
  4. The gateway's completeness-check heuristic was designed for user-facing assistant patterns, where a zero-payload stop genuinely is a failure. For pure-relay agents where zero payload IS the SOP, the heuristic misfires deterministically.

The mechanism that misfires today on Claudette will misfire on every future pure-relay or pure-tool-use agent we build. Fixing Claudette (by adding the ack) fixes one symptom; the next such agent will hit the same bug unless the gateway's completeness check is extended.

Proposed fix

Extend the gateway's completeness-check logic so that a turn ending with a successful sessions_spawn tool result is not flagged as incomplete, regardless of payload count. "Successful" = the spawn returned status: accepted and a valid childSessionKey.

Two implementation shapes that would both work:

(A) Config flag — add completenessCheck.allowSpawnTerminalStop: true (or similar name) to the gateway config. Agents that want the new behaviour opt in; legacy user-facing assistants keep the existing heuristic. Matches OpenClaw's existing config-as-behaviour pattern.

(B) First-class gateway behaviour — detect sessions_spawn tool results in the turn's tool-use stream at completeness-check time. If present and status: accepted, skip the payloads > 0 check. No config flag; applies to all agents.

Recommendation: (B). A correctly-returned sessions_spawn acceptance is semantically a successful terminal action, so there's no reason a user-facing assistant should treat it differently from a pure-relay agent. A user-facing agent that spawns and then stops has also done its job (e.g. "I've started X for you"); the completeness check should recognise the spawn as the completion.

Invariant to preserve: a turn that ends with zero payloads AND without any successful tool call should still be flagged. The fix narrows the zero-payload check; it does not remove the broader "empty-turn" detection.
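
To make the proposal concrete, here is a minimal Python sketch of the adjusted check. It is an illustration under stated assumptions, not a patch against the gateway: the Turn class, the "tool" key on tool results, and both function names are hypothetical. The allow_spawn_terminal_stop parameter mirrors the flag name suggested in (A); leaving it always on corresponds to (B). "Ends with" is interpreted here as "the last tool result in the turn".

```python
from dataclasses import dataclass, field


@dataclass
class Turn:
    """Hypothetical, simplified view of a finished agent turn."""
    stop_reason: str
    payloads: list = field(default_factory=list)       # text emitted to the caller
    tool_results: list = field(default_factory=list)   # tool results, in call order


def is_accepted_spawn(result: dict) -> bool:
    """'Successful' per the proposal: status accepted plus a childSessionKey."""
    return (
        result.get("tool") == "sessions_spawn"
        and result.get("status") == "accepted"
        and bool(result.get("childSessionKey"))
    )


def is_incomplete(turn: Turn, allow_spawn_terminal_stop: bool = True) -> bool:
    """Flag a turn as incomplete unless it emitted text or ended on an accepted spawn."""
    if turn.payloads:
        return False
    if (
        allow_spawn_terminal_stop
        and turn.tool_results
        and is_accepted_spawn(turn.tool_results[-1])
    ):
        return False
    # Invariant preserved: zero payloads and no accepted terminal spawn.
    return True


spawn = {
    "tool": "sessions_spawn",
    "status": "accepted",
    "childSessionKey": "agent:claude:subagent:<uuid>",
}
print(is_incomplete(Turn("stop", [], [spawn])))  # False: relay turn is complete
print(is_incomplete(Turn("stop")))               # True: genuinely empty turn
```

With the flag off, the function falls back to the payload-only behaviour, which is how legacy agents would be handled under shape (A).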

Workaround status

Currently deployed: agents/claudette/AGENTS.md ack requirement (one-word Dispatched. after spawn). This is F11-A / F11-C per docs/rebuild-stage-b-solution-space-2026-04-23.md §F11. The ack is:

  • Effective at decoupling retries: dispatcher sees the ack text, treats turn as complete, no retries fire.
  • Non-invasive: adds ~$0.001 per dispatch (one Sonnet inference emitting one token).
  • Reversible: a three-line edit to remove when the gateway fix lands.
  • A workaround, not a fix: the underlying completeness-check misfire still happens on every dispatch; warnings still accumulate in logs, so the number of spurious lines in arch-audit.log grows linearly with dispatch volume.

On upstream resolution of the gateway defect, revert agents/claudette/AGENTS.md to the prior minimal SOP (pure spawn-and-stop).

Request for upstream review

Please consider:

  1. Reviewing the proposed fix (A) or (B) for fit with OpenClaw's completeness-check design.
  2. Confirming whether sessions_spawn acceptance should be treated as a terminal success signal globally, or whether an opt-in flag is preferable.
  3. Advising whether the ack workaround is safe to leave in place indefinitely (I believe yes; it costs negligible tokens and introduces no correctness risk).
  4. Flagging any other pure-relay or pure-tool-use patterns in OpenClaw's agent library that may have been silently affected (I haven't audited all agents; Claudette is the one I observed because she's dispatched by a script that surfaces the warning).

Data available on request

  • Full friday-reaudit-input/2026-04-22-claudette-completeness-check-coupling.md (original read-only finding, 2026-04-22 20:30 AEST).
  • All 10+ processed/FILE-CHANGE-*.json brief files showing _attempts: 1 post-workaround.
  • projects/arch-audit-dispatch/arch-audit.log with timestamped warning/success interleaving.
  • Sample Claudette session transcripts showing the exact tool-call stream (via openclaw sessions lookup).
  • OpenClaw version + gateway config.

Related issues (batch)

This issue is filed as part of a batch of related upstream submissions:

  • [anthropics/claude-code#53710] F12-A: Glob tool false-negative on recently-created files — Anthropic Claude Code repo
  • [#72541] F11-C: Gateway completeness check false-negative on pure-relay agents — OpenClaw repo
  • [#72540] Opus 4.7: supportsAdaptiveThinking allowlist missing claude-opus-4-7 — OpenClaw repo
  • [#72539] Sub-B-FAR: Plugin SDK MessagePreSent outbound hook missing — OpenClaw repo

Metadata


Labels

  • P2 - Normal backlog priority with limited blast radius.
  • clawsweeper:fix-shape-clear - ClawSweeper found a clear likely implementation shape for this issue.
  • clawsweeper:queueable-fix - ClawSweeper marked this issue as an existing queue_fix_pr work candidate.
  • clawsweeper:source-repro - ClawSweeper found a high-confidence source-level issue reproduction.
  • impact:session-state - Session, memory, transcript, context, or agent state can drift or corrupt.
  • issue-rating: 🦞 diamond lobster - Very strong issue quality with high-confidence source-level or clear reproduction.
