Gateway completeness check false-negative on pure-relay agents

# Gateway completeness check false-negative on pure-relay agents

*This issue is filed as part of a batch of related upstream submissions discovered during an architectural review of an OpenClaw deployment. Filed locally 2026-04-22; submitted upstream 2026-04-27. See the "Related issues (batch)" section at the end for batch context.*

*Suggested labels (Mike to choose from labels actually available in target repo at submission time):*
*bug, severity:medium, gateway, completeness-check, pure-relay-agents*

**Severity:** MEDIUM: produces false-failure log noise; caused retry cascades until a local AGENTS.md workaround was applied on 2026-04-24
**Upstream project:** OpenClaw gateway (module: agent completeness check)

## Summary

The OpenClaw gateway's agent completeness check treats a zero-text-payload turn that ends with a successful `sessions_spawn` tool call as an incomplete turn and surfaces `⚠️ Agent couldn't generate a response` to the caller. For a **pure-relay dispatch agent** whose entire correct behaviour is "call `sessions_spawn runtime=acp` and stop," this is a false-negative. The gateway's heuristic was designed for user-facing assistants, where a silent stop genuinely indicates a failure; applied to a pure-dispatch relay, it misclassifies correct SOP as error.

The motivating agent is Claudette (`agents/claudette/AGENTS.md`), whose sole job is to forward every message to Claude Code via ACP. Seven observed false-failure instances on 2026-04-22 produced retry cascades through `projects/arch-audit-dispatch/dispatch-arch-audit.sh`, creating duplicate Claude Code sessions and at least one misleading fallback audit filename. A tactical local workaround (AGENTS.md edit requiring Claudette to emit a one-word ack after the spawn) was deployed 2026-04-24 04:27 AEST to decouple dispatcher retry-logic from the completeness-check output, and stopped the retry cascades. The underlying gateway false-negative still fires on every Claudette dispatch (observed through 2026-04-25 03:10 AEST), but no longer triggers cascading failures downstream.

## Environment

- **Host:** DESKTOP-CB8PBCM (Ubuntu 24.04, WSL2 on Windows 11 Pro)
- **OpenClaw version:** 2026.4.x (gateway bound to loopback, standard deployment)
- **Agent runtime:** Sonnet 4.6 for Claudette; ACP plugin spawns Claude Code as the downstream worker
- **Claudette AGENTS.md invariants:**
  - Pure relay: every received message → one `sessions_spawn` call → turn ends.
  - No file I/O. No other tool calls.
  - Receives Sonnet-tier advisory budget (cost ≈$0.001 per dispatch).
- **Observed at:** 2026-04-22 14:00-19:30 AEST window (seven instances), and continuously since on every arch-audit and heimdall-pull-audit dispatch.

## Observed behaviour

Claudette receives a dispatch message. Sonnet session:

1. Emits a `thinking` block (internal reasoning about the spawn).
2. Calls `sessions_spawn` with `runtime: "acp"`, `agentId: "claude"`, `mode: "run"`, `task: <relayed brief>`.
3. Receives `{status: "accepted", childSessionKey: "agent:claude:subagent:<uuid>", runId: "<uuid>"}`.
4. Ends turn. No text payload emitted.

Gateway logs:

```
[agent/embedded] incomplete turn detected: runId=<uuid> sessionId=<uuid> stopReason=stop payloads=0 — surfacing error to user
⚠️ Agent couldn't generate a response. Note: some tool actions may have already been executed — please verify before retrying.
```
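The fields in that log line (`stopReason=stop`, `payloads=0`) suggest the rule being applied. The following is a minimal sketch of that inferred rule, not the gateway's actual code: `TurnSummary` and `is_flagged_incomplete` are hypothetical names, and the condition is reconstructed only from what the log prints.

```python
from dataclasses import dataclass


@dataclass
class TurnSummary:
    """Hypothetical view of the two fields shown in the gateway log line."""
    stop_reason: str  # corresponds to stopReason=... in the log
    payloads: int     # corresponds to payloads=... in the log


def is_flagged_incomplete(turn: TurnSummary) -> bool:
    # Assumed rule: a normal stop that produced no text payloads
    # is surfaced to the caller as an incomplete turn.
    return turn.stop_reason == "stop" and turn.payloads == 0


# Claudette's spawn-and-stop turn as described in steps 1-4.
claudette_turn = TurnSummary(stop_reason="stop", payloads=0)
print(is_flagged_incomplete(claudette_turn))  # True
```

Under this reading, nothing about the tool-call stream enters the decision, which is why a successful `sessions_spawn` makes no difference to the outcome.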

This warning propagates into the dispatch script's stdout stream:

```
2026-04-24T20:15:51+10:00 —   [audit-FILE-CHANGE-dispatch-briefing-sh] ⚠️ Agent couldn't generate a response.
Note: some tool actions may have already been executed — please verify before retrying.
```

## Evidence — original seven instances (2026-04-22)

Retry-cascade evidence from `friday-reaudit-input/2026-04-22-claudette-completeness-check-coupling.md`:

| Time (AEST) | Brief | Gateway says | Reality (Claude Code session + audit outcome) |
|---|---|---|---|
| 16:00 | FILE-CHANGE-dispatch-arch-audit-sh | couldn't generate | Claude Code session `2d7d2d03` wrote 10,721B audit at 16:04 |
| 16:05 | RETROACTIVE-HIGH-IMPACT attempt 1 | couldn't generate | Claude Code session `a4b9fd4f` wrote 20,322B audit at 16:10 |
| 16:15 | RETROACTIVE-HIGH-IMPACT attempt 2 (duplicate work) | couldn't generate | Claude Code session `d156eef8` wrote 22,900B audit at 16:21 |
| 18:51 | PHASE-2-AUDIT-2026-04-22 attempt 1 | couldn't generate | Claude Code session `596eb3ec` wrote 33,201B audit at 18:57 |
| 19:01 | PHASE-2-AUDIT-2026-04-22 attempt 2 (duplicate) | couldn't generate | Claude Code session `3ea94610` wrote 32,399B audit at 19:09 |
| 19:11 | PHASE-2-AUDIT-2026-04-22 attempt 3 (duplicate) | couldn't generate | Claude Code session `b123ff71` wrote 31,208B audit at 19:15 |
| 19:21 | PHASE-2-AUDIT-2026-04-22 retry (duplicate) | couldn't generate | Claude Code session `a7877865` wrote 34,153B audit at 19:26 |

Every dispatch succeeded. Every Claude Code session executed real work. The gateway reported a failure on every one. The `_attempts` counter burned up to 3 retries per audit, producing 4 Claude Code runs instead of 1 for the PHASE-2 audit alone.
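The cost of the cascade can be tallied directly from the table above. This sketch transcribes the seven rows (time and brief name only, with the attempt suffixes dropped) and counts how many Claude Code runs were redundant; the variable names are illustrative.

```python
from collections import Counter

# (time AEST, brief) pairs transcribed from the table above.
dispatches = [
    ("16:00", "FILE-CHANGE-dispatch-arch-audit-sh"),
    ("16:05", "RETROACTIVE-HIGH-IMPACT"),
    ("16:15", "RETROACTIVE-HIGH-IMPACT"),
    ("18:51", "PHASE-2-AUDIT-2026-04-22"),
    ("19:01", "PHASE-2-AUDIT-2026-04-22"),
    ("19:11", "PHASE-2-AUDIT-2026-04-22"),
    ("19:21", "PHASE-2-AUDIT-2026-04-22"),
]

# Number of Claude Code runs per brief.
runs_per_brief = Counter(brief for _, brief in dispatches)

# Every run beyond the first for a given brief is duplicate work.
redundant_runs = {brief: n - 1 for brief, n in runs_per_brief.items() if n > 1}
total_redundant = sum(redundant_runs.values())

print(len(dispatches), len(runs_per_brief), total_redundant)  # 7 3 4
```

Seven runs served three distinct briefs, so four of the seven sessions repeated work that had already succeeded.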

## Evidence — post-workaround continuation (2026-04-24/25)

Tactical workaround applied 2026-04-24 04:27 AEST: `agents/claudette/AGENTS.md` updated to require Claudette to emit the single word `Dispatched.` after `sessions_spawn` returns, before ending the turn. This was intended to satisfy the completeness check's `payloads > 0` condition.

**Empirical check via `projects/arch-audit-dispatch/arch-audit.log`** (sample of 10 recent dispatches post-workaround):

```
2026-04-24T20:15:51+10:00 —   [audit-FILE-CHANGE-dispatch-briefing-sh] ⚠️ Agent couldn't generate a response.
2026-04-24T20:20:52+10:00 —   [audit-FILE-CHANGE-dispatch-tasks-sh] ⚠️ Agent couldn't generate a response.
2026-04-24T20:25:36+10:00 —   [audit-FILE-CHANGE-config-state-ownership-json] ⚠️ Agent couldn't generate a response.
2026-04-24T20:28:19+10:00 —   [audit-FILE-CHANGE-dispatch-arch-audit-sh] ⚠️ Agent couldn't generate a response.
2026-04-25T02:16:06+10:00 —   [audit-FILE-CHANGE-config-state-ownership-json] ⚠️ Agent couldn't generate a response.
2026-04-25T02:21:06+10:00 —   [audit-FILE-CHANGE-scripts-lib-state-ownership-guard-test-js] ⚠️ Agent couldn't generate a response.
2026-04-25T02:23:58+10:00 —   [audit-FILE-CHANGE-skills-inbox-triage-SKILL-md] ⚠️ Agent couldn't generate a response.
2026-04-25T02:40:51+10:00 —   [audit-FILE-CHANGE-config-state-ownership-json] ⚠️ Agent couldn't generate a response.
2026-04-25T02:45:48+10:00 —   [audit-FILE-CHANGE-docs-infrastructure-registry-md] ⚠️ Agent couldn't generate a response.
2026-04-25T03:10:51+10:00 —   [audit-FILE-CHANGE-dispatch-briefing-sh] ⚠️ Agent couldn't generate a response.
```

All 10 dispatches log the warning. All 10 completed successfully at the dispatcher layer:

```
2026-04-24T20:15:51+10:00 —   Audit dispatch for FILE-CHANGE-dispatch-briefing-sh completed successfully (47s)
```

All 10 have `_attempts: 1` in their processed brief files (verified via `node -e` inspection of `state/audit/pending-arch-audits/processed/FILE-CHANGE-*.json`). No retries.
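The original check was done with `node -e`; an equivalent check can be sketched in Python as follows. The helper name `briefs_with_retries` is hypothetical, and the only assumption about the brief files is the one stated above: each is a JSON object with an `_attempts` field.

```python
import json
from pathlib import Path


def briefs_with_retries(processed_dir: Path,
                        pattern: str = "FILE-CHANGE-*.json") -> dict[str, object]:
    """Return {filename: _attempts} for every brief whose _attempts is not 1."""
    offenders: dict[str, object] = {}
    for path in sorted(processed_dir.glob(pattern)):
        brief = json.loads(path.read_text(encoding="utf-8"))
        attempts = brief.get("_attempts")  # None if the field is missing
        if attempts != 1:
            offenders[path.name] = attempts
    return offenders


# Example call against the directory named in this issue:
# briefs_with_retries(Path("state/audit/pending-arch-audits/processed"))
```

An empty result means every matching brief was processed on its first attempt, which is the post-workaround state reported here.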

**Interpretation**: the ack workaround suppressed the retry cascade (dispatcher sees the ack text and treats the turn as succeeded despite the warning), but the gateway completeness check is STILL firing the false-negative warning on every Claudette dispatch. The ack doesn't prevent the warning — it decouples the dispatcher logic from it. The gateway is still mis-classifying Claudette's turn.
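The decoupling described in this interpretation can be illustrated with a small classifier. This is a hypothetical sketch, not the logic of `dispatch-arch-audit.sh`: the function name, the returned labels, and the substring matching are all assumptions made for illustration. Only the ack text and the warning text come from this issue.

```python
ACK_TEXT = "Dispatched."  # ack required by the AGENTS.md workaround
WARNING_TEXT = "Agent couldn't generate a response"  # gateway warning text


def classify_dispatch(output: str) -> str:
    """Assumed dispatcher-side decision based on the agent's output stream."""
    has_ack = ACK_TEXT in output
    has_warning = WARNING_TEXT in output
    if has_ack:
        # The ack wins: the turn counts as succeeded even if the warning appears.
        return "succeeded_with_warning" if has_warning else "succeeded"
    # Without an ack, a warning is the only signal and would trigger a retry.
    return "retry" if has_warning else "unknown"


post_workaround = "Dispatched.\n⚠️ Agent couldn't generate a response."
pre_workaround = "⚠️ Agent couldn't generate a response."
print(classify_dispatch(post_workaround))  # succeeded_with_warning
print(classify_dispatch(pre_workaround))   # retry
```

The point of the sketch is that the warning is still present in both cases; the workaround changes only how the dispatcher reacts to it.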

## Why this is a gateway defect (not a Claudette defect)

Claudette's SOP is correct and minimal by design:

1. She is a pure-relay agent — her entire contract is "receive message, spawn worker, stop." Any closing text violates her minimal-SOP principle.
2. `sessions_spawn` returning `{status: "accepted"}` is a **correctly-formed, successful tool result** per OpenClaw's own sessions_spawn contract. The spawn succeeded; the child session is running; no caller action is needed.
3. Claude Code, running under the ACP runtime, executes the brief in the background. The spawned work is real; the parent's turn-completion is orthogonal to it.
4. The gateway's completeness-check heuristic was designed for user-facing assistant patterns, where a zero-payload stop genuinely is a failure. For pure-relay agents where zero payload IS the SOP, the heuristic mis-fires deterministically.

The mechanism that misfires today on Claudette will misfire on every future pure-relay or pure-tool-use agent we build. Fixing Claudette (by adding the ack) fixes one symptom; the next such agent will hit the same bug unless the gateway's completeness check is extended.

## Proposed fix

Extend the gateway's completeness-check logic so that **a turn ending with a successful `sessions_spawn` tool result is not flagged as incomplete, regardless of payload count.** "Successful" = the spawn returned `status: accepted` and a valid `childSessionKey`.

Two implementation shapes that would both work:

**(A) Config flag** — add `completenessCheck.allowSpawnTerminalStop: true` (or similar name) to the gateway config. Agents that want the new behaviour opt in; legacy user-facing assistants keep the existing heuristic. Matches OpenClaw's existing config-as-behaviour pattern.

**(B) First-class gateway behaviour** — detect `sessions_spawn` tool results in the turn's tool-use stream at completeness-check time. If present and `status: accepted`, skip the `payloads > 0` check. No config flag; applies to all agents.

**Recommendation: (B)**. A correctly-returned `sessions_spawn` acceptance is semantically a successful terminal action — there's no reason a user-facing assistant should treat it differently than a pure-relay agent. A user-facing agent that spawns and then stops has also done its job (e.g. "I've started X for you"); the completeness check should recognise the spawn as the completion.

**Invariant to preserve**: a turn that ends with no successful tool call and zero payloads should still be flagged. The fix adjusts the zero-payload check, not the broader "empty-turn" detection.
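To make shape (B) and the invariant concrete, here is a minimal sketch of the adjusted check. It is not based on the gateway's source: `ToolResult`, `Turn`, `is_accepted_spawn`, and `is_incomplete` are hypothetical names, and the sketch assumes the check can see the turn's tool results in order. It reads "ending with" as "the last tool result in the turn".

```python
from dataclasses import dataclass, field


@dataclass
class ToolResult:
    tool: str     # tool name, e.g. "sessions_spawn"
    result: dict  # parsed tool result, e.g. {"status": "accepted", ...}


@dataclass
class Turn:
    stop_reason: str
    payloads: int
    tool_results: list[ToolResult] = field(default_factory=list)


def is_accepted_spawn(tr: ToolResult) -> bool:
    """'Successful' as defined above: status accepted plus a childSessionKey."""
    return (
        tr.tool == "sessions_spawn"
        and tr.result.get("status") == "accepted"
        and bool(tr.result.get("childSessionKey"))
    )


def is_incomplete(turn: Turn) -> bool:
    # Only zero-payload normal stops are candidates for flagging.
    if turn.stop_reason != "stop" or turn.payloads > 0:
        return False
    # Proposed change: an accepted spawn as the final tool result is a
    # valid terminal action, so the turn is not flagged.
    if turn.tool_results and is_accepted_spawn(turn.tool_results[-1]):
        return False
    # Invariant preserved: an empty turn is still flagged.
    return True


spawn = ToolResult("sessions_spawn", {
    "status": "accepted",
    "childSessionKey": "agent:claude:subagent:<uuid>",
    "runId": "<uuid>",
})
print(is_incomplete(Turn("stop", 0, [spawn])))  # False
print(is_incomplete(Turn("stop", 0, [])))       # True
```

Shape (A) would wrap the spawn branch in a per-agent opt-in flag; the rest of the logic would be unchanged.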

## Workaround status

Currently deployed: `agents/claudette/AGENTS.md` ack requirement (one-word `Dispatched.` after spawn). This is F11-A / F11-C per `docs/rebuild-stage-b-solution-space-2026-04-23.md` §F11. The ack is:

- **Effective at decoupling retries**: dispatcher sees the ack text, treats turn as complete, no retries fire.
- **Non-invasive**: adds ~$0.001 per dispatch (one Sonnet inference emitting one token).
- **Reversible**: a three-line edit to remove when the gateway fix lands.
- **A workaround, not a fix**: the underlying completeness-check misfire still happens on every dispatch; warnings still accumulate in logs; the signal-to-noise ratio in `arch-audit.log` worsens as dispatch volume grows.

On upstream resolution of the gateway defect, revert `agents/claudette/AGENTS.md` to the prior minimal SOP (pure spawn-and-stop).

## Request for upstream review

Please consider:

1. Reviewing the proposed fix (A) or (B) for fit with OpenClaw's completeness-check design.
2. Confirming whether `sessions_spawn` acceptance should be treated as a terminal success signal globally, or whether an opt-in flag is preferable.
3. Advising whether the ack workaround is safe to leave in place indefinitely (I believe yes; it costs negligible tokens and introduces no correctness risk).
4. Flagging any other pure-relay or pure-tool-use patterns in OpenClaw's agent library that may have been silently affected (I haven't audited all agents; Claudette is the one I observed because she's dispatched by a script that surfaces the warning).

## Data available on request

- Full `friday-reaudit-input/2026-04-22-claudette-completeness-check-coupling.md` (original read-only finding, 2026-04-22 20:30 AEST).
- All 10+ `processed/FILE-CHANGE-*.json` brief files showing `_attempts: 1` post-workaround.
- `projects/arch-audit-dispatch/arch-audit.log` with timestamped warning/success interleaving.
- Sample Claudette session transcripts showing the exact tool-call stream (via `openclaw sessions` lookup).
- OpenClaw version + gateway config.

## Related issues (batch)

This issue is filed as part of a batch of related upstream submissions:

- [`anthropics/claude-code#53710`] F12-A: Glob tool false-negative on recently-created files — Anthropic Claude Code repo
- [`#72541`] F11-C: Gateway completeness check false-negative on pure-relay agents — OpenClaw repo
- [`#72540`] Opus 4.7: supportsAdaptiveThinking allowlist missing claude-opus-4-7 — OpenClaw repo
- [`#72539`] Sub-B-FAR: Plugin SDK MessagePreSent outbound hook missing — OpenClaw repo
