[Bug]: Gemini 3.x silent hang on subagent flows — thoughtSignature dropped on cross-provider replay (2026.5.3-1) #77566

@jrex-jooni

Summary

google/gemini-3.1-pro-preview (thinkingLevel: high) reliably hangs after a few tool-call rounds in any flow where the conversation history was authored by a different provider or model, e.g. subagents spawned from a Claude or GPT orchestrator. Failure mode: the gateway logs "LLM idle timeout (600s): no response from model" and the run dies. Trivial single-turn calls to the same model with the same key complete in 3 seconds.

This is the same bug reported in #74244 and #72127 (and in several earlier reports, including #58235 and #63397). All were autoclosed by the clawsweeper bot citing a "no longer reproducible" or "already fixed" rationale. The original reporter, @YouFoundJK, pushed back in #74244 with a screenshot reproduction on gemini-3.1-pro-preview and posted a verified-working blueprint fix on 2026-04-29. No human maintainer ever responded; no PR was opened. The bug persists in 2026.5.3-1, the current latest stable release.

Status in shipped 2026.5.3-1

The exact code path @YouFoundJK identified as the problem is still present in the shipped dist:

// /app/dist/transport-stream-8H4N10uL.js:211, 231
...isSameProviderAndModel && block.textSignature ? { thoughtSignature: block.textSignature } : {}
...isSameProviderAndModel && block.thoughtSignature ? { thoughtSignature: block.thoughtSignature } : {}

isSameProviderAndModel gates signature forwarding. For any conversation where assistant turns were authored by a different provider or model than the one currently being called (subagents, model switches, fallback chains), thoughtSignature is silently dropped from the outbound functionCall parts. Gemini 3.x then hangs (per Google docs, missing-signature should be a 400 INVALID_ARGUMENT, but in practice it manifests as a silent stall).
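
To make the effect of the gate concrete, here is a minimal sketch that mirrors the conditional spread quoted above. The `AssistantBlock` and `GeminiFunctionCallPart` shapes and the `toFunctionCallPart` helper are hypothetical simplifications for illustration, not the actual OpenClaw types; only `isSameProviderAndModel` and `thoughtSignature` come from the shipped code.

```typescript
// Hypothetical, simplified shapes; the real OpenClaw types are not shown in this issue.
interface AssistantBlock {
  type: "toolCall";
  name: string;
  args: Record<string, unknown>;
  thoughtSignature?: string;
}

interface GeminiFunctionCallPart {
  functionCall: { name: string; args: Record<string, unknown> };
  thoughtSignature?: string;
}

// Mirrors the quoted gate: the signature is forwarded only when the history
// turn was authored by the same provider and model as the current call.
function toFunctionCallPart(
  block: AssistantBlock,
  isSameProviderAndModel: boolean,
): GeminiFunctionCallPart {
  return {
    functionCall: { name: block.name, args: block.args },
    ...(isSameProviderAndModel && block.thoughtSignature
      ? { thoughtSignature: block.thoughtSignature }
      : {}),
  };
}

const block: AssistantBlock = {
  type: "toolCall",
  name: "exec",
  args: { cmd: "ls" },
  thoughtSignature: "sig-abc",
};

// Same provider and model: the signature survives.
const sameModel = toFunctionCallPart(block, true);
// Cross-provider replay: the key is omitted entirely, with no warning.
const crossProvider = toFunctionCallPart(block, false);
```

In the cross-provider case the outbound part carries no `thoughtSignature` key at all, which matches the behavior described above.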

The recommended fallback string "skip_thought_signature_validator" (per Google's thought-signatures docs) is not present anywhere in /app/dist/ of 2026.5.3-1, so neither the original-signature path nor the fallback path covers cross-provider replay.
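
For illustration, a fallback path could look roughly like the sketch below. The `withSignature` helper and the `FunctionCallPart` shape are assumptions made for this example; the only value taken from this report is the fallback string itself.

```typescript
// Fallback value quoted in this issue from Google's thought-signatures docs.
const SKIP_VALIDATOR = "skip_thought_signature_validator";

// Hypothetical outbound part shape, used only for this sketch.
interface FunctionCallPart {
  functionCall: { name: string; args: Record<string, unknown> };
  thoughtSignature: string;
}

// Use the captured signature when one exists; otherwise fall back to the
// validator-skip string so the part never goes out without a signature.
function withSignature(
  name: string,
  args: Record<string, unknown>,
  captured?: string,
): FunctionCallPart {
  const thoughtSignature =
    captured !== undefined && captured.length > 0 ? captured : SKIP_VALIDATOR;
  return { functionCall: { name, args }, thoughtSignature };
}

const signed = withSignature("exec", { cmd: "ls" }, "sig-abc");
const fallback = withSignature("exec", { cmd: "ls" });
```

With this shape, a turn authored by another provider would go out with the fallback string instead of no signature at all.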

Reproduction fingerprint (5 captured runs)

Five subagent runs on google/gemini-3.1-pro-preview with thinkingLevel: high and a multi-step exec-tool task. All five hung identically. Captured directly from gateway-persisted JSONL transcripts:

| run | assistant turns | tool calls | thinking blocks | blocks with thoughtSignature | <think> text tags | terminal error |
|---|---|---|---|---|---|---|
| 1 | 3 | 3 | 2 | 0 | 0 | LLM idle timeout (600s) |
| 2 | 4 | 4 | 1 | 0 | 0 | LLM idle timeout (600s) |
| 3 | 3 | 2 | 2 | 0 | 0 | request timed out |
| 4 | 15 | 15 | 4 | 0 | 0 | LLM idle timeout (600s) |
| 5 | 5 | 5 | 2 | 0 | 0 | LLM idle timeout (600s) |

Persisted gateway error record from one run:

{
  "type": "custom",
  "customType": "openclaw:prompt-error",
  "data": {
    "provider": "google",
    "model": "gemini-3.1-pro-preview",
    "api": "google-generative-ai",
    "error": "LLM idle timeout (600s): no response from model"
  }
}

Every persisted thinking content block in the failing transcripts has type: "thinking" (native, not text-tagged), 300–600 chars of text, and no signature / thoughtSignature / thinkingSignature field on the persisted block.
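
A check like this can be scripted against a transcript. The sketch below is an assumption-heavy illustration: it assumes each JSONL line is an object with a `content` array of blocks, which may not match the real gateway transcript schema. `countThinkingBlocks` is a hypothetical helper; only the three signature field names and `type: "thinking"` come from the observation above.

```typescript
// Field names checked in this report; any non-empty string counts as signed.
const SIGNATURE_FIELDS = ["signature", "thoughtSignature", "thinkingSignature"] as const;

interface Counts {
  thinking: number; // blocks with type "thinking"
  signed: number;   // thinking blocks carrying any signature field
}

// ASSUMPTION: each JSONL line is a JSON object with an optional `content`
// array of blocks. Adjust the lookup to the real transcript layout.
function countThinkingBlocks(jsonl: string): Counts {
  const counts: Counts = { thinking: 0, signed: 0 };
  for (const line of jsonl.split("\n")) {
    if (line.trim() === "") continue;
    let record: unknown;
    try {
      record = JSON.parse(line);
    } catch {
      continue; // skip lines that are not valid JSON
    }
    if (typeof record !== "object" || record === null) continue;
    const content = (record as { content?: unknown }).content;
    if (!Array.isArray(content)) continue;
    for (const block of content) {
      if (typeof block !== "object" || block === null) continue;
      const b = block as Record<string, unknown>;
      if (b.type !== "thinking") continue;
      counts.thinking += 1;
      if (SIGNATURE_FIELDS.some((f) => typeof b[f] === "string" && b[f] !== "")) {
        counts.signed += 1;
      }
    }
  }
  return counts;
}

// Made-up sample lines, not taken from the captured runs.
const sample = [
  JSON.stringify({ role: "assistant", content: [{ type: "thinking", thinking: "plan the next step" }, { type: "toolCall", name: "exec" }] }),
  JSON.stringify({ role: "assistant", content: [{ type: "thinking", thinking: "check output", thoughtSignature: "sig-1" }] }),
].join("\n");

const counts = countThinkingBlocks(sample);
```

For a transcript matching the failing runs, `signed` would be 0 while `thinking` is non-zero.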

What was checked / ruled out

This narrows the failure to the Gemini transport's outbound serialization of cross-provider assistant history — exactly what @YouFoundJK identified.

Asks

  1. Re-open #74244, "[Bug]: Gemini API error (400): missing thought_signature" (or supersede it with this issue). The clawsweeper autoclose was incorrect: the code review missed the isSameProviderAndModel gate that strips signatures on cross-provider replay, and it ignored the original reporter's request for human verification. Consider requiring a human re-check before autoclosing Gemini 3.x bug reports going forward.
  2. Land @YouFoundJK's blueprint fix (or an equivalent maintainer-cleaned version) covering all three gaps:
    • transport-message-transform.ts — stop stripping thoughtSignature from cross-provider history; let downstream transports decide.
    • transport-stream.ts — always include thoughtSignature on functionCall parts; use "skip_thought_signature_validator" fallback (per Google docs) when no captured signature is available.
    • openai-transport-stream.ts — collect signatures from all history (not just same-model).
  3. Make the failure mode loud. A silent 600s idle timeout with no diagnostic about a known protocol gap is very hard to root-cause downstream. Either surface Google's 400 INVALID_ARGUMENT: missing thought_signature to the gateway log/error path, or fast-fail when an outbound Gemini call would emit a functionCall part with no signature.
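
As a rough sketch of the fast-fail option in ask 3, a pre-send guard could scan the outbound request for unsigned functionCall parts. The `OutboundContent` and `OutboundPart` shapes and both helper names are hypothetical. The predicate follows this issue's wording (any functionCall part with no signature); Google's actual validation rules may be narrower, for example around parallel calls, so treat it as a placeholder.

```typescript
// Hypothetical outbound request shapes, used only for this sketch.
interface OutboundPart {
  text?: string;
  functionCall?: { name: string; args: Record<string, unknown> };
  thoughtSignature?: string;
}

interface OutboundContent {
  role: "user" | "model";
  parts: OutboundPart[];
}

// Returns a human-readable location for every functionCall part that would
// be sent without a thoughtSignature.
function findUnsignedFunctionCalls(contents: OutboundContent[]): string[] {
  const offenders: string[] = [];
  contents.forEach((content, ci) => {
    content.parts.forEach((part, pi) => {
      if (part.functionCall !== undefined && !part.thoughtSignature) {
        offenders.push(`contents[${ci}].parts[${pi}] (${part.functionCall.name})`);
      }
    });
  });
  return offenders;
}

// Fails immediately with a diagnostic instead of waiting for an idle timeout.
function assertFunctionCallsSigned(contents: OutboundContent[]): void {
  const offenders = findUnsignedFunctionCalls(contents);
  if (offenders.length > 0) {
    throw new Error(
      `Gemini request has functionCall parts without thoughtSignature: ${offenders.join(", ")}`,
    );
  }
}

// Made-up request: the first model turn lacks a signature, the second has one.
const contents: OutboundContent[] = [
  { role: "user", parts: [{ text: "review the diff" }] },
  { role: "model", parts: [{ functionCall: { name: "exec", args: { cmd: "git diff" } } }] },
  { role: "model", parts: [{ functionCall: { name: "read", args: { path: "a.ts" } }, thoughtSignature: "sig-1" }] },
];

const offenders = findUnsignedFunctionCalls(contents);
```

Calling `assertFunctionCallsSigned(contents)` on this sample throws with the offending location in the message, which is the kind of diagnostic the gateway log currently lacks.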

Environment

  • OpenClaw 2026.5.3-1 (release tag, latest stable as of 2026-05-04)
  • google/gemini-3.1-pro-preview via Google AI Studio API key (no Vertex, no proxy)
  • Linux container, Node 24.14.0
  • thinkingDefault: medium globally; the affected reviewer agents run with thinkingLevel: high
