Summary
repairUserEntryWithBlankTextContent in OpenClaw's session-file repair returns { kind: "drop" } for user-role entries whose only text content is blank. The repair function then removes those entries from the on-disk session entirely. When session-memory rehydration produces a session whose post-repair message array is [system, asst, asst, asst, asst] (no user role at all), the array passes OpenAI's permissive Chat Completions API but is rejected by stricter providers.
With mlx-vlm serving Qwen3.6 (mlx-community/Qwen3.6-35B-A3B-8bit), the Jinja chat template raises TemplateError: No user query found in messages. and the server returns HTTP 500. The gateway then either fails over to the next provider (cloud cost + latency) or surfaces the error.
Affected version: OpenClaw 2026.4.27 (the same code path was present in 2026.4.26). The function lives in dist/compaction-successor-transcript-*.js.
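The failure mode can be modeled in a few lines. The sketch below is not the Qwen Jinja template; it is a simplified stand-in (assumption) for the precondition that template enforces, and `assertHasUserQuery` and `check` are hypothetical names used only for illustration.

```javascript
// Simplified model of a strict chat template's precondition. Assumption:
// this mirrors only the "at least one user message" check, nothing else
// the real Jinja template does.
function assertHasUserQuery(messages) {
  const hasUser = messages.some((m) => m && m.role === "user");
  if (!hasUser) {
    throw new Error("No user query found in messages.");
  }
  return true;
}

// Shape of the post-repair array described above: no user role at all.
const postRepair = [
  { role: "system", content: "helper" },
  { role: "assistant", content: "prior A" },
  { role: "assistant", content: "prior B" },
];

// Reports whether the modeled template would accept the array.
function check(messages) {
  try {
    assertHasUserQuery(messages);
    return "accepted";
  } catch (err) {
    return `rejected: ${err.message}`;
  }
}

console.log(check(postRepair));
console.log(check([...postRepair, { role: "user", content: "ok?" }]));
```

A permissive provider skips this check entirely, which is why the same array goes unnoticed there.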
Reproduction
# 1) seed a JSONL session matching the bug pattern: blank user + N rehydrated assistants
SESS=~/.openclaw/agents/main/sessions/repro-$(date +%s).jsonl
cat > "$SESS" <<'EOF'
{"type":"session","sessionId":"repro","createdAt":"2026-04-30T00:00:00Z"}
{"type":"model_change","model":"mlx-community/Qwen3.6-35B-A3B-8bit"}
{"type":"thinking_level_change","level":"off"}
{"type":"custom","payload":{}}
{"type":"message","message":{"role":"user","content":[{"type":"text","text":""}]}}
{"type":"message","message":{"role":"assistant","content":[{"type":"text","text":"rehydrated turn 1"}]}}
{"type":"message","message":{"role":"assistant","content":[{"type":"text","text":"rehydrated turn 2"}]}}
{"type":"message","message":{"role":"assistant","content":[{"type":"text","text":"rehydrated turn 3"}]}}
{"type":"message","message":{"role":"assistant","content":[{"type":"text","text":"rehydrated turn 4"}]}}
EOF
# 2) send a turn forcing the strict provider
openclaw agent --json --thinking off \
  --model "mbpro/mlx-community/Qwen3.6-35B-A3B-8bit" \
  --message "Reply with one word: ok" \
  --session-id "$(basename "$SESS" .jsonl)"
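To see in advance whether a session file matches the bug pattern, the drop behavior can be simulated offline. This is a rough sketch under stated assumptions: `isBlankUserEntry` and `rolesAfterDrop` are hypothetical helpers that model the behavior described in this report, not OpenClaw's actual repair code, and the seed is a shortened version of the JSONL above.

```javascript
// Hypothetical helper: true for user entries whose content is blank in the
// sense described in this report (empty string, or only blank text blocks).
function isBlankUserEntry(entry) {
  if (entry.type !== "message" || entry.message?.role !== "user") return false;
  const content = entry.message.content;
  if (typeof content === "string") return content.trim() === "";
  if (!Array.isArray(content)) return false;
  // Blank when every block is a text block with whitespace-only text.
  return content.every(
    (b) => b && b.type === "text" && typeof b.text === "string" && b.text.trim() === ""
  );
}

// Predicts the role sequence that remains once blank user entries are dropped.
function rolesAfterDrop(jsonlText) {
  return jsonlText
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line))
    .filter((e) => e.type === "message" && !isBlankUserEntry(e))
    .map((e) => e.message?.role);
}

const seed = [
  '{"type":"session","sessionId":"repro","createdAt":"2026-04-30T00:00:00Z"}',
  '{"type":"message","message":{"role":"user","content":[{"type":"text","text":""}]}}',
  '{"type":"message","message":{"role":"assistant","content":[{"type":"text","text":"rehydrated turn 1"}]}}',
  '{"type":"message","message":{"role":"assistant","content":[{"type":"text","text":"rehydrated turn 2"}]}}',
].join("\n");

const roles = rolesAfterDrop(seed);
console.log(roles, roles.includes("user") ? "ok" : "no user role left");
```

The system prompt is added by the gateway and is not part of the session file, so only the assistant roles show up here.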
Observed
- gateway.err.log: session file repaired: rewrote 4 assistant message(s), dropped 1 blank user message(s)
- ~10 s later:
  embedded run failover decision: runId=… stage=assistant decision=surface_error reason=timeout from=mbpro/mlx-community/Qwen3.6-35B-A3B-8bit profile=- rawError=500 status code (no body)
- Direct curl confirms the wire-level rejection:
$ curl -s http://<mlx-server>/v1/chat/completions \
    -H 'content-type: application/json' \
    -d '{"model":"mlx-community/Qwen3.6-35B-A3B-8bit","messages":[
      {"role":"system","content":"helper"},
      {"role":"assistant","content":"prior A"},
      {"role":"assistant","content":"prior B"}],"max_tokens":4}'
{"detail":"An unexpected error occurred: No user query found in messages."}
HTTP 500
The same array with one user message appended → HTTP 200.
Expected
After the repair runs, the on-disk session should always contain at least one user-role message. Dropping the only user entry violates a hard requirement of strict provider chat templates (confirmed on Qwen3.6; likely to surface on other strict templates over time).
Suggested fix
Smallest possible change: in repairUserEntryWithBlankTextContent, replace both { kind: "drop" } returns with { kind: "rewrite", entry: <placeholder> }. The existing repair loop already handles { kind: "rewrite", entry } correctly, so no caller changes are needed.
function repairUserEntryWithBlankTextContent(entry) {
  const PLACEHOLDER_TEXT = "[Continuing previous conversation. Please proceed.]";
  const placeholder = () => ({
    ...entry,
    message: {
      ...entry.message,
      content: [{ type: "text", text: PLACEHOLDER_TEXT }]
    }
  });
  const content = entry.message.content;
  if (typeof content === "string") {
    return content.trim()
      ? { kind: "keep" }
      : { kind: "rewrite", entry: placeholder() }; // was: { kind: "drop" }
  }
  if (!Array.isArray(content)) return { kind: "keep" };
  let touched = false;
  const nextContent = content.filter((block) => {
    if (!block || typeof block !== "object") return true;
    if (block.type !== "text") return true;
    const text = block.text;
    if (typeof text !== "string" || text.trim().length > 0) return true;
    touched = true;
    return false;
  });
  if (nextContent.length === 0) {
    return { kind: "rewrite", entry: placeholder() }; // was: { kind: "drop" }
  }
  if (!touched) return { kind: "keep" };
  return {
    kind: "rewrite",
    entry: { ...entry, message: { ...entry.message, content: nextContent } }
  };
}
The repair summary log line then changes from dropped N blank user message(s) to rewrote N user message(s), which is also more honest about what the function actually did.
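The effect on the caller can be illustrated with a stand-in repair loop. Everything below is an assumption made for illustration: `applyRepair`, `dropBlank` and `rewriteBlank` are hypothetical, string content is used for brevity, and the real OpenClaw loop and its counters may be structured differently.

```javascript
// Hypothetical repair loop: applies a per-entry decision function and
// tallies the outcomes. It never mutates the input entries.
function applyRepair(entries, decide) {
  const out = [];
  const counts = { kept: 0, rewrote: 0, dropped: 0 };
  for (const entry of entries) {
    const result = decide(entry);
    if (result.kind === "drop") {
      counts.dropped += 1;
    } else if (result.kind === "rewrite") {
      out.push(result.entry);
      counts.rewrote += 1;
    } else {
      out.push(entry);
      counts.kept += 1;
    }
  }
  return { entries: out, counts };
}

const PLACEHOLDER_TEXT = "[Continuing previous conversation. Please proceed.]";
const isBlank = (entry) =>
  entry.message.role === "user" && entry.message.content.trim() === "";

// Current behavior as described in this report: blank user entries are dropped.
const dropBlank = (entry) => (isBlank(entry) ? { kind: "drop" } : { kind: "keep" });

// Proposed behavior: blank user entries are rewritten to a placeholder.
const rewriteBlank = (entry) =>
  isBlank(entry)
    ? {
        kind: "rewrite",
        entry: { ...entry, message: { ...entry.message, content: PLACEHOLDER_TEXT } },
      }
    : { kind: "keep" };

const session = [
  { type: "message", message: { role: "user", content: "  " } },
  { type: "message", message: { role: "assistant", content: "rehydrated turn 1" } },
];

const before = applyRepair(session, dropBlank);
const after = applyRepair(session, rewriteBlank);
console.log(before.counts, after.counts);
```

With the drop variant the only user entry disappears; with the rewrite variant the entry count is unchanged and the user role survives.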
Related issues (none are exact duplicates)
Local workaround applied
Three-layer defense-in-depth patch applied locally and verified end-to-end:
- Source: repairUserEntryWithBlankTextContent rewrite-don't-drop (same as proposed fix above).
- Transport guard inside dist/openai-transport-stream-*.js (createOpenAICompletionsTransportStreamFn), placed AFTER the onPayload override; pre-onPayload placement was silently undone by middleware that returned a fresh params object.
- Defensive guard in pi-ai (@mariozechner/pi-ai/dist/providers/openai-completions.js) for any path that calls pi-ai's exported streamOpenAICompletions directly.
Happy to share the full patch script + STATUS document if useful — patches are content-based and survive content-hashed bundle renames across openclaw upgrades.
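The ordering point for the transport guard can be sketched as follows. This is not the actual patch: `ensureUserMessage`, `finalizeParams` and `freshObjectHook` are hypothetical names, the hook only imitates middleware that rebuilds params from its own copy, and appending the placeholder at the end is an assumption chosen because the curl test above showed that an appended user message is accepted.

```javascript
const GUARD_PLACEHOLDER_TEXT = "[Continuing previous conversation. Please proceed.]";

// Hypothetical guard: returns params unchanged when a user message exists,
// otherwise returns a copy with a placeholder user message appended.
function ensureUserMessage(params) {
  const messages = Array.isArray(params.messages) ? params.messages : [];
  if (messages.some((m) => m && m.role === "user")) return params;
  return {
    ...params,
    messages: [...messages, { role: "user", content: GUARD_PLACEHOLDER_TEXT }],
  };
}

// The guard runs on whatever the onPayload hook returns, so a hook that
// returns a fresh object cannot discard it.
function finalizeParams(params, onPayload) {
  const overridden = onPayload ? (onPayload(params) ?? params) : params;
  return ensureUserMessage(overridden);
}

const original = {
  model: "mlx-community/Qwen3.6-35B-A3B-8bit",
  messages: [
    { role: "system", content: "helper" },
    { role: "assistant", content: "prior A" },
  ],
};

// Imitates middleware that ignores its argument and rebuilds params.
const freshObjectHook = () => ({ ...original });

// Guard before the hook: the hook's fresh object silently loses the fix.
const guardedFirst = freshObjectHook(ensureUserMessage(original));
// Guard after the hook: the fix reaches the wire.
const guardedLast = finalizeParams(original, freshObjectHook);

console.log(guardedFirst.messages.map((m) => m.role));
console.log(guardedLast.messages.map((m) => m.role));
```

Returning a copy instead of mutating params keeps the guard safe to call from more than one layer.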