Session-file repair drops blank user-role messages, breaking strict OpenAI-compat providers (Qwen3.6 / mlx-vlm)

## Summary

`repairUserEntryWithBlankTextContent` in OpenClaw's session-file repair returns `{ kind: "drop" }` for user-role entries whose only text content is blank, and the repair loop then removes those entries from the on-disk session entirely. When session-memory rehydration produces a session whose post-repair message array is `[system, assistant, assistant, assistant, assistant]` (no user role at all), the array is accepted by OpenAI's permissive Chat Completions API but rejected by stricter providers.

With `mlx-vlm` serving Qwen3.6 (`mlx-community/Qwen3.6-35B-A3B-8bit`), the Jinja chat template raises `TemplateError: No user query found in messages.` and the server returns HTTP 500. The gateway then either fails over to the next provider (cloud cost + latency) or surfaces the error.

Observed on OpenClaw `2026.4.27`; the same code path was present in `2026.4.26`. The function lives in `dist/compaction-successor-transcript-*.js`.

## Reproduction

```bash
# 1) seed a JSONL session matching the bug pattern: blank user + N rehydrated assistants
SESS=~/.openclaw/agents/main/sessions/repro-$(date +%s).jsonl
cat > "$SESS" <<'EOF'
{"type":"session","sessionId":"repro","createdAt":"2026-04-30T00:00:00Z"}
{"type":"model_change","model":"mlx-community/Qwen3.6-35B-A3B-8bit"}
{"type":"thinking_level_change","level":"off"}
{"type":"custom","payload":{}}
{"type":"message","message":{"role":"user","content":[{"type":"text","text":""}]}}
{"type":"message","message":{"role":"assistant","content":[{"type":"text","text":"rehydrated turn 1"}]}}
{"type":"message","message":{"role":"assistant","content":[{"type":"text","text":"rehydrated turn 2"}]}}
{"type":"message","message":{"role":"assistant","content":[{"type":"text","text":"rehydrated turn 3"}]}}
{"type":"message","message":{"role":"assistant","content":[{"type":"text","text":"rehydrated turn 4"}]}}
EOF

# 2) send a turn forcing the strict provider
openclaw agent --json --thinking off \
  --model "mbpro/mlx-community/Qwen3.6-35B-A3B-8bit" \
  --message "Reply with one word: ok" \
  --session-id "$(basename "$SESS" .jsonl)"
```

## Observed

- `gateway.err.log`: `session file repaired: rewrote 4 assistant message(s), dropped 1 blank user message(s)`
- ~10 s later: `embedded run failover decision: runId=… stage=assistant decision=surface_error reason=timeout from=mbpro/mlx-community/Qwen3.6-35B-A3B-8bit profile=- rawError=500 status code (no body)`
- Direct curl confirms the wire-level rejection:
  ```bash
  $ curl -s http://<mlx-server>/v1/chat/completions \
      -H 'content-type: application/json' \
      -d '{"model":"mlx-community/Qwen3.6-35B-A3B-8bit","messages":[
            {"role":"system","content":"helper"},
            {"role":"assistant","content":"prior A"},
            {"role":"assistant","content":"prior B"}],"max_tokens":4}'
  {"detail":"An unexpected error occurred: No user query found in messages."}
  HTTP 500
  ```
  The same array with one user message appended → HTTP 200.

## Expected

After the repair runs, the on-disk session should always contain at least one user-role message. Dropping the only user entry violates a hard requirement of strict provider chat templates (confirmed for Qwen3.6; likely to surface on other strict templates over time).
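
The invariant can be written as a small predicate. This is a minimal sketch, not OpenClaw code: `isBlankContent` and `hasNonBlankUserMessage` are hypothetical helper names, and the entry shape (`{ type: "message", message: { role, content } }`) is assumed from the reproduction JSONL above.

```js
// Hypothetical helpers (not part of OpenClaw) that express the post-repair invariant.

// True when the content carries nothing but whitespace-only text.
function isBlankContent(content) {
    if (typeof content === "string") return content.trim().length === 0;
    if (!Array.isArray(content)) return false; // unknown shape: treat as non-blank
    // An empty array, or an array made only of blank text blocks, counts as blank.
    return content.every(
        (block) =>
            block &&
            typeof block === "object" &&
            block.type === "text" &&
            typeof block.text === "string" &&
            block.text.trim().length === 0
    );
}

// True when at least one user-role message entry has non-blank content.
function hasNonBlankUserMessage(entries) {
    return entries.some(
        (entry) =>
            entry?.type === "message" &&
            entry.message?.role === "user" &&
            !isBlankContent(entry.message.content)
    );
}
```

A check like this could run on the repaired entries before they are written back to disk, so a violation is logged or corrected instead of reaching the provider.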

## Suggested fix

Smallest possible change: in `repairUserEntryWithBlankTextContent`, replace both `{ kind: "drop" }` returns with `{ kind: "rewrite", entry: <placeholder> }`. The existing repair loop already handles `{ kind: "rewrite", entry }` correctly, so no caller changes are needed.

```js
function repairUserEntryWithBlankTextContent(entry) {
    const PLACEHOLDER_TEXT = "[Continuing previous conversation. Please proceed.]";
    const placeholder = () => ({
        ...entry,
        message: {
            ...entry.message,
            content: [{ type: "text", text: PLACEHOLDER_TEXT }]
        }
    });
    const content = entry.message.content;
    if (typeof content === "string") {
        return content.trim()
            ? { kind: "keep" }
            : { kind: "rewrite", entry: placeholder() };  // was: { kind: "drop" }
    }
    if (!Array.isArray(content)) return { kind: "keep" };
    let touched = false;
    const nextContent = content.filter((block) => {
        if (!block || typeof block !== "object") return true;
        if (block.type !== "text") return true;
        const text = block.text;
        if (typeof text !== "string" || text.trim().length > 0) return true;
        touched = true;
        return false;
    });
    if (nextContent.length === 0) {
        return { kind: "rewrite", entry: placeholder() };  // was: { kind: "drop" }
    }
    if (!touched) return { kind: "keep" };
    return {
        kind: "rewrite",
        entry: { ...entry, message: { ...entry.message, content: nextContent } }
    };
}
```

The repair summary log line then changes from `dropped N blank user message(s)` to `rewrote N user message(s)`, which is also more honest about what the function actually did.

## Related issues (none are exact duplicates)

- #73472 (closed) — same file, opposite-direction concern (sanitizing empty text blocks for Anthropic). This is the sibling bug: the function DOES handle them, just destructively.
- #75305 (open) — pre-compaction memory flush sends empty user message to Anthropic; same family of "session-memory produces broken user shape," different code path.
- #75235 (open) — leading-assistant transcript causes infinite loop; adjacent (the corruption that leaves no leading user message).
- #32936 / #68868 (closed) — Qwen "No user query found" misclassified as context overflow; same upstream provider error string, but those issues were about diagnosis/classification, not the repair-drop root cause.

## Local workaround applied

Three-layer defense-in-depth patch applied locally and verified end-to-end:
1. Source: `repairUserEntryWithBlankTextContent` rewrite-don't-drop (same as proposed fix above).
2. Transport guard inside `dist/openai-transport-stream-*.js` (`createOpenAICompletionsTransportStreamFn`), placed AFTER the `onPayload` override — pre-`onPayload` placement was silently undone by middleware that returned a fresh params object.
3. Defensive guard in `pi-ai` (`@mariozechner/pi-ai/dist/providers/openai-completions.js`) for any path that calls pi-ai's exported `streamOpenAICompletions` directly.
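
The guards in layers 2 and 3 share one idea, which can be sketched as follows. This is an illustrative sketch, not the applied patch: `ensureUserMessage` is a hypothetical name, the placeholder text is reused from the suggested fix, and inserting the placeholder right after any leading system messages is an assumption about placement (the curl test above only shows that an appended user message is accepted).

```js
// Hypothetical wire-level guard (not the actual patch): guarantees that a
// Chat Completions payload contains at least one user-role message.
const PLACEHOLDER_TEXT = "[Continuing previous conversation. Please proceed.]";

function ensureUserMessage(params, placeholderText = PLACEHOLDER_TEXT) {
    const messages = Array.isArray(params?.messages) ? params.messages : [];
    // Nothing to do when a user message is already present.
    if (messages.some((m) => m?.role === "user")) return params;

    // Assumption: place the placeholder after any leading system messages,
    // so the transcript opens with a user turn.
    let insertAt = 0;
    while (insertAt < messages.length && messages[insertAt]?.role === "system") {
        insertAt++;
    }

    // Return a fresh object instead of mutating the caller's payload.
    return {
        ...params,
        messages: [
            ...messages.slice(0, insertAt),
            { role: "user", content: placeholderText },
            ...messages.slice(insertAt),
        ],
    };
}
```

As noted in item 2, a guard like this only helps if it runs on the final payload, after any `onPayload` override has had the chance to replace the params object.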

Happy to share the full patch script + STATUS document if useful — patches are content-based and survive content-hashed bundle renames across openclaw upgrades.

