Session-file repair drops blank user-role messages, breaking strict OpenAI-compat providers (Qwen3.6 / mlx-vlm) #75313

@jasonvassallo

Summary

repairUserEntryWithBlankTextContent in OpenClaw's session-file repair returns { kind: "drop" } for user-role entries whose only text content is blank. The repair function then removes those entries from the on-disk session entirely. When session-memory rehydration produces a session whose post-repair message array is [system, asst, asst, asst, asst] (no user role at all), the array passes OpenAI's permissive Chat Completions API but is rejected by stricter providers.

With mlx-vlm serving Qwen3.6 (mlx-community/Qwen3.6-35B-A3B-8bit), the Jinja chat template raises TemplateError: No user query found in messages. and the server returns HTTP 500. The gateway then either fails over to the next provider (cloud cost + latency) or surfaces the error.

Observed on OpenClaw 2026.4.27 (the same code path was present in 2026.4.26). The function lives in dist/compaction-successor-transcript-*.js.

Reproduction

# 1) seed a JSONL session matching the bug pattern: blank user + N rehydrated assistants
SESS=~/.openclaw/agents/main/sessions/repro-$(date +%s).jsonl
cat > "$SESS" <<'EOF'
{"type":"session","sessionId":"repro","createdAt":"2026-04-30T00:00:00Z"}
{"type":"model_change","model":"mlx-community/Qwen3.6-35B-A3B-8bit"}
{"type":"thinking_level_change","level":"off"}
{"type":"custom","payload":{}}
{"type":"message","message":{"role":"user","content":[{"type":"text","text":""}]}}
{"type":"message","message":{"role":"assistant","content":[{"type":"text","text":"rehydrated turn 1"}]}}
{"type":"message","message":{"role":"assistant","content":[{"type":"text","text":"rehydrated turn 2"}]}}
{"type":"message","message":{"role":"assistant","content":[{"type":"text","text":"rehydrated turn 3"}]}}
{"type":"message","message":{"role":"assistant","content":[{"type":"text","text":"rehydrated turn 4"}]}}
EOF

# 2) send a turn forcing the strict provider
openclaw agent --json --thinking off \
  --model "mbpro/mlx-community/Qwen3.6-35B-A3B-8bit" \
  --message "Reply with one word: ok" \
  --session-id "$(basename "$SESS" .jsonl)"

Observed

  • gateway.err.log: session file repaired: rewrote 4 assistant message(s), dropped 1 blank user message(s)
  • ~10 s later: embedded run failover decision: runId=… stage=assistant decision=surface_error reason=timeout from=mbpro/mlx-community/Qwen3.6-35B-A3B-8bit profile=- rawError=500 status code (no body)
  • Direct curl confirms the wire-level rejection:
    $ curl -s http://<mlx-server>/v1/chat/completions \
        -H 'content-type: application/json' \
        -d '{"model":"mlx-community/Qwen3.6-35B-A3B-8bit","messages":[
              {"role":"system","content":"helper"},
              {"role":"assistant","content":"prior A"},
              {"role":"assistant","content":"prior B"}],"max_tokens":4}'
    {"detail":"An unexpected error occurred: No user query found in messages."}
    HTTP 500
    The same array with one user message appended → HTTP 200.

Expected

After the repair runs, the on-disk session should always contain at least one user-role message. Dropping the only user entry violates a hard requirement of strict provider chat templates (confirmed on Qwen3.6; other strict templates are expected to reject the same array).
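The invariant above can be checked directly against a session file. The following is a minimal sketch, assuming the JSONL layout used in the reproduction (one JSON record per line, with chat turns stored as `{"type":"message","message":{"role":...}}`). The helper name `countUserMessages` is hypothetical and is not part of OpenClaw.

```javascript
// Hypothetical helper (not an OpenClaw API): counts user-role message records
// in a JSONL session transcript shaped like the one in the reproduction.
function countUserMessages(jsonlText) {
    let count = 0;
    for (const line of jsonlText.split("\n")) {
        if (!line.trim()) continue; // ignore blank lines
        let record;
        try {
            record = JSON.parse(line);
        } catch {
            continue; // skip malformed lines instead of failing the whole check
        }
        if (record && record.type === "message" &&
            record.message && record.message.role === "user") {
            count += 1;
        }
    }
    return count;
}

// A post-repair transcript in the failing shape: header plus assistant turns only.
const repaired = [
    '{"type":"session","sessionId":"repro","createdAt":"2026-04-30T00:00:00Z"}',
    '{"type":"message","message":{"role":"assistant","content":[{"type":"text","text":"rehydrated turn 1"}]}}',
    '{"type":"message","message":{"role":"assistant","content":[{"type":"text","text":"rehydrated turn 2"}]}}',
].join("\n");

console.log(countUserMessages(repaired)); // 0, so the invariant is violated
```

A count of zero after repair is the condition that triggers the template error on the strict provider.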

Suggested fix

Smallest possible change: in repairUserEntryWithBlankTextContent, replace both { kind: "drop" } returns with { kind: "rewrite", entry: <placeholder> }. The existing repair loop already handles { kind: "rewrite", entry } correctly, so no caller changes are needed.

function repairUserEntryWithBlankTextContent(entry) {
    const PLACEHOLDER_TEXT = "[Continuing previous conversation. Please proceed.]";
    const placeholder = () => ({
        ...entry,
        message: {
            ...entry.message,
            content: [{ type: "text", text: PLACEHOLDER_TEXT }]
        }
    });
    const content = entry.message.content;
    if (typeof content === "string") {
        return content.trim()
            ? { kind: "keep" }
            : { kind: "rewrite", entry: placeholder() };  // was: { kind: "drop" }
    }
    if (!Array.isArray(content)) return { kind: "keep" };
    let touched = false;
    const nextContent = content.filter((block) => {
        if (!block || typeof block !== "object") return true;
        if (block.type !== "text") return true;
        const text = block.text;
        if (typeof text !== "string" || text.trim().length > 0) return true;
        touched = true;
        return false;
    });
    if (nextContent.length === 0) {
        return { kind: "rewrite", entry: placeholder() };  // was: { kind: "drop" }
    }
    if (!touched) return { kind: "keep" };
    return {
        kind: "rewrite",
        entry: { ...entry, message: { ...entry.message, content: nextContent } }
    };
}

The repair summary log line then changes from dropped N blank user message(s) to rewrote N user message(s), which is also more honest about what the function actually did.
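The actual repair loop is not shown in this report. As an illustration only, assuming it dispatches on the `kind` field roughly as below, switching from `drop` to `rewrite` keeps the user entry in the transcript and moves it from the dropped count to the rewritten count. `applyRepairs`, `dropBlank`, and `rewriteBlank` are hypothetical stand-ins, and the stubs use string content for brevity.

```javascript
// Hypothetical stand-in for the repair loop; the real loop may differ.
function applyRepairs(entries, repairFn) {
    const out = [];
    const summary = { kept: 0, rewritten: 0, dropped: 0 };
    for (const entry of entries) {
        const result = repairFn(entry);
        if (result.kind === "drop") {
            summary.dropped += 1;          // entry is removed from the transcript
        } else if (result.kind === "rewrite") {
            out.push(result.entry);        // entry is replaced in place
            summary.rewritten += 1;
        } else {
            out.push(entry);               // "keep": entry passes through unchanged
            summary.kept += 1;
        }
    }
    return { entries: out, summary };
}

// Stub repair functions for a blank user entry: old behavior vs. proposed behavior.
const isBlankUser = (e) => e.message.role === "user" && e.message.content.trim() === "";
const dropBlank = (e) => (isBlankUser(e) ? { kind: "drop" } : { kind: "keep" });
const rewriteBlank = (e) => (isBlankUser(e)
    ? { kind: "rewrite", entry: { ...e, message: { ...e.message, content: "[placeholder]" } } }
    : { kind: "keep" });

const session = [
    { type: "message", message: { role: "user", content: "" } },
    { type: "message", message: { role: "assistant", content: "rehydrated turn 1" } },
];

console.log(applyRepairs(session, dropBlank).summary);    // { kept: 1, rewritten: 0, dropped: 1 }
console.log(applyRepairs(session, rewriteBlank).summary); // { kept: 1, rewritten: 1, dropped: 0 }
```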

Related issues (none are exact duplicates)

Local workaround applied

Three-layer defense-in-depth patch applied locally and verified end-to-end:

  1. Source: repairUserEntryWithBlankTextContent rewrite-don't-drop (same as proposed fix above).
  2. Transport guard inside dist/openai-transport-stream-*.js (createOpenAICompletionsTransportStreamFn), placed AFTER the onPayload override. Placing it before onPayload did not work: middleware that returned a fresh params object silently undid the guard.
  3. Defensive guard in pi-ai (@mariozechner/pi-ai/dist/providers/openai-completions.js) for any path that calls pi-ai's exported streamOpenAICompletions directly.
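The guards in layers 2 and 3 could look roughly like the sketch below. It assumes the request params carry an OpenAI-style `messages` array and appends a placeholder user turn when none exists, mirroring the curl check where appending one user message produced HTTP 200. The function name `ensureUserMessage` is hypothetical, and this is not the patch that was applied locally.

```javascript
// Hypothetical guard; the name and the params shape are assumptions.
const PLACEHOLDER_TEXT = "[Continuing previous conversation. Please proceed.]";

function ensureUserMessage(params) {
    const messages = Array.isArray(params.messages) ? params.messages : [];
    // Nothing to do if any user-role message is already present.
    if (messages.some((m) => m && m.role === "user")) return params;
    // Return a new object rather than mutating the caller's params.
    return {
        ...params,
        messages: [...messages, { role: "user", content: PLACEHOLDER_TEXT }],
    };
}

const params = {
    model: "mlx-community/Qwen3.6-35B-A3B-8bit",
    messages: [
        { role: "system", content: "helper" },
        { role: "assistant", content: "prior A" },
        { role: "assistant", content: "prior B" },
    ],
    max_tokens: 4,
};

const guarded = ensureUserMessage(params);
console.log(guarded.messages[guarded.messages.length - 1].role); // "user"
console.log(params.messages.length); // 3, the original params are left untouched
```

Because the guard returns a new object, it has to run on whatever params object is finally sent, which is why its position relative to onPayload matters.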

Happy to share the full patch script + STATUS document if useful. The patches match on content, so they survive content-hashed bundle renames across openclaw upgrades.
