Streamed and non-streamed tool call arguments contain invalid JSON (mixed single/double quotes) with large payloads #20359

@jxlarrea

Name and Version

version: 1 (6c770d1)
built with GNU 13.3.0 for Linux aarch64

Built from latest master commit 6c770d1 ("Reduce level of content parser warning message #20347") on 2026-03-10.

Operating systems

Linux

GGML backends

CUDA

Hardware

NVIDIA DGX Spark GB10 (ARM64, 128GB unified memory)

Models

  • Qwen3-14B-Q8_0.gguf
  • Qwen3-30B-A3B-Q8_0.gguf

Both models produce the same corruption. Issue is not model-specific.

Problem description & steps to reproduce

Summary

When llama-server receives a request with a large system prompt and many tool definitions, the arguments field in tool call responses contains invalid JSON with mixed single and double quotes. The same request with a small/simple payload returns valid JSON.

Details

The arguments field in the response contains:

{"name": "Front Door Lock', 'domain': 'lock"}

Instead of the expected valid JSON:

{"name": "Front Door Lock", "domain": "lock"}

The raw HTTP response bytes confirm the corruption originates from llama-server — no SDK or middleware is involved:

b'arguments":"{\\"name\\": \\"Front Door Lock\', \'domain\': \'lock\\"}'

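In the Python bytes repr above, \\" is the JSON-escaped double quote (\" on the wire), which is correct for the first field. After the first value, the quotes switch to single quotes; the \' in the repr is a bare ' on the wire, since that backslash comes only from Python's repr escaping. This produces invalid JSON that cannot be parsed by standard JSON parsers.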
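A minimal sketch for checking how a standard parser treats the two strings above. One caveat when triaging: for the fragment exactly as quoted in this report, Python's json.loads does not raise, because the double-quoted string that opens before Front Door Lock only closes after lock, so everything in between is read as the value of name. The helper below (check_arguments is a hypothetical name, not part of llama.cpp or any SDK) therefore also checks for the keys the tool schema marks as required.

```python
import json

# The two argument strings quoted in this report.
expected = '{"name": "Front Door Lock", "domain": "lock"}'
corrupted = '{"name": "Front Door Lock\', \'domain\': \'lock"}'


def check_arguments(arguments, required=("name", "domain")):
    """Return a list of problems found in a tool-call arguments string."""
    try:
        parsed = json.loads(arguments)
    except json.JSONDecodeError as exc:
        return [f"not parseable: {exc}"]
    if not isinstance(parsed, dict):
        return ["not a JSON object"]
    # A syntax check alone is not enough: also verify the required keys.
    return [f"missing required key: {key}" for key in required if key not in parsed]


print(check_arguments(expected))   # []
print(check_arguments(corrupted))  # ['missing required key: domain']
```

Either way the client cannot recover the intended name and domain values, so the call is unusable.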

Key finding

The corruption only occurs with large/complex payloads. A minimal request with the same model, same tools schema, and same user message returns perfectly valid JSON. The trigger appears to be the combination of a long system prompt (~5000+ chars) with multiple tool definitions (~20+ tools).
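For anyone without the attached file, a payload of roughly the same shape can be generated. This is only a sketch under assumptions: the tool names, the filler text, and the build_payload helper are hypothetical, and it has not been verified that a synthetic payload like this triggers the bug. The attached diag_payload.json remains the exact reproducer.

```python
import json


def build_payload(n_tools=20, prompt_chars=5000):
    """Build a chat-completions body with a long system prompt and many tools."""
    # Hypothetical filler standing in for device context and instructions.
    filler = "Device: Front Door Lock (domain: lock). "
    system_prompt = "You are a smart home assistant.\n"
    while len(system_prompt) < prompt_chars:
        system_prompt += filler

    # Hypothetical tools, all sharing the schema from the small request.
    tools = [
        {
            "type": "function",
            "function": {
                "name": f"HypotheticalTool{i}",
                "description": f"Placeholder tool number {i}.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "name": {"type": "string"},
                        "domain": {"type": "string"},
                    },
                    "required": ["name", "domain"],
                },
            },
        }
        for i in range(n_tools)
    ]

    return {
        "model": "qwen3",
        "temperature": 0.0,
        "stream": False,
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": "Lock the front door."},
        ],
        "tools": tools,
    }


payload = build_payload()
body = json.dumps(payload, indent=2)
print(len(payload["messages"][0]["content"]), "system prompt chars,",
      len(payload["tools"]), "tools")
```

Writing body to a file and posting it with curl -d @file mirrors the reproduction steps in this report.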

llama-server flags

--jinja
--chat-template-kwargs '{"enable_thinking":false,"reasoning_effort":"low"}'
--reasoning-format deepseek
-fa on
--temp 0.0

Steps to reproduce

  1. Start llama-server with the flags above and a Qwen3 model.

  2. Small payload (works correctly):

curl -s http://localhost:8081/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
  "model": "qwen3",
  "temperature": 0.0,
  "stream": false,
  "messages": [
    {"role": "system", "content": "You are a smart home assistant."},
    {"role": "user", "content": "Lock the front door."}
  ],
  "tools": [{
    "type": "function",
    "function": {
      "name": "HassTurnOn",
      "description": "Turns on a device.",
      "parameters": {
        "type": "object",
        "properties": {
          "name": {"type": "string"},
          "domain": {"type": "string"}
        },
        "required": ["name", "domain"]
      }
    }
  }]
}'

Response arguments: {"domain": "lock", "name": "Front Door Lock"} (valid JSON).

  3. Large payload (corrupted):

Send the same request but with a large system prompt (~5000+ chars containing device context, scripts, multiple instruction sections) and ~20+ tool definitions. The attached diag_payload.json file is the exact payload that triggers the bug.

curl -s http://localhost:8081/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d @diag_payload.json | python3 -c "
import sys, json
data = json.load(sys.stdin)
for choice in data.get('choices', []):
    msg = choice.get('message', {})
    for tc in msg.get('tool_calls', []):
        print('arguments:', repr(tc['function']['arguments']))
"

Response arguments: '{"name": "Front Door Lock\', \'domain\': \'lock"}' (invalid JSON with mixed quotes).

  4. This reproduces with both stream: true and stream: false.

  5. This reproduces across multiple models (Qwen3-14B, Qwen3-30B-A3B).
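Because the streamed case delivers arguments in fragments, they have to be joined before they can be checked. The sketch below assumes the OpenAI-style SSE chunk layout (data: lines carrying choices[].delta.tool_calls[].function.arguments, terminated by data: [DONE]). The function name and the sample lines are hypothetical and were not captured from llama-server.

```python
import json


def collect_streamed_arguments(sse_lines):
    """Join tool-call argument fragments from SSE lines, keyed by tool-call index."""
    parts = {}
    for line in sse_lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank lines and non-data SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            delta = choice.get("delta") or {}
            for tc in delta.get("tool_calls") or []:
                fragment = (tc.get("function") or {}).get("arguments") or ""
                parts.setdefault(tc.get("index", 0), []).append(fragment)
    return {index: "".join(fragments) for index, fragments in parts.items()}


def _sample_line(fragment):
    """Build one hypothetical SSE line carrying an arguments fragment."""
    chunk = {"choices": [{"delta": {"tool_calls": [
        {"index": 0, "function": {"arguments": fragment}}
    ]}}]}
    return "data: " + json.dumps(chunk)


sample_lines = [
    _sample_line('{"name": "Front'),
    _sample_line(' Door Lock", "domain"'),
    _sample_line(': "lock"}'),
    "data: [DONE]",
]

joined = collect_streamed_arguments(sample_lines)
print(joined[0])
```

The joined string can then be parsed and checked against the tool schema's required keys, exactly as for the non-streamed response.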

Impact

This breaks any OpenAI-compatible client that calls json.loads() on tool call arguments, which is the standard approach. This is particularly impactful for Home Assistant integrations that use llama-server for local smart home voice control, where every device command requires a tool call with valid JSON arguments.

First Bad Commit

No response

Relevant log output

Raw HTTP response bytes from llama-server (captured via curl, no SDK involved):

b'arguments":"{\\"name\\": \\"Front Door Lock\', \'domain\': \'lock\\"}'

Expected:

b'arguments":"{\\"name\\": \\"Front Door Lock\\", \\"domain\\": \\"lock\\"}'

diag_payload.json
