Name and Version
version: 1 (6c770d1)
built with GNU 13.3.0 for Linux aarch64
Built from latest master commit 6c770d1 ("Reduce level of content parser warning message #20347") on 2026-03-10.
Operating systems
Linux
GGML backends
CUDA
Hardware
NVIDIA DGX Spark GB10 (ARM64, 128GB unified memory)
Models
- Qwen3-14B-Q8_0.gguf
- Qwen3-30B-A3B-Q8_0.gguf
Both models produce the same corruption. Issue is not model-specific.
Problem description & steps to reproduce
Summary
When llama-server receives a request with a large system prompt and many tool definitions, the arguments field in tool call responses contains invalid JSON with mixed single and double quotes. The same request with a small/simple payload returns valid JSON.
Details
The arguments field in the response contains:
{"name": "Front Door Lock', 'domain': 'lock"}
Instead of the expected valid JSON:
{"name": "Front Door Lock", "domain": "lock"}
The raw HTTP response bytes confirm the corruption originates from llama-server — no SDK or middleware is involved:
b'arguments":"{\\"name\\": \\"Front Door Lock\', \'domain\': \'lock\\"}'
The \\" (escaped double quote) is correct for the first field, but after the first value, the quotes switch to \' (escaped single quote). This produces invalid JSON that cannot be parsed by standard JSON parsers.
Key finding
The corruption only occurs with large/complex payloads. A minimal request with the same model, same tools schema, and same user message returns perfectly valid JSON. The trigger appears to be the combination of a long system prompt (~5000+ chars) with multiple tool definitions (~20+ tools).
llama-server flags
--jinja
--chat-template-kwargs '{"enable_thinking":false,"reasoning_effort":"low"}'
--reasoning-format deepseek
-fa on
--temp 0.0
Steps to reproduce
-
Start llama-server with the flags above and a Qwen3 model.
-
Small payload — works correctly:
curl -s http://localhost:8081/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "qwen3",
"temperature": 0.0,
"stream": false,
"messages": [
{"role": "system", "content": "You are a smart home assistant."},
{"role": "user", "content": "Lock the front door."}
],
"tools": [{
"type": "function",
"function": {
"name": "HassTurnOn",
"description": "Turns on a device.",
"parameters": {
"type": "object",
"properties": {
"name": {"type": "string"},
"domain": {"type": "string"}
},
"required": ["name", "domain"]
}
}
}]
}'
Response arguments: {"domain": "lock", "name": "Front Door Lock"} — valid JSON.
- Large payload — corrupted:
Send the same request but with a large system prompt (~5000+ chars containing device context, scripts, multiple instruction sections) and ~20+ tool definitions. The attached diag_payload.json file is the exact payload that triggers the bug.
curl -s http://localhost:8081/v1/chat/completions \
-H "Content-Type: application/json" \
-d @diag_payload.json | python3 -c "
import sys, json
data = json.load(sys.stdin)
for choice in data.get('choices', []):
msg = choice.get('message', {})
for tc in msg.get('tool_calls', []):
print('arguments:', repr(tc['function']['arguments']))
"
Response arguments: '{"name": "Front Door Lock\', \'domain\': \'lock"}' — invalid JSON with mixed quotes.
-
This reproduces with both stream: true and stream: false.
-
This reproduces across multiple models (Qwen3-14B, Qwen3-30B-A3B).
Impact
This breaks any OpenAI-compatible client that calls json.loads() on tool call arguments, which is the standard approach. This is particularly impactful for Home Assistant integrations that use llama-server for local smart home voice control, where every device command requires a tool call with valid JSON arguments.
First Bad Commit
No response
Relevant log output
Raw HTTP response bytes from llama-server (captured via curl, no SDK involved):
b'arguments":"{\\"name\\": \\"Front Door Lock\', \'domain\': \'lock\\"}'
Expected:
b'arguments":"{\\"name\\": \\"Front Door Lock\\", \\"domain\\": \\"lock\\"}'
diag_payload.json
Name and Version
Built from latest master commit
6c770d1("Reduce level of content parser warning message #20347") on 2026-03-10.Operating systems
Linux
GGML backends
CUDA
Hardware
NVIDIA DGX Spark GB10 (ARM64, 128GB unified memory)
Models
Both models produce the same corruption. Issue is not model-specific.
Problem description & steps to reproduce
Summary
When llama-server receives a request with a large system prompt and many tool definitions, the
argumentsfield in tool call responses contains invalid JSON with mixed single and double quotes. The same request with a small/simple payload returns valid JSON.Details
The
argumentsfield in the response contains:{"name": "Front Door Lock', 'domain': 'lock"}Instead of the expected valid JSON:
{"name": "Front Door Lock", "domain": "lock"}The raw HTTP response bytes confirm the corruption originates from llama-server — no SDK or middleware is involved:
The
\\"(escaped double quote) is correct for the first field, but after the first value, the quotes switch to\'(escaped single quote). This produces invalid JSON that cannot be parsed by standard JSON parsers.Key finding
The corruption only occurs with large/complex payloads. A minimal request with the same model, same tools schema, and same user message returns perfectly valid JSON. The trigger appears to be the combination of a long system prompt (~5000+ chars) with multiple tool definitions (~20+ tools).
llama-server flags
Steps to reproduce
Start llama-server with the flags above and a Qwen3 model.
Small payload — works correctly:
Response arguments:
{"domain": "lock", "name": "Front Door Lock"}— valid JSON.Send the same request but with a large system prompt (~5000+ chars containing device context, scripts, multiple instruction sections) and ~20+ tool definitions. The attached
diag_payload.jsonfile is the exact payload that triggers the bug.Response arguments:
'{"name": "Front Door Lock\', \'domain\': \'lock"}'— invalid JSON with mixed quotes.This reproduces with both
stream: trueandstream: false.This reproduces across multiple models (Qwen3-14B, Qwen3-30B-A3B).
Impact
This breaks any OpenAI-compatible client that calls
json.loads()on tool call arguments, which is the standard approach. This is particularly impactful for Home Assistant integrations that use llama-server for local smart home voice control, where every device command requires a tool call with valid JSON arguments.First Bad Commit
No response
Relevant log output
Raw HTTP response bytes from llama-server (captured via curl, no SDK involved):
Expected:
diag_payload.json