Streamed and non-streamed tool call arguments contain invalid JSON (mixed single/double quotes) with large payloads #20359

### Name and Version

```
version: 1 (6c770d1)
built with GNU 13.3.0 for Linux aarch64
```

Built from latest master commit `6c770d1` ("Reduce level of content parser warning message #20347") on 2026-03-10.

### Operating systems

Linux

### GGML backends

CUDA

### Hardware

NVIDIA DGX Spark GB10 (ARM64, 128GB unified memory)

### Models

- Qwen3-14B-Q8_0.gguf
- Qwen3-30B-A3B-Q8_0.gguf

Both models produce the same corruption, so the issue is not specific to one model (only Qwen3 variants were tested).

### Problem description & steps to reproduce

### Summary

When llama-server receives a request with a large system prompt and many tool definitions, the `arguments` field in tool call responses contains invalid JSON with mixed single and double quotes. The same request with a small/simple payload returns valid JSON.

### Details

The `arguments` field in the response contains:

```json
{"name": "Front Door Lock', 'domain': 'lock"}
```

Instead of the expected valid JSON:

```json
{"name": "Front Door Lock", "domain": "lock"}
```

The raw HTTP response bytes confirm the corruption originates from llama-server — no SDK or middleware is involved:

```
b'arguments":"{\\"name\\": \\"Front Door Lock\', \'domain\': \'lock\\"}'
```

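In this Python `bytes` repr, `\\"` stands for the JSON-escaped double quote (`\"` on the wire), which is correct for the first key and for the opening of the first value. After `Front Door Lock`, the delimiters switch to plain single quotes (shown as `\'` only because the repr itself is single-quoted). The closing double quote of the `name` value and the double quotes around `domain` are missing, so the arguments do not decode to the expected two-field object: the `domain` key is lost and its text ends up inside the `name` value.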
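To make the failure mode concrete when triaging, the two arguments strings from this section can be passed through Python's `json.loads`. This is only a quick check, assuming the unescaped strings are exactly the ones shown above; the variable names are illustrative.

```python
import json

# Arguments strings copied from the examples above (JSON-unescaped once).
corrupted = """{"name": "Front Door Lock', 'domain': 'lock"}"""
expected = """{"name": "Front Door Lock", "domain": "lock"}"""

# Both keys are listed as required in the tool schema used in this report.
required = {"name", "domain"}

for label, raw in (("corrupted", corrupted), ("expected", expected)):
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError as exc:
        print(f"{label}: not parseable ({exc})")
        continue
    missing = required - set(parsed)
    print(f"{label}: keys={sorted(parsed)} missing={sorted(missing)}")
```

With these exact strings, the corrupted variant does not necessarily raise: the single quotes sit inside one long double-quoted value, so the visible symptom on the client side can be a missing `domain` key rather than a parse error.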

### Key finding

The corruption **only occurs with large/complex payloads**. A minimal request with the same model, same tools schema, and same user message returns perfectly valid JSON. The trigger appears to be the combination of a long system prompt (~5000+ chars) with multiple tool definitions (~20+ tools).
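To compare a working request against a failing one, it may help to measure the two dimensions named above. The sketch below is an assumption about how one might do that; `summarize_payload` is a hypothetical helper, not part of llama.cpp, and it only reads the standard `messages` and `tools` fields of a chat-completions request body.

```python
import json


def summarize_payload(payload: dict) -> dict:
    """Return rough size metrics for a chat-completions request body."""
    messages = payload.get("messages", [])
    # Count characters across all string-valued system messages.
    system_chars = sum(
        len(m["content"])
        for m in messages
        if m.get("role") == "system" and isinstance(m.get("content"), str)
    )
    return {
        "system_prompt_chars": system_chars,
        "tool_count": len(payload.get("tools", [])),
        "total_bytes": len(json.dumps(payload).encode("utf-8")),
    }


# The small payload from this report, abbreviated to the relevant fields.
small = {
    "messages": [
        {"role": "system", "content": "You are a smart home assistant."},
        {"role": "user", "content": "Lock the front door."},
    ],
    "tools": [{"type": "function", "function": {"name": "HassTurnOn"}}],
}
print(summarize_payload(small))
```

Running the same function on the attached payload, for example via `summarize_payload(json.load(open("diag_payload.json")))`, gives numbers that can be compared directly against the small request.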

### llama-server flags

```
--jinja
--chat-template-kwargs '{"enable_thinking":false,"reasoning_effort":"low"}'
--reasoning-format deepseek
-fa on
--temp 0.0
```

### Steps to reproduce

1. Start llama-server with the flags above and a Qwen3 model.

2. **Small payload — works correctly:**

```bash
curl -s http://localhost:8081/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
  "model": "qwen3",
  "temperature": 0.0,
  "stream": false,
  "messages": [
    {"role": "system", "content": "You are a smart home assistant."},
    {"role": "user", "content": "Lock the front door."}
  ],
  "tools": [{
    "type": "function",
    "function": {
      "name": "HassTurnOn",
      "description": "Turns on a device.",
      "parameters": {
        "type": "object",
        "properties": {
          "name": {"type": "string"},
          "domain": {"type": "string"}
        },
        "required": ["name", "domain"]
      }
    }
  }]
}'
```

Response arguments: `{"domain": "lock", "name": "Front Door Lock"}` — **valid JSON**.

3. **Large payload — corrupted:**

Send the same request but with a large system prompt (~5000+ chars containing device context, scripts, multiple instruction sections) and ~20+ tool definitions. The attached `diag_payload.json` file is the exact payload that triggers the bug.

```bash
curl -s http://localhost:8081/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d @diag_payload.json | python3 -c "
import sys, json
data = json.load(sys.stdin)
for choice in data.get('choices', []):
    msg = choice.get('message', {})
    for tc in msg.get('tool_calls', []):
        print('arguments:', repr(tc['function']['arguments']))
"
```

Response arguments: `'{"name": "Front Door Lock\', \'domain\': \'lock"}'` — **invalid JSON with mixed quotes**.

4. This reproduces with both `stream: true` and `stream: false`.

5. This reproduces across multiple models (Qwen3-14B, Qwen3-30B-A3B).

### Impact

This breaks any OpenAI-compatible client that calls `json.loads()` on tool call arguments, which is the standard approach. This is particularly impactful for Home Assistant integrations that use llama-server for local smart home voice control, where every device command requires a tool call with valid JSON arguments.
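Until the root cause is found, a client can at least fail loudly instead of acting on malformed arguments. The guard below is a hedged sketch, not a fix: `parse_tool_arguments` and `ToolArgumentsError` are hypothetical names, and the check covers only the `required` list of the tool's `parameters` schema rather than full JSON Schema validation.

```python
import json


class ToolArgumentsError(ValueError):
    """Raised when tool-call arguments cannot be used safely."""


def parse_tool_arguments(raw: str, function_def: dict) -> dict:
    """Decode arguments and verify the schema's required keys are present."""
    try:
        args = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ToolArgumentsError(f"arguments are not valid JSON: {exc}") from exc
    if not isinstance(args, dict):
        raise ToolArgumentsError("arguments must decode to a JSON object")
    required = function_def.get("parameters", {}).get("required", [])
    missing = [key for key in required if key not in args]
    if missing:
        raise ToolArgumentsError(f"missing required arguments: {missing}")
    return args


# The "function" object of the tool definition used in this report.
hass_turn_on = {
    "name": "HassTurnOn",
    "parameters": {
        "type": "object",
        "properties": {"name": {"type": "string"}, "domain": {"type": "string"}},
        "required": ["name", "domain"],
    },
}

good = parse_tool_arguments('{"name": "Front Door Lock", "domain": "lock"}', hass_turn_on)
print(good)
```

Checking required keys in addition to calling `json.loads()` matters here because a string with mismatched quotes can still decode to an object that is missing fields.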


### First Bad Commit

_No response_

### Relevant log output

Raw HTTP response bytes from llama-server (captured via curl, no SDK involved):

```
b'arguments":"{\\"name\\": \\"Front Door Lock\', \'domain\': \'lock\\"}'
```

Expected:

```
b'arguments":"{\\"name\\": \\"Front Door Lock\\", \\"domain\\": \\"lock\\"}'
```

[diag_payload.json](https://github.com/user-attachments/files/25879886/diag_payload.json)
