`MAX_REPETITION_THRESHOLD` (2000) breaks tool-calling grammars for tools with many optional parameters

### What happened

When an OpenAI-compatible `/v1/chat/completions` request includes tools with many optional parameters, the generated GBNF grammar fails to compile and the request ends in a 500 error. For optional parameters, the grammar generator emits rules of the form `(param_1 | ... | param_N){0,N}`. With N=48 or N=94, the rule expansion exceeds `MAX_REPETITION_THRESHOLD` (2000) and grammar compilation fails. The failure is only logged: the request proceeds without a grammar, and the PEG parser then fails on the unconstrained model output at the end of streaming:

```
data: {"error":{"code":500,"message":"Failed to parse input at pos 138: <tool_call>\n<function=write>\n..."}}
```
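The server log message quoted under Reproduction suggests that the check multiplies the number of repeated rules by the repetition count. Assuming that reading (an inference from the message, not something taken from the llama.cpp source), a `(param_1 | ... | param_N){0,N}` rule costs roughly N × N. A minimal sketch of that arithmetic:

```python
# Rough model of the repetition check, inferred from the server log message.
# ASSUMPTION: cost = (number of alternatives) * (max repetitions). The real
# check in llama.cpp may count rule elements differently.
MAX_REPETITION_THRESHOLD = 2000  # value cited in this issue


def estimated_cost(n_optional: int) -> int:
    """Estimated cost of (p_1 | ... | p_N){0,N} under the assumed model."""
    return n_optional * n_optional


def exceeds_threshold(n_optional: int, threshold: int = MAX_REPETITION_THRESHOLD) -> bool:
    """True if the estimated cost is above the threshold."""
    return estimated_cost(n_optional) > threshold


if __name__ == "__main__":
    for n in (44, 45, 48, 94):
        print(n, estimated_cost(n), exceeds_threshold(n))
```

Under this assumed model, 44 optional parameters still fit (1936), while 45 (2025), 48 (2304), and 94 (8836) do not. With a threshold of 100,000, the same model would first trip at N = 317.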

### How I found this

I was running [PinchBench](https://github.com/pinchbench/skill) (an AI agent benchmark suite) against a llama.cpp-served Qwen3.5-27B, using [OpenClaw](https://openclaw.com) as the agent runtime. OpenClaw's standard toolset (the same set every OpenClaw user gets) includes tools like `browser` (48 params) and `message` (94 params). Every benchmark task failed with a 500 error before the agent could act.

This is not a pathological personal setup. PinchBench runs against the agent's default tools, so any llama.cpp user running OpenClaw, or a similar agent benchmark, will hit this.

### Reproduction

Send a streaming chat completion request with tools that have many optional parameters. A single tool with 48 parameters (47 of them optional, as in the `browser` tool below) is enough to trigger it. The same request works on builds before PR #18604.

```bash
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "test",
    "stream": true,
    "messages": [{"role": "user", "content": "Write hello to /tmp/test.txt"}],
    "tools": [{
      "type": "function",
      "function": {
        "name": "browser",
        "description": "Browser automation",
        "parameters": {
          "type": "object",
          "properties": {
            "action": {"type": "string"},
            "url": {"type": "string"},
            "selector": {"type": "string"},
            "text": {"type": "string"},
            "timeout": {"type": "number"},
            "visible": {"type": "boolean"},
            "screenshot": {"type": "boolean"},
            "full_page": {"type": "boolean"},
            "wait_for": {"type": "string"},
            "scroll_x": {"type": "number"},
            "scroll_y": {"type": "number"},
            "key": {"type": "string"},
            "modifiers": {"type": "string"},
            "button": {"type": "string"},
            "x": {"type": "number"},
            "y": {"type": "number"},
            "width": {"type": "number"},
            "height": {"type": "number"},
            "format": {"type": "string"},
            "quality": {"type": "number"},
            "clip": {"type": "boolean"},
            "delay": {"type": "number"},
            "navigation_timeout": {"type": "number"},
            "wait_until": {"type": "string"},
            "extra_http_headers": {"type": "string"},
            "ignore_https_errors": {"type": "boolean"},
            "java_script_enabled": {"type": "boolean"},
            "bypass_csp": {"type": "boolean"},
            "user_agent": {"type": "string"},
            "viewport_width": {"type": "number"},
            "viewport_height": {"type": "number"},
            "device_scale_factor": {"type": "number"},
            "is_mobile": {"type": "boolean"},
            "has_touch": {"type": "boolean"},
            "color_scheme": {"type": "string"},
            "reduced_motion": {"type": "string"},
            "forced_colors": {"type": "string"},
            "accept_downloads": {"type": "boolean"},
            "record_video": {"type": "boolean"},
            "record_video_dir": {"type": "string"},
            "record_video_size": {"type": "string"},
            "proxy_server": {"type": "string"},
            "proxy_bypass": {"type": "string"},
            "proxy_username": {"type": "string"},
            "proxy_password": {"type": "string"},
            "storage_state": {"type": "string"},
            "geolocation_latitude": {"type": "number"},
            "geolocation_longitude": {"type": "number"}
          },
          "required": ["action"]
        }
      }
    },
    {
      "type": "function",
      "function": {
        "name": "write",
        "description": "Write file",
        "parameters": {
          "type": "object",
          "properties": {
            "path": {"type": "string"},
            "content": {"type": "string"}
          },
          "required": ["path", "content"]
        }
      }
    }]
  }'
```
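To find the parameter count at which a given build starts failing, the payload can be generated instead of hand-written. The following is a minimal sketch: the tool name `wide_tool` and the `opt_i` parameter names are made up for illustration, and the default URL is the one from the curl command above.

```python
import json
import urllib.request


def build_payload(n_optional: int) -> dict:
    """Chat completion request with one tool that has n_optional optional params."""
    properties = {"action": {"type": "string"}}  # the only required parameter
    for i in range(n_optional):
        properties[f"opt_{i}"] = {"type": "string"}  # hypothetical names
    tool = {
        "type": "function",
        "function": {
            "name": "wide_tool",  # hypothetical tool name
            "description": "Tool with many optional parameters",
            "parameters": {
                "type": "object",
                "properties": properties,
                "required": ["action"],
            },
        },
    }
    return {
        "model": "test",
        "stream": True,
        "messages": [{"role": "user", "content": "Call wide_tool with action=ping"}],
        "tools": [tool],
    }


def send(payload: dict, url: str = "http://localhost:8080/v1/chat/completions") -> str:
    """POST the payload and return the raw streamed body as text."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")
```

Calling `send(build_payload(n))` for increasing `n` and checking the body for a `data: {"error"` event should show where the grammar stops compiling on a particular build.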

**Server log output:**

```
parse: error parsing grammar: number of rules that are going to be repeated
multiplied by the new repetition exceeds sane defaults, please reduce the
number of repetitions or rule complexity
failed to parse grammar
```

This is followed by a 500 error in the response when the PEG parser fails on the final (non-partial) parse.

### Failure mode

The failure mode is particularly bad for streaming:

1. Grammar compilation fails silently (logged, but the request proceeds)
2. The model generates unconstrained output (native XML tool-call format)
3. Partial PEG parsing during streaming detects the tool call start and sends a `tool_calls` delta with `arguments: "{"`
4. The final (non-partial) PEG parse fails on the unconstrained output
5. The server returns a 500 error mid-stream

The client receives an inconsistent response: a partial tool call followed by an error.
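A client can detect this inconsistent state by scanning the stream for a `tool_calls` delta followed by an error event. The sketch below assumes the usual OpenAI-style chunk layout (`choices[].delta.tool_calls`) and the error shape shown at the top of this issue. The sample lines are illustrative, not captured server output.

```python
import json


def classify_stream(sse_lines):
    """Return (saw_tool_call_delta, error_message) for a list of SSE lines."""
    saw_tool_call = False
    error_message = None
    for line in sse_lines:
        if not line.startswith("data: "):
            continue  # skip blank lines and SSE comments
        data = line[len("data: "):].strip()
        if data == "[DONE]":
            break
        event = json.loads(data)
        if "error" in event:
            error_message = event["error"].get("message")
            break
        for choice in event.get("choices", []):
            if (choice.get("delta") or {}).get("tool_calls"):
                saw_tool_call = True
    return saw_tool_call, error_message


def is_inconsistent(sse_lines):
    """True if a tool call was started and the stream then ended in an error."""
    saw_tool_call, error_message = classify_stream(sse_lines)
    return saw_tool_call and error_message is not None


# Illustrative stream: a partial tool call, then the error event.
sample = [
    'data: {"choices":[{"index":0,"delta":{"tool_calls":[{"index":0,"function":{"name":"write","arguments":"{"}}]}}]}',
    'data: {"error":{"code":500,"message":"Failed to parse input at pos 138"}}',
]
```

Here `is_inconsistent(sample)` is true, which is the case a client would have to handle by discarding the partial tool call.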

### Prior report

This was reported in #17473. The assignee @pwilkin said they would add a configuration parameter, but the issue was auto-closed as stale before that happened.

### Proposed fix

Make `MAX_REPETITION_THRESHOLD` configurable via a server parameter (e.g. `--grammar-max-repetitions`), or raise the default. The current value of 2000 is too low for standard tool-calling use cases.

For context, raising the threshold to 100,000 on a private server had no measurable impact:
- Grammar compilation: <2 ms for 24 tools (including one with 94 optional params)
- Per-token sampling: 34.12 ms/token with grammar vs 34.67 ms/token without
- Memory: ~4 MB additional (negligible)

The DoS protection from PR #18604 (stack overflow via nested repetitions, hangs via unbounded expansion) could be preserved with a higher default or a separate, tighter limit on nesting depth rather than total rule count.
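As a toy illustration of the second option (a limit on nesting depth rather than on total rule count), the sketch below models a grammar as nested repetition nodes. None of this is llama.cpp code: the `Rep` type, `MAX_NESTING_DEPTH`, and its value of 2 are hypothetical, chosen only to show that a wide but flat optional-parameter rule can be told apart from a deeply nested repetition.

```python
from dataclasses import dataclass, field
from typing import List, Union

MAX_NESTING_DEPTH = 2  # hypothetical limit, not a llama.cpp constant


@dataclass
class Rep:
    """A repetition node: its children may repeat up to max_times."""
    max_times: int
    children: List[Union["Rep", str]] = field(default_factory=list)


def repetition_depth(node) -> int:
    """Number of repetition nodes on the deepest path through node."""
    if isinstance(node, str):
        return 0  # a plain rule reference adds no nesting
    return 1 + max((repetition_depth(c) for c in node.children), default=0)


def accept(node, max_depth: int = MAX_NESTING_DEPTH) -> bool:
    """Accept a grammar only if its repetitions are not nested too deeply."""
    return repetition_depth(node) <= max_depth


# Wide but flat: (param_1 | ... | param_94){0,94}
flat = Rep(94, [f"param_{i}" for i in range(1, 95)])
# Narrow but deep: ((a{0,100}){0,100}){0,100}
nested = Rep(100, [Rep(100, [Rep(100, ["a"])])])
```

In this model `accept(flat)` passes and `accept(nested)` is rejected. A depth limit alone would not bound a single very large flat repetition, so some cap on total expansion would probably still be needed alongside it.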

### Environment

- **Version**: b8468 (commit 3306dbaef)
- **Model**: Qwen3.5-27B (unsloth/Qwen3.5-27B-GGUF:UD-Q5_K_XL)
- **Chat format**: peg-native
- **Client**: OpenClaw 2026.3.13, running PinchBench agent benchmark

