
MAX_REPETITION_THRESHOLD (2000) breaks tool-calling grammars for tools with many optional parameters #20867

@Whamp


What happened

When an OpenAI-compatible /v1/chat/completions request includes tools with many optional parameters, the GBNF grammar fails to compile and the server returns a 500 error. For optional parameters, the grammar generator emits rules of the form (param_1 | ... | param_N){0,N}. With N=48 or N=94, expanding this repetition exceeds MAX_REPETITION_THRESHOLD (2000). Grammar compilation fails, but the failure is only logged and the request continues without a grammar; at the end of streaming, the PEG parser then fails on the unconstrained model output:

data: {"error":{"code":500,"message":"Failed to parse input at pos 138: <tool_call>\n<function=write>\n..."}}
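To make the scale concrete, here is a toy cost model. It assumes, without checking the grammar parser's exact accounting, that the cost of (param_1 | ... | param_N){0,N} is roughly N alternatives times N repetitions, compared against MAX_REPETITION_THRESHOLD. The real check in llama.cpp may count differently; the numbers only show how quickly a quadratic term passes 2000.

```shell
#!/usr/bin/env bash
# Toy cost model (assumption, not llama.cpp's actual accounting):
# (param_1 | ... | param_N){0,N} costs about N alternatives x N repetitions.
MAX_REPETITION_THRESHOLD=2000

repetition_cost() {
  local n=$1
  echo $(( n * n ))
}

for n in 10 44 45 48 94; do
  cost=$(repetition_cost "$n")
  if (( cost > MAX_REPETITION_THRESHOLD )); then
    verdict="over threshold"
  else
    verdict="within threshold"
  fi
  printf 'N=%d cost=%d %s\n' "$n" "$cost" "$verdict"
done
```

Under this model, anything past about 44 optional parameters in a single tool crosses the limit; treat that cutoff as illustrative only.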

How I found this

I was running PinchBench (an AI agent benchmark suite) against a llama.cpp-served Qwen3.5-27B, using OpenClaw as the agent runtime. OpenClaw's standard toolset, the same set every OpenClaw user gets, includes tools such as browser (48 parameters) and message (94 parameters). Every benchmark task failed with a 500 error before the agent could act.

This is not a pathological personal setup. PinchBench runs against the agent's default tools, so any llama.cpp user running OpenClaw (or a similar agent whose tools have many optional parameters) will hit this.

Reproduction

Send a streaming chat completion request with tools that have many optional parameters. A single tool with 48 parameters (47 of them optional, as in the example below) is enough to trigger it. The same request works on builds from before PR #18604.

curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "test",
    "stream": true,
    "messages": [{"role": "user", "content": "Write hello to /tmp/test.txt"}],
    "tools": [{
      "type": "function",
      "function": {
        "name": "browser",
        "description": "Browser automation",
        "parameters": {
          "type": "object",
          "properties": {
            "action": {"type": "string"},
            "url": {"type": "string"},
            "selector": {"type": "string"},
            "text": {"type": "string"},
            "timeout": {"type": "number"},
            "visible": {"type": "boolean"},
            "screenshot": {"type": "boolean"},
            "full_page": {"type": "boolean"},
            "wait_for": {"type": "string"},
            "scroll_x": {"type": "number"},
            "scroll_y": {"type": "number"},
            "key": {"type": "string"},
            "modifiers": {"type": "string"},
            "button": {"type": "string"},
            "x": {"type": "number"},
            "y": {"type": "number"},
            "width": {"type": "number"},
            "height": {"type": "number"},
            "format": {"type": "string"},
            "quality": {"type": "number"},
            "clip": {"type": "boolean"},
            "delay": {"type": "number"},
            "navigation_timeout": {"type": "number"},
            "wait_until": {"type": "string"},
            "extra_http_headers": {"type": "string"},
            "ignore_https_errors": {"type": "boolean"},
            "java_script_enabled": {"type": "boolean"},
            "bypass_csp": {"type": "boolean"},
            "user_agent": {"type": "string"},
            "viewport_width": {"type": "number"},
            "viewport_height": {"type": "number"},
            "device_scale_factor": {"type": "number"},
            "is_mobile": {"type": "boolean"},
            "has_touch": {"type": "boolean"},
            "color_scheme": {"type": "string"},
            "reduced_motion": {"type": "string"},
            "forced_colors": {"type": "string"},
            "accept_downloads": {"type": "boolean"},
            "record_video": {"type": "boolean"},
            "record_video_dir": {"type": "string"},
            "record_video_size": {"type": "string"},
            "proxy_server": {"type": "string"},
            "proxy_bypass": {"type": "string"},
            "proxy_username": {"type": "string"},
            "proxy_password": {"type": "string"},
            "storage_state": {"type": "string"},
            "geolocation_latitude": {"type": "number"},
            "geolocation_longitude": {"type": "number"}
          },
          "required": ["action"]
        }
      }
    },
    {
      "type": "function",
      "function": {
        "name": "write",
        "description": "Write file",
        "parameters": {
          "type": "object",
          "properties": {
            "path": {"type": "string"},
            "content": {"type": "string"}
          },
          "required": ["path", "content"]
        }
      }
    }]
  }'

Server log output:

parse: error parsing grammar: number of rules that are going to be repeated
multiplied by the new repetition exceeds sane defaults, please reduce the
number of repetitions or rule complexity
failed to parse grammar

This is followed by a 500 error in the response once the PEG parser fails on the final (non-partial) parse.
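To find where the limit bites on a given build, it can help to vary the parameter count instead of hand-editing the schema. The sketch below prints a request body with one tool that has a required action plus N-1 optional string parameters; the tool name many_params and the parameter names opt_1, opt_2, ... are hypothetical placeholders, not OpenClaw's real schema.

```shell
#!/usr/bin/env bash
# Print a chat completion request body whose single tool has N parameters:
# a required "action" plus N-1 optional string parameters.
# "many_params" and "opt_<i>" are hypothetical placeholder names.
make_request() {
  local n=$1 i
  local props='"action": {"type": "string"}'
  for (( i = 1; i < n; i++ )); do
    props+=", \"opt_$i\": {\"type\": \"string\"}"
  done
  printf '{"model": "test", "stream": true, '
  printf '"messages": [{"role": "user", "content": "Call many_params with action=open"}], '
  printf '"tools": [{"type": "function", "function": {"name": "many_params", '
  printf '"parameters": {"type": "object", "properties": {%s}, ' "$props"
  printf '"required": ["action"]}}}]}\n'
}
```

Piping the output into the curl command above with -d @- (curl reads the body from stdin) and stepping N upward, while watching the server log for "failed to parse grammar", should show the cutoff on the build under test.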

Failure mode

The failure mode is particularly bad for streaming:

  1. Grammar compilation fails, but the error is only logged and the request proceeds without a grammar
  2. The model generates unconstrained output (native XML tool-call format)
  3. Partial PEG parsing during streaming detects the tool call start and sends a tool_calls delta with arguments: "{"
  4. The final (non-partial) PEG parse fails on the unconstrained output
  5. The server returns a 500 error mid-stream

The client receives an inconsistent response: a partial tool call followed by an error.
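On the client side, the inconsistent stream can at least be detected rather than half-applied. The sketch below reads OpenAI-style server-sent events on stdin and reports whether an error event arrived after a tool_calls delta. It is a line-based heuristic that assumes each event sits on a single data: line, as in the error shown earlier; the function name classify_stream is hypothetical and not part of llama.cpp or OpenClaw.

```shell
#!/usr/bin/env bash
# Classify an OpenAI-style SSE stream read from stdin.
# Heuristic: assumes one JSON event per "data:" line.
classify_stream() {
  local line seen_tool_call=0
  while IFS= read -r line; do
    case $line in
      'data: {"error"'*)
        # An error event: report whether a tool call had already started.
        if (( seen_tool_call )); then
          echo "partial tool call, then error"
        else
          echo "error"
        fi
        return 1
        ;;
      data:*'"tool_calls":'*)
        seen_tool_call=1
        ;;
    esac
  done
  echo "ok"
}
```

For example, piping the reproduction's curl -sN output through classify_stream would be expected to print "partial tool call, then error" on an affected build.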

Prior report

This was reported in #17473. The assignee @pwilkin said they would add a configuration parameter, but the issue was auto-closed as stale before that happened.

Proposed fix

Make MAX_REPETITION_THRESHOLD configurable via a server parameter (e.g. --grammar-max-repetitions), or raise the default. The current value of 2000 is too low for standard tool-calling use cases.

For context, raising the threshold to 100,000 on a private server had no measurable impact:

  • Grammar compilation: <2 ms for 24 tools (including one with 94 optional params)
  • Per-token sampling: 34.12 ms/token with grammar vs 34.67 ms/token without
  • Memory: ~4 MB additional (negligible)

The DoS protection added in PR #18604 (against stack overflows from nested repetitions and hangs from unbounded expansion) could be preserved with a higher default, or with a separate, tighter limit on nesting depth instead of on total rule count.
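The difference between capping total size and capping nesting depth can be illustrated with a toy model: if each repetition level multiplies the expansion by its maximum count, one flat (p1|...|p94){0,94} stays modest while a few nested repetitions explode. The policy below (a tight depth cap plus a loose size cap) is a hypothetical illustration of the proposal, not existing llama.cpp behavior; MAX_DEPTH=2 is an arbitrary example value, and MAX_TOTAL reuses the 100,000 tested above.

```shell
#!/usr/bin/env bash
# Toy model (assumption): a repetition {0,k} applied to an item of size s
# expands to roughly s * k symbols, so each nesting level multiplies the size.
expanded_size() {
  local size=$1 k
  shift
  for k in "$@"; do
    size=$(( size * k ))
  done
  echo "$size"
}

# Hypothetical policy: cap nesting depth tightly, cap total size loosely.
MAX_DEPTH=2         # arbitrary example value
MAX_TOTAL=100000    # the threshold tested on a private server in this issue

# check LABEL ITEM_SIZE K1 [K2 ...]: one K per nested repetition level.
check() {
  local label=$1 item=$2
  shift 2
  local depth=$# size
  size=$(expanded_size "$item" "$@")
  if (( depth > MAX_DEPTH || size > MAX_TOTAL )); then
    echo "$label: depth=$depth size=$size rejected"
  else
    echo "$label: depth=$depth size=$size accepted"
  fi
}

check "flat (p1|...|p94){0,94}" 94 94
check "three nested x{0,100}" 1 100 100 100
```

Under these assumptions the flat 94-parameter rule is accepted (depth 1, size 8,836) while three nested repetitions are rejected on depth alone.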

Environment

  • Version: b8468 (commit 3306dba)
  • Model: Qwen3.5-27B (unsloth/Qwen3.5-27B-GGUF:UD-Q5_K_XL)
  • Chat format: peg-native
  • Client: OpenClaw 2026.3.13, running PinchBench agent benchmark
