
Misc. bug: Grammar Syntax Lacks Proper Range Validation and Incorrectly Uses int for Large Numeric Fields #17352

@ylwango613

Description


Name and Version

./llama-cli --version
register_backend: registered backend CPU (1 devices)
register_device: registered device CPU (AMD EPYC 9654 96-Core Processor)
load_backend: failed to find ggml_backend_init in /data/ylwang/Projects/llama.cpp/build/bin/libggml-cpu.so
version: 7090 (0de8878)
built with cc (Ubuntu 11.4.0-1ubuntu1~22.04.2) 11.4.0 for x86_64-linux-gnu

Operating systems

Linux

Which llama.cpp modules do you know to be affected?

llama-server

Command line

llama-server -m 0.gguf --host 0.0.0.0

Problem description & steps to reproduce

Bug 1: Unbounded Repetition Range Causes DoS

PoC

import requests

resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "messages": [
            {"role": "user", "content": "\n" * 1}
        ],
        "max_tokens": 20,
        "grammar": "root ::= \"a\"{2000000000,2147483647}"
    }
)

print(resp.text)

Impact

This grammar triggers a repetition range of {2000000000, 2147483647}.
The grammar parser expands repetitions by iterating from the minimum to maximum count, resulting in over two billion iterations.

This leads to:

  • 100% CPU utilization,
  • unbounded memory growth due to repeated vector expansion,
  • and a full Denial-of-Service (DoS) of the LLM server.

In realistic usage, repetition counts should never approach these magnitudes. Allowing users to specify arbitrary repetition bounds is unsafe and unnecessary.

Fix Recommendation

In ./llama.cpp/src/llama-grammar.cpp (line ~481):

int min_times = std::stoul(std::string(pos, int_end - pos));

A validation step should be added after parsing numeric values. For example, enforce a reasonable upper bound such as:

  • min_times <= 100000
  • max_times <= 100000

This prevents pathological repetition ranges from causing DoS.


Bug 2: Signed Integer Overflow Due to Incorrect Type Usage

PoC

import requests

resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "messages": [
            {"role": "user", "content": "\n" * 1}
        ],
        "max_tokens": 20,
        "grammar": "root ::= \"a\"{2147483648,2}"
    }
)

print(resp.text)

Server Output

/data/ylwang/Projects/llama.cpp/src/llama-grammar.cpp:382:36: runtime error: signed integer overflow: 2 - -2147483648 cannot be represented in type 'int'

Root Cause

At line ~481 in llama-grammar.cpp, the parser stores the result of std::stoul()—which returns an unsigned long—into an int:

int min_times = std::stoul(std::string(pos, int_end - pos));

When the user supplies:

{2147483648, 2}

std::stoul() correctly returns 2147483648, but storing this into a 32-bit signed int causes wrap-around to:

min_times = -2147483648

Later, at line ~382:

auto n_opt = max_times < 0 ? 1 : max_times - min_times;

the expression 2 - (-2147483648) would yield 2147483650, which exceeds INT_MAX; the subtraction is therefore a signed integer overflow, which UBSan reports and which is undefined behavior in C++.

Fix Recommendation

Same as Bug 1: numerical fields must be range-checked.
Alternatively, store min_times/max_times in a type that can safely hold the parsed values (e.g., uint64_t) and verify that they fit within an acceptable application-level bound.

First Bad Commit

No response

Labels

  • bug (Something isn't working)
  • high priority (Very important issue)
  • medium severity (Used to report medium severity bugs in llama.cpp, e.g. malfunctioning features but still usable)
  • server
