
grammar : fix grammar trigger crash when token extends beyond trigger pattern #19503

Open

EliasOenal wants to merge 1 commit into ggml-org:master from EliasOenal:fix-grammar-trigger-crash

Conversation

@EliasOenal

This fixes a crash that occurs when the model emits a tool call in which a single token a) completes the grammar trigger pattern but b) also contains extra, invalid text beyond it. For example: the buffer ends in <function, matching a prefix of the trigger pattern <function=. If the next token arrives as =list, the = completes the trigger, while the trailing list turns it into an invalid tool call within the same token (for this example we assume no tool named list is available). I believe some models have per-tool-name triggers for their tool calls and are not susceptible to the bug, but the Qwen3 models and a few others are.
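The failure mode can be sketched in a few lines of Python. This is a simplified, hypothetical stand-in for llama.cpp's lazy-grammar trigger logic, not the actual implementation: it only shows how one token can simultaneously complete the trigger and carry text that the grammar must immediately validate.

```python
# Hypothetical simulation of the lazy-grammar trigger overrun (not llama.cpp code).
TRIGGER = "<tool_call>\n<function="
ALLOWED_TOOLS = ("bash", "search")  # tool names the grammar accepts

def accept_token(buffer, token):
    """Append a token; if the trigger just completed inside this token,
    return the text that spilled past it (which the grammar must now parse)."""
    new_buffer = buffer + token
    if TRIGGER not in buffer and TRIGGER in new_buffer:
        spill = new_buffer.split(TRIGGER, 1)[1]
        return new_buffer, spill
    return new_buffer, None

# Buffer already holds a prefix of the trigger; the next token is "=list".
buf, spill = accept_token("<tool_call>\n<function", "=list")
print(spill)  # "list": completes the trigger AND names a non-existent tool
print(any(spill.startswith(t) for t in ALLOWED_TOOLS))  # False
```

Because the spilled text matches no allowed tool name, the grammar stack is emptied in the same accept step that armed the grammar, which is the condition the unpatched code failed to handle.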

  • This PR fixes the main issue in src/llama-grammar.cpp by letting calls to hallucinated tools fail gracefully. The resulting output will most likely reach the client as an invalid tool call, but that is what the model generated.

  • As a complementary improvement, it adds basic exception handling to tools/server/server-context.cpp, which was previously missing and caused llama-server to crash.
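The second bullet describes a per-request containment pattern. A minimal sketch of that pattern, translated to Python for illustration (the real change is in C++ server code, and the function name here is hypothetical):

```python
# Sketch only: catch a grammar failure per request and return it as an
# error response, instead of letting the exception terminate the server.
def handle_completion(generate):
    try:
        return {"content": generate()}
    except RuntimeError as e:
        # e.g. "Unexpected empty grammar stack after accepting piece: =list"
        return {"error": f"Grammar error: {e}"}
```

The key property is that a failing request produces an error payload while the server process (and its slot) stays usable for subsequent requests.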

Resolves #19353 and resolves #19304.

Minimal reproducer:

#!/usr/bin/env python3
"""Reproducer: grammar crash when a token completes a trigger AND contains
text beyond it (e.g. token "=list" completing trigger "<function=").
Requires a running llama-server with a Qwen3 model (e.g. Qwen3-4B)."""
import requests, sys, time

URL = "http://127.0.0.1:8080"
PAYLOAD = {
    "prompt": """<|im_start|>system
You must use the tool to answer.
<tools><function><name>list</name><description>List files</description>
<parameters><parameter><name>path</name><type>string</type></parameter>
<required>["path"]</required></parameters></function></tools>
Reply format: <tool_call>\n<function=name>\n<parameter=p>v</parameter>\n</function>\n</tool_call><|im_end|>
<|im_start|>user
List /tmp<|im_end|>
<|im_start|>assistant
""",
    # grammar only allows specific tool names — "list" is intentionally missing
    "grammar": 'root ::= "<tool_call>\\n<function=" ("bash"|"search") ">" [^<]* "</function>\\n</tool_call>"',
    "grammar_lazy": True,
    "grammar_triggers": [{"type": 2, "value": r"<tool_call>\n<function="}],
    "n_predict": 256, "temperature": 0.7,
}

for i in range(1, 6):
    try:
        r = requests.post(f"{URL}/completion", json=PAYLOAD, timeout=60)
        print(f"{i}: {r.json().get('content', '')[:100]}")
    except requests.exceptions.ConnectionError:
        print(f"{i}: SERVER CRASHED"); sys.exit(1)
print("No crash after 5 attempts.")

repro_min.py

Copilot AI review requested due to automatic review settings February 11, 2026 06:07
Copilot AI left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@dstolpmann

This resolves #19304 for me. Thank you!

@tamascode

@EliasOenal

The PR fixes the crash — llama-server no longer terminates when the grammar issue occurs.

However, the underlying grammar failure still happens intermittently during tool-calling with Qwen3 Coder Next. The request now fails gracefully with:

Grammar error: Unexpected empty grammar stack after accepting piece: =read

Observed behavior:

  • prompt processes normally
  • generation starts
  • model emits a tool-call piece (=read)
  • grammar stack becomes empty
  • request returns an error
  • server stays alive and releases the slot correctly

So this appears to be:

  • crash fixed
  • grammar/tool-call mismatch still present

Environment:

  • llama-server (/v1/chat/completions)
  • Qwen3 Coder chat format
  • tool calling / grammar-constrained decoding enabled
  • large prompt (~43k tokens)

@EliasOenal
Author

@tamascode I was looking into hallucinated invalid tool calls as well, but I didn't want to introduce major changes to the llama.cpp codebase in my first PR. Thus I focused on fixing the crash, which should be a strict improvement.

To me it seems to be a deliberate decision to only activate the "lazy grammar" path after the tool name has been completed. The issue is that these Qwen3 models occasionally hallucinate calls to invalid tools. At least for me, with my fix applied and OpenCode, the failed tool call informs the model, which usually gets it right on the second try.

I believe it would be possible to trigger the grammar earlier, to force models to emit only calls to valid tools, but that does not seem to be the design goal of "lazy grammar". I am happy to look into this further if I could get some guidance on what would be the best fit for the project.

@ngxson I know #18675 may also address the current llama-server crash, but it seems like a larger undertaking with a potentially longer timeline. Given that Qwen3 Coder Next is a very popular model and many people are facing crashes: do you think it makes sense to merge this targeted fix to make the model work, or would you prefer to wait for the autoparser to be merged instead?

@aldehir
Collaborator

aldehir commented Feb 15, 2026

The pattern should trigger before a tool name is generated, to ensure the grammar constrains model output to valid tool calls. The fix here is too invasive.

I'd rather roll out a separate custom Qwen 3 Coder Next parser with the proper trigger rules until the autoparser PR is merged. It could also be as simple as changing the pattern to look for <function instead of <function=, if = is not part of the same token.
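In terms of the reproducer's payload fields, the suggested pattern change would be a one-character adjustment. This is a hypothetical variant, not a change from the PR, and whether it helps depends on how the tokenizer splits the text around =:

```python
# Hypothetical variant of the reproducer's payload: trigger on "<function"
# (without the "="), so the grammar is already constraining output when
# the "=<toolname>" token is sampled. The grammar rule itself is unchanged.
PAYLOAD_VARIANT = {
    "grammar": 'root ::= "<tool_call>\\n<function=" ("bash"|"search") ">" [^<]* "</function>\\n</tool_call>"',
    "grammar_lazy": True,
    "grammar_triggers": [{"type": 2, "value": r"<tool_call>\n<function"}],
}
```

The activation point moves one character earlier, ahead of the =, so a token like =list would be sampled under grammar constraints rather than after the fact.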

@EliasOenal
Copy link
Author

Triggering on <function would only reliably fix this if function never tokenizes with anything appended, and I'm not convinced that is guaranteed. The current crash is a denial-of-service issue: any user can take the server down by steering the model to emit the right token sequence (see the reproducer). It also seemed to me that there were additional code paths that may throw; llama-server simply wasn't handling the exceptions at all.


Development

Successfully merging this pull request may close these issues:

  • Eval bug: crash in llama_grammar_accept_token
  • Eval bug: llama.cpp crashes when running Qwen Next 80B Coder

5 participants