grammar : fix grammar trigger crash when token extends beyond trigger pattern #19503
EliasOenal wants to merge 1 commit into ggml-org:master from
Conversation
This resolves #19304 for me. Thank you!
The PR fixes the crash: llama-server no longer terminates when the grammar issue occurs. However, the underlying grammar failure still happens intermittently during tool-calling with Qwen3 Coder Next; the request now fails gracefully.

Observed behavior:
So this appears to be:
Environment:
@tamascode I was looking into hallucinated invalid tool calls as well, but I didn't want to introduce major changes to the llama.cpp codebase for my first PR. Thus I focused on fixing the crash, which should be a straight improvement. It seems to be a deliberate decision to only activate the "lazy grammar" path after the tool name has been completed, and the issue is that those Qwen3 models tend to occasionally hallucinate calls to invalid tools. At least for me, with my fix applied and OpenCode, the failed tool call informs the model, which in turn usually gets it right on the second try. I believe it would be possible to trigger the grammar earlier, to force models to only emit calls to valid tools, but that doesn't seem to be the design goal of "lazy grammar". I am happy to look into this further if I could get some guidance on what would be the best fit for the project.

@ngxson I know #18675 may also address the current llama-server crash, but it seems like a larger undertaking with a potentially longer timeline. Given that Qwen3 Coder Next is a vastly popular model, and many people are facing crashes: do you think it makes sense to merge this targeted fix to make the model work, or would you prefer to wait for the autoparser to get merged instead?
The pattern should trigger before a tool name is generated, to ensure the grammar constrains model output to valid tool calls. The fix here is too invasive. I'd rather roll out a separate custom Qwen 3 Coder Next parser, with the proper trigger rules, until the autoparser PR is merged. Could also be as simple as changing the pattern to look for
Triggering on
This fixes a crash when the model emits a tool call that (a) completes the grammar trigger pattern, but also (b) contains invalid extra text in the same token. An example: we're at `<function` in the buffer and match the prefix of the trigger pattern `<function=`. If the next token then comes in as `=list`, the `=` matches and completes the trigger, while the `list` part turns it into an invalid tool call within the same token. (For the example we assume this tool is not available.) I believe some models have per-tool-name triggers for their tool calls and are not susceptible to the bug, but the Qwen3 models and a few others are.

This PR fixes the main issue in `src/llama-grammar.cpp` by gracefully attempting and failing calls to hallucinated tools. The resulting output will most likely be an invalid tool call passed to the client, but that is what the model generated.

As a complementary improvement, it adds basic exception handling to `tools/server/server-context.cpp`. This was missing and in turn resulted in llama-server crashing.

Resolves #19353 and resolves #19304.
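The failure mode described above can be sketched in a few lines. This is a hypothetical illustration, not llama.cpp internals: `match_trigger` and the `<function=` pattern are stand-ins showing how one token can simultaneously complete a lazy-grammar trigger and carry extra text past it.

```cpp
// Hypothetical sketch (not llama.cpp internals): a single token can both
// complete a lazy-grammar trigger pattern and carry extra text beyond it.
#include <string>
#include <utility>

static const std::string TRIGGER = "<function=";  // assumed trigger pattern

// Append `token` to `buffer`, then report whether the trigger pattern is now
// complete, plus any text that spilled past it within the same token.
std::pair<bool, std::string> match_trigger(std::string buffer, const std::string & token) {
    buffer += token;
    const std::size_t pos = buffer.find(TRIGGER);
    if (pos == std::string::npos) {
        return {false, std::string()};  // trigger not complete yet
    }
    // Everything after the trigger arrived inside the same token.
    return {true, buffer.substr(pos + TRIGGER.size())};
}
```

With the buffer at `<function`, the token `=list` completes the trigger and leaves `list` as overflow in the same token, which is exactly the case the grammar code must tolerate.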
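The complementary server-side change can be pictured like this. A minimal sketch, not the actual `server-context.cpp` code: `apply_grammar` and `handle_request` are hypothetical stand-ins showing the idea of per-request exception handling, so a hallucinated tool call produces an error response instead of terminating the server.

```cpp
// Hypothetical sketch, not the actual server code: wrap the grammar step in
// per-request exception handling so a hallucinated tool call fails gracefully
// instead of crashing the whole server process.
#include <stdexcept>
#include <string>

// Stand-in for the grammar step that may throw on a hallucinated tool name.
std::string apply_grammar(const std::string & tool_name) {
    if (tool_name != "list_files") {  // assumed single valid tool
        throw std::runtime_error("unknown tool: " + tool_name);
    }
    return "ok";
}

// Per-request wrapper: convert the exception into an error payload, keeping
// the server alive for subsequent requests.
std::string handle_request(const std::string & tool_name) {
    try {
        return apply_grammar(tool_name);
    } catch (const std::exception & e) {
        return std::string("error: ") + e.what();
    }
}
```

The error payload also feeds back into the agent loop: as noted above, the failed call informs the model, which usually gets the tool name right on the second try.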
Minimal reproducer:
repro_min.py