Add Mistral Guidance by juliendenize · Pull Request #37081 · vllm-project/vllm

juliendenize · 2026-03-14T23:53:01Z

Purpose

Fix Mistral tool calling and reasoning parsing by introducing Lark grammar-based structured output for Mistral models. This ensures correct tool call parsing for both streaming and non-streaming modes across all tool_choice values (auto, required, named, none) and all Mistral tokenizer versions.

Problems on main:

Streaming tool calls are broken for post-v15 Mistral tokenizers: [TOOL_CALLS] tokens leak into content instead of being parsed as tool calls.
tool_choice="required" and named tool choice produce unparseable arguments (e.g., [TOOL_CALLS]get_current_weather{...} embedded in the arguments string).
tool_choice="none" leaks raw [TOOL_CALLS] special tokens into user-visible content.
Reasoning content ([THINK]...[/THINK]) is not separated from content for pre-v15 models and not properly controlled via reasoning_effort for v15+ models.

This PR fixes by:

Introduce MistralGrammarFactory that generates Lark grammars from Jinja templates, selecting the appropriate grammar variant. This involves adding Lark grammar and support for Mistral Tokenizer to llguidace.
Update the OpenAI chat serving layer to use the Mistral grammar path for both streaming and non-streaming responses.

Test Plan

Unit tests (in this repo)

# Mistral tool parser tests (grammar factory, lark converter, adjust_request, streaming)
pytest tests/tool_parsers/test_mistral_tool_parser.py -v -s

# Mistral reasoning parser tests (is_reasoning_end with prefilled prompts)
pytest tests/reasoning/test_mistral_reasoning_parser.py -v -s

# Guidance backend lark grammar test
pytest tests/v1/structured_output/test_backend_guidance.py -v -s -k test_backend_guidance_lark_grammar

End-to-end tests (external repo)

Scripts and results are available at: https://github.com/juliendenize/vllm-test-tool-and-reasoning-parsing

Post-v15 tokenizer: 48 tests (8 scenarios × 2 stream modes × 3 reasoning_effort values)

python test_vllm_tools_post_v15.py --base-url http://localhost:8000/v1

Pre-v15 tokenizer: 16 tests (8 scenarios × 2 stream modes)

python test_vllm_tools_pre_v15.py --base-url http://localhost:8000/v1

Test Result

End-to-end results comparison

Model / Tokenizer	Branch	Passed	Failed	Total
Post-v15	This PR	48	0	48
Post-v15	`main`	26	22	48
Pre-v15	This PR	16	0	16
Pre-v15	`main`	12	4	16

Essential Elements of an Effective PR Description Checklist

The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
The test plan, such as providing test command.
The test results, such as pasting the results comparison before and after, or e2e results

gemini-code-assist

Code Review

This pull request introduces a robust, grammar-based approach using Lark to fix several long-standing issues with Mistral tool calling and reasoning parsing. The changes are comprehensive, covering both streaming and non-streaming modes, various tool choice options, and different tokenizer versions. The introduction of MistralGrammarFactory and the associated Jinja templates for grammar generation is a solid design choice for managing the complexity. The test suite has been significantly expanded, which is excellent for ensuring the correctness of these critical changes. I've found one critical issue in the streaming logic that could lead to corrupted output, for which I've provided a specific comment and suggestion.

gemini-code-assist · 2026-03-14T23:55:38Z

+            if self.bot_token not in delta_text:
+                return DeltaMessage(content=delta_text)


This condition can lead to corrupted output during streaming. If the bot_token (e.g., "[TOOL_CALLS]") is split across multiple streaming deltas, delta_text may not contain the full token. In this scenario, the current logic incorrectly returns a partial token string as content, which is incorrect.

The correct behavior should be to wait for more tokens by returning None, especially since the calling function has already confirmed that the bot_token exists in the cumulative current_text. This ensures that partial tool call markers are not leaked as content.

return None

special tokens are not split so don't think it's true

Signed-off-by: juliendenize <julien.denize@mistral.ai>

mergify · 2026-03-14T23:56:46Z

Hi @juliendenize, the pre-commit checks have failed. Please run:

uv pip install pre-commit
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy failing?

mypy is run differently in CI. If the failure is related to this check, please use the following command to run it locally:

# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10

Signed-off-by: juliendenize <julien.denize@mistral.ai>

mergify · 2026-03-15T00:00:33Z

Hi @juliendenize, the pre-commit checks have failed. Please run:

uv pip install pre-commit
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy failing?

mypy is run differently in CI. If the failure is related to this check, please use the following command to run it locally:

# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10

Signed-off-by: juliendenize <julien.denize@mistral.ai>

mergify · 2026-03-15T00:40:13Z

Hi @juliendenize, the pre-commit checks have failed. Please run:

uv pip install pre-commit
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy failing?

mypy is run differently in CI. If the failure is related to this check, please use the following command to run it locally:

# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10

Signed-off-by: juliendenize <julien.denize@mistral.ai>

patrickvonplaten · 2026-03-15T17:16:39Z

+    prompt_tokens = mistral_tokenizer.tokenizer.encode(
+        "Just a regular prompt.", bos=False, eos=False
+    )
+    assert parser.is_reasoning_end(prompt_tokens) is False


Suggested change

assert parser.is_reasoning_end(prompt_tokens) is False

assert not parser.is_reasoning_end(prompt_tokens)

patrickvonplaten · 2026-03-15T17:16:52Z

+        + mistral_tokenizer.tokenizer.encode(prompt_text_after, bos=False, eos=False)
+    )
+
+    assert parser.is_reasoning_end(prompt_tokens) is True


Suggested change

assert parser.is_reasoning_end(prompt_tokens) is True

assert parser.is_reasoning_end(prompt_tokens)

patrickvonplaten · 2026-03-15T17:20:30Z

+            self.tool_parser, MistralToolParser
+        )
+        if _is_mistral_tool_parser and self.reasoning_parser_cls is not None:
+            MistralToolParser._has_reasoning_parser = True


Suggested change

MistralToolParser._has_reasoning_parser = True

why are we setting a bool of the MistralToolParser here? could we just do this directly in the get_tool_parser function?

patrickvonplaten · 2026-03-15T17:24:42Z

                        )
                        harmony_tools_streamed[i] |= tools_streamed_flag
+                    elif use_mistral_grammar and reasoning_parser:
+                        assert tool_parser is not None


Can we move all this new code into a chat_completion/mistral_grammar.py file?

patrickvonplaten · 2026-03-15T17:26:53Z

+                and tokenizer.version >= 15
+                and request.reasoning_effort in (None, "none")
            )
+            if not skip_reasoning:


Think this is a lot of extra code here.

Couldn't we just set:

self.reasoning_parser = None

if skip_reasoning is False much earlier above so that the code should stay exactly the same here?

patrickvonplaten · 2026-03-15T17:27:26Z

+                request, tokenizer, self.tool_parser
+            )
+            tool_call_items: list[ToolCall]
+            if use_mistral_grammar:


can we maybe also import this code from another function?

Also this one is about Mistral API, not related with grammar no?

patrickvonplaten · 2026-03-15T17:30:03Z

+        use_mistral_grammar = (
+            isinstance(request, ChatCompletionRequest)
+            and tokenizer is not None
+            and is_mistral_lark_grammar_active(
+                request,
+                tokenizer,
+                tool_parser_cls,  # type: ignore[arg-type]
            )


Suggested change

use_mistral_grammar = (

isinstance(request, ChatCompletionRequest)

and tokenizer is not None

and is_mistral_lark_grammar_active(

request,

tokenizer,

tool_parser_cls, # type: ignore[arg-type]

)

use_mistral_grammar = isinstance(tokenizer, MistralTokenizer) and tokenizer.is_grammar_active(request)

can't we inject more logic through the MistralTokenizer here?

joa-stdn · 2026-03-15T18:31:15Z

+
+        if not request.tools:
+            # Sanitize tool_choice.
+            request.tool_choice = "none"


Are we sure about this?

sfeng33

Nice work on the correctness fixes! One note on the performance side:

The Lark grammar is applied for all tool_choice modes, including "none" and "auto". This means every Mistral tool-calling request pays guided decoding overhead (grammar compilation per request + bitmask computation per token), even when the grammar is essentially unconstrained (e.g. tool_choice="none" produces body: content which matches nearly everything).

For "none" specifically, the only real effect is preventing the <TOOL_CALLS> special token from being emitted — which could be achieved by just masking that single token ID without involving the grammar engine.

joa-stdn · 2026-03-15T18:51:26Z

+        seen_special_tokens: set[str] = set()
+        for i in range(self._tokenizer.n_words):
+            # Convert square brackets to angle brackets for special tokens,
+            # since llg only recognizes the latter.


I think now llg recognizes as well token ids but maybe keeping it this way is more modular here?

Signed-off-by: juliendenize <julien.denize@mistral.ai>

juliendenize · 2026-03-16T13:04:51Z

I pushed a commit to enforce tool_choice="auto" for now and we inject grammar. This will be used for a special docker image as this PR entangles too many changes and require deeper design thoughts.
It would be better to break it in multiple smaller pr that target specific parts (grammar, parser, ...)

Signed-off-by: juliendenize <julien.denize@mistral.ai>

mergify · 2026-03-17T00:41:59Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @juliendenize.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

patrickvonplaten · 2026-03-17T11:05:50Z

+    )
+
+
+BASE_LARK_GRAMMAR = r"""start: body


grammar definition

When Mistral Lark grammar constrains tool call generation (PR vllm-project#37081), the streaming tool parser's streamed_args_for_tool list is never populated — the grammar path handles token generation differently from the incremental [TOOL_CALLS] parsing path. At finish_reason=tool_calls, the code accesses streamed_args_for_tool[index] unconditionally, causing: IndexError: list index out of range The grammar path already streams tool call arguments correctly via delta chunks — the remaining-args diff at finish time is unnecessary. Fix: bounds-check streamed_args_for_tool and skip the remaining-args computation when the list is empty (grammar/structured output path). Tested: short and long streaming tool calls produce valid JSON. Affects: Any model using Mistral Lark grammar tool parsing with streaming. Discovered: DGX Spark with Mistral Small 4 + Open WebUI, March 2026.

androiddrew · 2026-04-10T02:08:07Z

@juliendenize Is this still on going?

juliendenize · 2026-04-10T08:13:42Z

@androiddrew yeah but this PR is on stale in favor of #38150 and #39217. Should be done soon !

juliendenize · 2026-04-15T12:13:13Z

Closed in favor of #38150 and #39217

juliendenize requested review from DarkLight1337, NickLucche, aarnphm, benchislett, chaunceyjiang, mgoin, njhill, patrickvonplaten and russellb as code owners March 14, 2026 23:53

juliendenize changed the title ~~Fix mistral parsing~~ Add Mistral Guidance Mar 14, 2026

mergify Bot added frontend structured-output v1 labels Mar 14, 2026

github-project-automation Bot added this to Structured Output Mar 14, 2026

gemini-code-assist Bot reviewed Mar 14, 2026

View reviewed changes

Add Mistral Guidance

76803b1

Signed-off-by: juliendenize <julien.denize@mistral.ai>

Improvement

5fee4b0

Signed-off-by: juliendenize <julien.denize@mistral.ai>

juliendenize force-pushed the fix_mistral_parsing branch from 35dc6d4 to 5fee4b0 Compare March 14, 2026 23:56

Fix mypy

114ee57

Signed-off-by: juliendenize <julien.denize@mistral.ai>

fix mypy

465119e

Signed-off-by: juliendenize <julien.denize@mistral.ai>

patrickvonplaten reviewed Mar 15, 2026

View reviewed changes

joa-stdn reviewed Mar 15, 2026

View reviewed changes

sfeng33 reviewed Mar 15, 2026

View reviewed changes

joa-stdn reviewed Mar 15, 2026

View reviewed changes

Enforce 'auto' tool choice

b59d6ce

Signed-off-by: juliendenize <julien.denize@mistral.ai>

Fix no tools path

8e6f431

Signed-off-by: juliendenize <julien.denize@mistral.ai>

juliendenize mentioned this pull request Mar 16, 2026

Fix some Mistral parser issues #37209

Merged

5 tasks

mergify Bot added the needs-rebase label Mar 17, 2026

patrickvonplaten reviewed Mar 17, 2026

View reviewed changes

will-deines mentioned this pull request Mar 17, 2026

[Responses API] Unified tool_choice + structured output via triggered tags will-deines/vllm#1

Closed

7 tasks

arkavo-com mentioned this pull request Mar 18, 2026

Integrate Leanstral + Lean 4 for Formally Verified Agent Operations arkavo-org/arkavo-edge#537

Open

15 tasks

mergify Bot added the tool-calling label Mar 20, 2026

github-project-automation Bot added this to Tool Calling Mar 20, 2026

Martossien mentioned this pull request Mar 20, 2026

[Bug]: Mistral-Small-4-119B-2603 fails on 8x RTX 3090 (SM 8.6) with vLLM v0.17.1: no valid MLA attention backend #37553

Open

ChuckChambers mentioned this pull request Mar 21, 2026

Add mod: fix-mistral-reasoning-effort eugr/spark-vllm-docker#119

Closed

juliendenize mentioned this pull request Mar 25, 2026

[Mistral Grammar] Support Grammar Factory #38150

Merged

5 tasks

bbrowning mentioned this pull request Mar 30, 2026

[Bug]: reasoning_effort passed to MistralCommonTokenizer.apply_chat_template breaks Mistral Small 4 chat completions on vLLM 0.18.0 #38560

Open

1 task

maincodeMax mentioned this pull request Apr 4, 2026

[Bug]: Mistral Small 4 (119B MoE) fails to start on ROCm MI325X - two blocking issues #38972

Closed

1 task

juliendenize mentioned this pull request Apr 7, 2026

[Mistral Grammar] Fix tool and reasoning parsing #39217

Merged

5 tasks

mergify Bot added the mistral Related to Mistral models label Apr 13, 2026

juliendenize closed this Apr 15, 2026

github-project-automation Bot moved this to Done in Structured Output Apr 15, 2026

github-project-automation Bot moved this to Done in Tool Calling Apr 15, 2026

		if self.bot_token not in delta_text:
		return DeltaMessage(content=delta_text)

	assert parser.is_reasoning_end(prompt_tokens) is False
	assert not parser.is_reasoning_end(prompt_tokens)

Uh oh!

Conversation

juliendenize commented Mar 14, 2026 • edited by github-actions Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Purpose

Test Plan

Unit tests (in this repo)

End-to-end tests (external repo)

Test Result

End-to-end results comparison

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot Mar 14, 2026

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

mergify Bot commented Mar 14, 2026

Uh oh!

mergify Bot commented Mar 15, 2026

Uh oh!

mergify Bot commented Mar 15, 2026

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

sfeng33 left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

juliendenize commented Mar 16, 2026

Uh oh!

mergify Bot commented Mar 17, 2026

Uh oh!

Choose a reason for hiding this comment

Uh oh!

androiddrew commented Apr 10, 2026

Uh oh!

juliendenize commented Apr 10, 2026

Uh oh!

juliendenize commented Apr 15, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

juliendenize commented Mar 14, 2026 •

edited by github-actions Bot

Loading