
Add Mistral Small 4 (Pixtral) support #20708

Merged

Kangyan-Zhou merged 30 commits into sgl-project:main from JustinTong0323:mistral4-support on Mar 18, 2026

Conversation

@JustinTong0323 (Collaborator) commented Mar 16, 2026

Summary

  • Add Mistral Small 4 (119B) model support, reusing the MistralLarge3/DeepSeekV3 backend with Pixtral vision encoder
  • Handle Mistral-native config format (params.json) for Mistral Small 4 and LeanStral model variants
  • Add Mistral reasoning parser ([THINK]/[/THINK] format) with reasoning_effort="high" gating; a minimal parsing sketch follows below
  • Fix Pixtral vision processor: proper spatial_merge_size handling, rope_parameters compatibility, and fallback PixtralProcessor wrapping when processor_config.json is missing
  • Load chat_template.jinja from model repo when tokenizer has no chat template
  • Work around the Mistral tokenizer marking [THINK]/[/THINK] as special tokens (upstream issue), which causes skip_special_tokens=True to strip the reasoning markers before the parser can see them

Co-authored-by: Alex Nails alexnails@radixark.ai
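
For context, the [THINK]/[/THINK] extraction amounts to splitting the generated text around these markers. A minimal sketch of the idea (a hypothetical simplified extractor; the real parser in python/sglang/srt/parser/reasoning_parser.py also handles streaming and partial markers):

import re

# Hypothetical simplified extractor, not the PR's implementation.
THINK_RE = re.compile(r"\[THINK\](.*?)\[/THINK\]", re.DOTALL)

def split_reasoning(text: str) -> tuple[str, str]:
    """Return (reasoning_content, content) for a finished generation."""
    match = THINK_RE.search(text)
    if match is None:
        return "", text
    reasoning = match.group(1).strip()
    content = (text[: match.start()] + text[match.end():]).strip()
    return reasoning, content

reasoning, answer = split_reasoning("[THINK]2 + 2 = 4[/THINK]The answer is 4.")
# reasoning == "2 + 2 = 4", answer == "The answer is 4."

This also shows why the tokenizer workaround above matters: if skip_special_tokens=True strips the markers first, there is nothing left for this split to find.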

Usage

# FP8
python -m sglang.launch_server \
  --model-path mistralai/Mistral-Small-4-119B-2603 \
  --tp 2 \
  --reasoning-parser mistral \
  --tool-call-parser mistral

# NVFP4
python -m sglang.launch_server \
  --model-path mistralai/Mistral-Small-4-119B-2603-NVFP4 \
  --tp 2 \
  --reasoning-parser mistral \
  --tool-call-parser mistral
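
Once a server is up, reasoning can be toggled per request through the top-level reasoning_effort field on the OpenAI-compatible endpoint. A minimal client sketch (port and model name assume the launch defaults; per this PR, "high" enables [THINK]/[/THINK] reasoning and "none" disables it):

import requests

resp = requests.post(
    "http://localhost:30000/v1/chat/completions",  # default sglang port
    json={
        "model": "default",
        "messages": [{"role": "user", "content": "What is 17 * 23?"}],
        # Top-level request field; gates the [THINK]/[/THINK] blocks.
        "reasoning_effort": "high",
    },
)
message = resp.json()["choices"][0]["message"]
print(message.get("reasoning_content"))  # extracted by --reasoning-parser mistral
print(message["content"])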

Eval results (GSM8K)

Checkpoint                        GSM8K Accuracy
Mistral-Small-4-119B-2603 (FP8)   0.835
Mistral-Small-4-119B-2603-NVFP4   0.826

Test plan

  • Verify mistralai/Mistral-Small-4-119B-2603 loads and generates correct output with --tp 2
  • GSM8K eval on FP8 (0.835) and NVFP4 (0.826)
  • Verify --reasoning-parser mistral correctly extracts [THINK]/[/THINK] blocks into reasoning_content
  • Verify reasoning_effort="high" triggers thinking, "none" does not
  • Verify tool calls (single and multi) work with --tool-call-parser mistral
  • Verify streaming (chat, reasoning, tool calls)
  • Verify vision (image) inputs work through the Pixtral processor

JustinTong0323 and others added 15 commits February 28, 2026 13:57
…size

…processor

- Use patch_size * spatial_merge_size as the effective patch size in
  PixtralImageProcessor so images resize to multiples of 28 (not 14),
  matching PatchMerger requirements with spatial_merge_size=2
- Remove manual _resize and get_patch_grid_size methods, relying on
  the correctly configured HF image processor instead
- Add multi-image offset splitting for per-image MultimodalDataItem
- Remove unused torch import
- Add --model flag (default "default") to avoid hardcoded model name
- Add --reasoning-effort flag passed as top-level request field
- Support local image paths via base64 data URI encoding
- Pass reasoning_effort and model as explicit parameters instead of
  smuggling through sampling_params dict
…riable

The flashinfer trtllm_fp8_per_tensor_scale_moe already defaults activation_type
to Swiglu (3), which matches Mistral-Small-4's silu+gated config. Also replace
unused ncols with _ in pixtral processor.
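
To illustrate the spatial_merge_size commit above: with patch_size=14 and spatial_merge_size=2, the effective patch size is 28, so the image processor must resize to multiples of 28 for the PatchMerger to pool 2x2 blocks of raw patches. A rough sketch of the arithmetic (hypothetical helper, not the PR's code):

def merged_patch_grid(height: int, width: int,
                      patch_size: int = 14,
                      spatial_merge_size: int = 2) -> tuple[int, int]:
    # Images are resized to multiples of the effective patch size
    # (14 * 2 = 28); each merged cell covers a 2x2 block of raw patches.
    effective = patch_size * spatial_merge_size
    assert height % effective == 0 and width % effective == 0
    return height // effective, width // effective

print(merged_patch_grid(896, 1120))  # (32, 40) merged tokens per dimension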

JustinTong0323 and others added 2 commits March 16, 2026 17:20

tokenizer = get_tokenizer_from_processor(processor)

if tokenizer.chat_template is None:
A collaborator commented:
do we keep this? (I actually think this is a useful fallback but it should be improved at a later point)
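
A sketch of what this fallback can look like, assuming the template ships as chat_template.jinja in the model repo (hypothetical helper, not necessarily the PR's exact code):

from huggingface_hub import hf_hub_download

def apply_chat_template_fallback(tokenizer, model_path: str) -> None:
    # Only kicks in when the tokenizer itself carries no template.
    if tokenizer.chat_template is not None:
        return
    template_file = hf_hub_download(model_path, "chat_template.jinja")
    with open(template_file) as f:
        tokenizer.chat_template = f.read()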

Comment thread: python/sglang/srt/parser/reasoning_parser.py (outdated)
Comment thread: benchmark/mmmu/eval_utils.py
The EAGLE draft model for Mistral Small 4 (mistralai/Mistral-Small-4-119B-2603-eagle)
uses dense MLA layers without MoE, unlike the Mistral Large 3 EAGLE which has MoE.
This caused three issues:

1. `adapt_config_dict` in mistral_utils.py did not handle dense EAGLE models
   (moe=null in params.json), falling through to an unsupported architecture.
   Fix: add a branch for `is_eagle and not is_moe` that sets model_type=deepseek_v3
   with all-dense MoE overrides (first_k_dense_replace=num_layers).

2. `_remap_mistral_yarn_args` did not include rope_theta in rope_scaling,
   causing transformers yarn validation to fail.
   Fix: copy rope_theta into the rope_scaling dict.

3. `MistralLarge3ForCausalLMEagle.__init__` set `self.model_cls` but
   `DeepseekV2ForCausalLM.__init__` hardcodes `self.model = DeepseekV2Model`,
   so the EAGLE fc layer was never created. The draft model ran without fusing
   token embeddings with target hidden states, producing garbage draft tokens
   (accept rate 0.25).
   Fix: call super().__init__() then replace self.model with
   MistralLarge3EagleModel which has the fc layer. Accept rate: 0.25 -> 0.83.
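
A condensed sketch of the fix for issue 3 (class names are taken from the commit message; the constructor signature is an assumption):

class MistralLarge3ForCausalLMEagle(DeepseekV2ForCausalLM):
    def __init__(self, config, *args, **kwargs):
        # Let the parent build everything first (it hardcodes
        # self.model = DeepseekV2Model), then swap in the EAGLE model
        # that has the fc layer fusing token embeddings with the
        # target model's hidden states.
        super().__init__(config, *args, **kwargs)
        self.model = MistralLarge3EagleModel(config)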
@JustinTong0323 (Collaborator Author)

/rerun-failed-ci

5 similar comments
@JustinTong0323 (Collaborator Author)

/rerun-failed-ci

@alexnails (Collaborator)

/rerun-failed-ci

@JustinTong0323 (Collaborator Author)

/rerun-failed-ci

@alexnails (Collaborator)

/rerun-failed-ci

@alexnails (Collaborator)

/rerun-failed-ci

@dbari (Contributor) commented Mar 17, 2026

Here is a diff to improve the gsm8k score:

Tasks  Version  Filter            n-shot  Metric       Value   Stderr
gsm8k  3        flexible-extract  5       exact_match  0.9105  ± 0.0079
                strict-match      5       exact_match  0.9083  ± 0.0080
diff --git a/python/sglang/srt/models/deepseek_v2.py b/python/sglang/srt/models/deepseek_v2.py
index 8f0617142..4750a6532 100644
--- a/python/sglang/srt/models/deepseek_v2.py
+++ b/python/sglang/srt/models/deepseek_v2.py
@@ -1198,7 +1198,7 @@ class DeepseekV2AttentionMLA(
                 device=get_global_server_args().device,
             )
 
-            if rope_scaling:
+            if rope_scaling and rope_scaling.get("apply_yarn_scaling", True):
                 mscale_all_dim = rope_scaling.get("mscale_all_dim", False)
                 scaling_factor = rope_scaling["factor"]
                 mscale = yarn_get_mscale(scaling_factor, float(mscale_all_dim))
diff --git a/python/sglang/srt/utils/mistral_utils.py b/python/sglang/srt/utils/mistral_utils.py
index 4955c0575..dc9e08d94 100644
--- a/python/sglang/srt/utils/mistral_utils.py
+++ b/python/sglang/srt/utils/mistral_utils.py
@@ -134,11 +134,11 @@ def _remap_mistral_yarn_args(config: dict) -> dict:
         "original_max_position_embeddings": "original_max_position_embeddings",
         "beta": "beta_fast",
         "alpha": "beta_slow",
-        "apply_scale": None,
+        "apply_scale": "apply_yarn_scaling",
     }
     yarn_config = config.get("yarn") or {}
     config["rope_scaling"] = {
-        "rope_type": "yarn",
+        "rope_type": "deepseek_yarn",
         "mscale_all_dim": 1,
     }
     # Include rope_theta in rope_scaling if present at the top level,

@JustinTong0323 (Collaborator Author)

/rerun-failed-ci

Mistral Small 4's params.json sets "apply_scale": false in the yarn
config, meaning the mscale factor should NOT be applied to attention
logits scaling. Previously this field was discarded, causing an
incorrect 2.2x mscale to be applied unconditionally.

Changes:
- Map "apply_scale" to "apply_yarn_scaling" in rope_scaling dict
  instead of dropping it
- Use "deepseek_yarn" rope_type to avoid transformers yarn validation
  issues
- Gate mscale application in DeepseekV2AttentionMLA on apply_yarn_scaling

gsm8k 5-shot exact_match: 0.7976 -> 0.8901 (+9.3%)
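
For scale, assuming the usual DeepSeek-style yarn mscale formula (a rough illustration, not the exact sglang code):

import math

def yarn_get_mscale(scale: float, mscale: float = 1.0) -> float:
    # mscale grows with the log of the context-extension factor.
    if scale <= 1:
        return 1.0
    return 0.1 * mscale * math.log(scale) + 1.0

# Attention logits are scaled by mscale**2; the ~2.2x figure above
# would correspond to a large extension factor (~128):
print(yarn_get_mscale(128.0) ** 2)  # ~2.21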
@JustinTong0323 (Collaborator Author)

@dbari I've just pushed your rope fix. Thanks a lot for that, and apologies for earlier deciding against including it.

@alexnails (Collaborator)

/rerun-failed-ci

1 similar comment
@JustinTong0323 (Collaborator Author)

/rerun-failed-ci

@Kangyan-Zhou Kangyan-Zhou merged commit 6b8a654 into sgl-project:main Mar 18, 2026
29 of 46 checks passed
Qiaolin-Yu added a commit that referenced this pull request Mar 18, 2026
Wangzheee pushed a commit to Wangzheee/sglang that referenced this pull request Mar 21, 2026
Wangzheee pushed a commit to Wangzheee/sglang that referenced this pull request Mar 21, 2026
0-693 pushed a commit to 0-693/sglang that referenced this pull request Mar 25, 2026
0-693 pushed a commit to 0-693/sglang that referenced this pull request Mar 25, 2026
dutsc pushed a commit to dutsc/sglang that referenced this pull request Mar 30, 2026
dutsc pushed a commit to dutsc/sglang that referenced this pull request Mar 30, 2026
JustinTong0323 added a commit to JustinTong0323/sglang that referenced this pull request Apr 7, 2026
JustinTong0323 pushed a commit to JustinTong0323/sglang that referenced this pull request Apr 7, 2026
yhyang201 pushed a commit to yhyang201/sglang that referenced this pull request Apr 22, 2026
yhyang201 pushed a commit to yhyang201/sglang that referenced this pull request Apr 22, 2026