Add Mistral Small 4 (Pixtral) support #20708
Merged
Kangyan-Zhou merged 30 commits into sgl-project:main on Mar 18, 2026
Conversation
…size Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
…processor
- Use patch_size * spatial_merge_size as the effective patch size in PixtralImageProcessor so images resize to multiples of 28 (not 14), matching PatchMerger requirements with spatial_merge_size=2 (sketched below)
- Remove the manual _resize and get_patch_grid_size methods, relying on the correctly configured HF image processor instead
- Add multi-image offset splitting for per-image MultimodalDataItem
- Remove an unused torch import
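A minimal sketch of the resizing rule described in the first bullet, assuming patch_size=14 and spatial_merge_size=2; the helper below is illustrative, not the actual PixtralImageProcessor API:

```python
import math

def target_image_size(height: int, width: int,
                      patch_size: int = 14,
                      spatial_merge_size: int = 2) -> tuple[int, int]:
    """Round image dimensions up to multiples of the effective patch size.

    With spatial_merge_size=2, PatchMerger consumes 2x2 groups of 14px patches,
    so the resized image must be a multiple of 14 * 2 = 28 on each side.
    """
    effective = patch_size * spatial_merge_size  # 28 for Mistral Small 4
    new_h = math.ceil(height / effective) * effective
    new_w = math.ceil(width / effective) * effective
    return new_h, new_w

print(target_image_size(500, 333))  # -> (504, 336), both multiples of 28
```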
- Add a --model flag (default "default") to avoid hardcoding the model name
- Add a --reasoning-effort flag passed as a top-level request field
- Support local image paths via base64 data URI encoding (see the sketch after this list)
- Pass reasoning_effort and model as explicit parameters instead of smuggling them through the sampling_params dict
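A hedged sketch of the local-image handling and the top-level reasoning_effort field described above; the request shape and helper name are illustrative, not the benchmark script itself:

```python
import base64
import mimetypes

def to_data_uri(image_path: str) -> str:
    """Encode a local image file as a base64 data URI for the chat API."""
    mime = mimetypes.guess_type(image_path)[0] or "image/png"
    with open(image_path, "rb") as f:
        payload = base64.b64encode(f.read()).decode("ascii")
    return f"data:{mime};base64,{payload}"

# Illustrative request body: reasoning_effort travels as a top-level field,
# not inside sampling_params.
request = {
    "model": "default",
    "reasoning_effort": "high",
    "messages": [{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image."},
            {"type": "image_url", "image_url": {"url": to_data_uri("example.png")}},
        ],
    }],
}
```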
…riable The flashinfer trtllm_fp8_per_tensor_scale_moe kernel already defaults activation_type to Swiglu (3), which matches Mistral-Small-4's silu+gated config. Also replace the unused ncols with _ in the pixtral processor.
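For reference, the silu+gated ("Swiglu") pattern referred to above, shown as a plain PyTorch sketch rather than the fused flashinfer kernel:

```python
import torch

def swiglu(gate_up: torch.Tensor) -> torch.Tensor:
    """SiLU-gated activation: silu(gate) * up, the pattern the fused MoE
    kernel's default activation_type (Swiglu) corresponds to."""
    gate, up = gate_up.chunk(2, dim=-1)
    return torch.nn.functional.silu(gate) * up

print(swiglu(torch.randn(2, 8)).shape)  # torch.Size([2, 4])
```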
…al with 0% accuracy when thinking
Contributor
Warning: You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!
alexnails reviewed on Mar 16, 2026
tokenizer = get_tokenizer_from_processor(processor)
…
if tokenizer.chat_template is None:
Collaborator
do we keep this? (I actually think this is a useful fallback but it should be improved at a later point)
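For context, a minimal sketch of the fallback being discussed, assuming the template ships as chat_template.jinja next to the model weights; the helper name is illustrative, not the actual sglang code:

```python
import os

def apply_chat_template_fallback(tokenizer, model_path: str) -> None:
    """If the tokenizer ships without a chat template, fall back to the
    chat_template.jinja file bundled with the model repo (if present)."""
    if tokenizer.chat_template is None:
        template_file = os.path.join(model_path, "chat_template.jinja")
        if os.path.exists(template_file):
            with open(template_file, encoding="utf-8") as f:
                tokenizer.chat_template = f.read()
```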
The EAGLE draft model for Mistral Small 4 (mistralai/Mistral-Small-4-119B-2603-eagle) uses dense MLA layers without MoE, unlike the Mistral Large 3 EAGLE, which has MoE. This caused three issues:
1. `adapt_config_dict` in mistral_utils.py did not handle dense EAGLE models (moe=null in params.json), falling through to an unsupported architecture. Fix: add a branch for `is_eagle and not is_moe` that sets model_type=deepseek_v3 with all-dense MoE overrides (first_k_dense_replace=num_layers).
2. `_remap_mistral_yarn_args` did not include rope_theta in rope_scaling, causing transformers' yarn validation to fail. Fix: copy rope_theta into the rope_scaling dict.
3. `MistralLarge3ForCausalLMEagle.__init__` set `self.model_cls`, but `DeepseekV2ForCausalLM.__init__` hardcodes `self.model = DeepseekV2Model`, so the EAGLE fc layer was never created. The draft model ran without fusing token embeddings with target hidden states, producing garbage draft tokens (accept rate 0.25). Fix: call super().__init__() and then replace self.model with MistralLarge3EagleModel, which has the fc layer (see the sketch below).
Accept rate: 0.25 -> 0.83.
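A self-contained sketch of the __init__ pattern from fix 3, with stand-in classes rather than the real DeepseekV2/MistralLarge3 modules:

```python
import torch.nn as nn

class BaseModel(nn.Module):
    """Stand-in for DeepseekV2Model (no EAGLE fc layer)."""
    def __init__(self, hidden_size: int):
        super().__init__()
        self.embed = nn.Embedding(32, hidden_size)

class EagleModel(nn.Module):
    """Stand-in for MistralLarge3EagleModel: owns the fc layer that fuses
    draft token embeddings with the target model's hidden states."""
    def __init__(self, hidden_size: int):
        super().__init__()
        self.embed = nn.Embedding(32, hidden_size)
        self.fc = nn.Linear(2 * hidden_size, hidden_size, bias=False)

class BaseForCausalLM(nn.Module):
    """Stand-in for DeepseekV2ForCausalLM: hardcodes self.model = BaseModel."""
    def __init__(self, hidden_size: int):
        super().__init__()
        self.model = BaseModel(hidden_size)

class EagleForCausalLM(BaseForCausalLM):
    """Pattern from fix 3: build via the parent, then swap in the EAGLE model
    so the fc layer actually exists and gets loaded/used."""
    def __init__(self, hidden_size: int):
        super().__init__(hidden_size)         # parent sets self.model = BaseModel
        self.model = EagleModel(hidden_size)  # replace with the fc-bearing model

m = EagleForCausalLM(64)
assert hasattr(m.model, "fc")  # the fusion layer is now present
```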
ad7bbb2 to 773f851 (Compare)
Collaborator
Author
/rerun-failed-ci
5 similar comments
Contributor
Here is a diff to improve the …
diff
Collaborator
Author
/rerun-failed-ci
Mistral Small 4's params.json sets "apply_scale": false in the yarn config, meaning the mscale factor should NOT be applied to attention logits scaling. Previously this field was discarded, causing an incorrect 2.2x mscale to be applied unconditionally. Changes:
- Map "apply_scale" to "apply_yarn_scaling" in the rope_scaling dict instead of dropping it
- Use the "deepseek_yarn" rope_type to avoid transformers yarn validation issues
- Gate mscale application in DeepseekV2AttentionMLA on apply_yarn_scaling (see the sketch below)
gsm8k 5-shot exact_match: 0.7976 -> 0.8901 (+9.3%)
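A hedged sketch of the gating described in the last bullet, using the common YaRN mscale formula; the exact coefficients and factor depend on the checkpoint's yarn parameters, so treat this as an illustration rather than the DeepseekV2AttentionMLA code:

```python
import math

def yarn_mscale(scaling_factor: float, mscale_coeff: float = 1.0) -> float:
    """Standard YaRN attention-logit scaling factor."""
    if scaling_factor <= 1.0:
        return 1.0
    return 0.1 * mscale_coeff * math.log(scaling_factor) + 1.0

def attention_scale(head_dim: int, rope_scaling: dict) -> float:
    """Only apply the YaRN mscale when the checkpoint's rope config asks for
    it (the remapped apply_scale -> apply_yarn_scaling flag)."""
    scale = 1.0 / math.sqrt(head_dim)
    if rope_scaling.get("apply_yarn_scaling", True):
        scale *= yarn_mscale(rope_scaling.get("factor", 1.0)) ** 2
    return scale

# Mistral Small 4 ships "apply_scale": false, so the mscale term is skipped:
print(attention_scale(128, {"factor": 32.0, "apply_yarn_scaling": False}))
print(attention_scale(128, {"factor": 32.0, "apply_yarn_scaling": True}))
```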
Collaborator
Author
@dbari I've just pushed the fix you made on the rope. Thanks a lot for that! I also apologize for the earlier wrong decision of not including this fix.
Collaborator
/rerun-failed-ci
1 similar comment
Qiaolin-Yu added a commit that referenced this pull request on Mar 18, 2026
Wangzheee pushed commits to Wangzheee/sglang that referenced this pull request on Mar 21, 2026
0-693 pushed commits to 0-693/sglang that referenced this pull request on Mar 25, 2026
dutsc pushed commits to dutsc/sglang that referenced this pull request on Mar 30, 2026
JustinTong0323 added and pushed commits to JustinTong0323/sglang that referenced this pull request on Apr 7, 2026
yhyang201 pushed commits to yhyang201/sglang that referenced this pull request on Apr 22, 2026
Summary
- … (params.json) for Mistral Small 4 and LeanStral model variants
- … ([THINK]/[/THINK] format) with reasoning_effort="high" gating
- … spatial_merge_size handling, rope_parameters compatibility, and fallback PixtralProcessor wrapping when processor_config.json is missing
- … chat_template.jinja from the model repo when the tokenizer has no chat template
- … [THINK]/[/THINK] as special tokens (upstream issue), which causes skip_special_tokens=True to strip reasoning markers before the parser can see them (see the sketch below)

Co-authored-by: Alex Nails <alexnails@radixark.ai>
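As a rough illustration of the last bullet, a minimal sketch of what a [THINK]/[/THINK] reasoning parser extracts, and why stripping the markers as special tokens breaks it; the regex-based parser here is illustrative, not sglang's actual implementation:

```python
import re

THINK_RE = re.compile(r"\[THINK\](.*?)\[/THINK\]\s*", re.DOTALL)

def split_reasoning(text: str) -> tuple[str | None, str]:
    """Split model output into (reasoning_content, content).

    If the detokenizer runs with skip_special_tokens=True and the markers are
    registered as special tokens, they are removed before this ever runs,
    so nothing can be extracted (the upstream issue noted above).
    """
    m = THINK_RE.search(text)
    if not m:
        return None, text
    return m.group(1).strip(), THINK_RE.sub("", text, count=1).strip()

reasoning, answer = split_reasoning("[THINK]2+2=4[/THINK]The answer is 4.")
print(reasoning)  # 2+2=4
print(answer)     # The answer is 4.
```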
Usage
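A hedged sketch of exercising the flags listed in the test plan below, assuming the standard `python -m sglang.launch_server` entry point and an OpenAI-compatible endpoint on the default port 30000; the prompt and request shape are illustrative:

```python
# Assumed server launch (flags taken from the test plan below):
#   python -m sglang.launch_server --model-path mistralai/Mistral-Small-4-119B-2603 \
#       --tp 2 --reasoning-parser mistral --tool-call-parser mistral

import requests

resp = requests.post(
    "http://localhost:30000/v1/chat/completions",
    json={
        "model": "default",
        "reasoning_effort": "high",  # "none" should skip the [THINK] block
        "messages": [{"role": "user", "content": "What is 17 * 23?"}],
    },
    timeout=120,
)
message = resp.json()["choices"][0]["message"]
print(message.get("reasoning_content"))  # populated when thinking is triggered
print(message["content"])
```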
Eval results (GSM8K)
- Mistral-Small-4-119B-2603 (FP8)
- Mistral-Small-4-119B-2603-NVFP4

Test plan
- mistralai/Mistral-Small-4-119B-2603 loads and generates correct output with --tp 2
- --reasoning-parser mistral correctly extracts [THINK]/[/THINK] blocks into reasoning_content
- reasoning_effort="high" triggers thinking, "none" does not
- --tool-call-parser mistral