convert: Fix Qwen3.5/Qwen3.5 Moe NVFP4 Conversions #20505
Open
michaelw9999 wants to merge 4 commits into ggml-org:master from
Conversation
Pull request overview
Fixes Qwen3.5 / Qwen3.5-MoE NVFP4 HF→GGUF conversion failures by improving tensor name mapping and applying Qwen3.5 linear-attention-specific reordering during NVFP4 repacking.
Changes:
- Extend tensor name mapping to handle model.language_model.* / language_model.* wrapper prefixes.
- Add Qwen3.5 NVFP4 linear-attention weight transforms (row/column reordering) during NVFP4 repack.
- Skip writing NVFP4 auxiliary tensors and already-repacked tensors during the prepare/write loop.
```python
def _nvfp4_scale2_is_trivial(scale2: Tensor) -> bool:
    return scale2.numel() <= 1 and abs(float(scale2.float().sum()) - 1.0) < 1e-6


def _transform_nvfp4_weight(self, raw_weight_name: str, weight: Tensor, scale: Tensor, bid: int | None) -> tuple[str, Tensor, Tensor]:
```
Comment on lines +744 to +746:

```python
bid_m = re.search(r'\.layers\.(\d+)\.', name)
bid = int(bid_m.group(1)) if bid_m else None
new_name, weight, scale = self._transform_nvfp4_weight(name, weight, scale, bid)
```
Comment on lines +685 to +686:

```python
if transformed_components is not None:
    weight, scale = transformed_components
```
This comment was marked as resolved.
Contributor
Author
@arthurcavalcant I'll post an update for that shortly.
3e7ded9 to 43b0892 (Compare)
Contributor
Author
@arthurcavalcant I've fixed that and you should be able to convert and run that model now.
This PR fixes several errors that occur when converting Qwen3.5/Qwen3.5 MoE models. To keep this PR's scope focused, a separate PR #20506 adds support for loading the newly converted models.
Bug:
When running convert_hf_to_gguf.py on various Qwen3.5 and Qwen3.5 MoE models, conversion would abort with the following error(s):

This occurred because these models now carry model.language_model or language_model prefixes. The fix strips these wrapper prefixes instead of failing, which allows conversion to continue. However, stripping the names alone was not enough to convert the models properly, and it exposed a new error:
This is because Qwen3.5's linear attention weights are reordered in modify_tensors():

NVFP4, however, bypasses modify_tensors() and has its own repacking, so linear_attn.in_proj_a.input_scale was seen as a [num_v_heads] tensor and the code tried to reshape it into [16, 3, 1]. This is fixed by skipping tensors in the write loop that have already been repacked, and by applying the same linear-attention reordering during the NVFP4 repack:
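The skip logic in the write loop can be sketched roughly as below. This is an illustrative simplification, not the PR's code; the function name, the set of suffixes, and the dict-based tensor container are all my assumptions:

```python
def write_tensors(tensors: dict, already_repacked: set[str]) -> list[str]:
    """Sketch of the prepare/write loop fix: skip tensors that an earlier
    NVFP4 repack already consumed, and skip NVFP4 auxiliary scale tensors
    so they are never reshaped a second time (names are hypothetical)."""
    written = []
    for name in tensors:
        # ".input_scale" is an NVFP4 auxiliary tensor suffix seen in these
        # checkpoints; it is folded into the repacked weight, not written.
        if name in already_repacked or name.endswith(".input_scale"):
            continue
        written.append(name)
    return written
```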
This now produces correct Qwen3.5/Qwen3.5 MoE NVFP4 GGUF files. A separate PR must be applied to load these files.
This fixed the issue with both Qwen3.5-122B-A10B-NVFP4 and Qwen3.5-27B-NVFP4, and both produced correct output.
Qwen3.5-35B-A3B-NVFP4.gguf was also tested after returning k_scale and v_scale to the skip list.
Note: some Qwen3.5 NVFP4 HF models produce the following tokenizer error while others do not, for the same model:

Workaround:
Edit the model's tokenizer_config.json and change tokenizer_class from TokenizersBackend to Qwen2Tokenizer.
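The workaround can be scripted; a small sketch (the function name and the assumption that the file sits directly in the model directory are mine):

```python
import json
from pathlib import Path


def apply_tokenizer_workaround(model_dir: str) -> None:
    """Rewrite tokenizer_class in tokenizer_config.json from
    TokenizersBackend to Qwen2Tokenizer (sketch of the manual edit)."""
    cfg_path = Path(model_dir) / "tokenizer_config.json"
    cfg = json.loads(cfg_path.read_text(encoding="utf-8"))
    if cfg.get("tokenizer_class") == "TokenizersBackend":
        cfg["tokenizer_class"] = "Qwen2Tokenizer"
        cfg_path.write_text(json.dumps(cfg, indent=2, ensure_ascii=False),
                            encoding="utf-8")
```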