
convert: Fix Qwen3.5/Qwen3.5 Moe NVFP4 Conversions #20505

Open
michaelw9999 wants to merge 4 commits into ggml-org:master from michaelw9999:nvfp4-fix-qwen-conversions

Conversation

@michaelw9999
Contributor

@michaelw9999 michaelw9999 commented Mar 13, 2026

This PR fixes several errors that occur when attempting to convert Qwen3.5/Qwen3.5 MoE models. To keep this PR's scope focused, a separate PR #20506 enables loading of the newly converted models.

Bug:
When attempting to use convert_hf_to_gguf.py on various Qwen3.5 and Qwen3.5 MoE models, it would abort with the following error(s):

ValueError: Can not map tensor 'model.language_model.layers.0.mlp.shared_expert.down_proj.weight'
ValueError: Can not map tensor 'model.language_model.layers.0.linear_attn.in_proj_a.weight'

This occurred because these models now wrap tensor names with a model.language_model or language_model prefix. The fix strips these wrappers instead of failing, so the mapping can continue.
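A minimal sketch of the prefix stripping (hypothetical helper name; the actual PR edits the tensor-name mapping path in convert_hf_to_gguf.py, and mapping the bare language_model prefix back under model. is an assumption here):

```python
def strip_wrapper_prefix(name: str) -> str:
    """Strip the new Qwen3.5 wrapper prefixes so the existing
    tensor map (keyed on model.layers.*) still matches."""
    if name.startswith("model.language_model."):
        return "model." + name[len("model.language_model."):]
    if name.startswith("language_model."):
        # assumption: bare language_model.* tensors map under model.*
        return "model." + name[len("language_model."):]
    return name
```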
Stripping the prefixes alone, however, was not enough to convert the models correctly; it surfaced a new error:

RuntimeError: shape '[16, 3, 1]' is invalid for input of size 1

This is because Qwen3.5's linear attention weights get reordered in modify_tensors():

# original order:  [q, k, v, z] * head_count
# corrected order: [q * head_count, k * head_count, v * head_count, z * head_count]
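The regrouping above can be sketched with a reshape/transpose (illustrative only, not the PR's code; it assumes all four components occupy equal per-head row blocks, which may not hold for the real Qwen3.5 shapes):

```python
import numpy as np

def reorder_qkvz_rows(w: np.ndarray, head_count: int) -> np.ndarray:
    """Regroup rows from [q, k, v, z] interleaved per head
    into [all q, all k, all v, all z]."""
    rows, cols = w.shape
    block = rows // (head_count * 4)  # rows per (head, component) block
    return (w.reshape(head_count, 4, block, cols)  # split into per-head [q,k,v,z] blocks
             .transpose(1, 0, 2, 3)                # make component the outer axis
             .reshape(rows, cols))
```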

However, the NVFP4 path bypasses modify_tensors() and does its own repacking, so linear_attn.in_proj_a.input_scale was seen as a [num_v_heads] tensor and the code tried to reshape it into [16, 3, 1].
This is fixed by skipping tensors in the write loop that have already been repacked:

if self._is_nvfp4:
    if name.endswith(".weight") and name.replace(".weight", ".weight_scale") in self.model_tensors:
        continue
    if name.endswith((".weight_scale", ".weight_scale_2", ".input_scale", "k_scale", ".v_scale")):
        continue

(Updated: k_scale and v_scale were added above.)

and by applying the same reordering to:

linear_attn.in_proj_qkv
linear_attn.in_proj_z
linear_attn.in_proj_a
linear_attn.in_proj_b
linear_attn.out_proj

Conversion now produces a correct Qwen3.5/Qwen3.5 MoE NVFP4 GGUF file; a separate PR (#20506) is required to load it.
This fixed the issue for both Qwen3.5-122B-A10B-NVFP4 and Qwen3.5-27B-NVFP4, and both produced correct output.
Qwen3.5-35B-A3B-NVFP4.gguf was also tested after k_scale and v_scale were returned to the skip list.

Note: some Qwen3.5 NVFP4 HF uploads produce this tokenizer error while other uploads of the same model do not:

ValueError: Tokenizer class TokenizersBackend does not exist or is not currently imported.

Workaround:
Edit the model's tokenizer_config.json and change tokenizer_class from TokenizersBackend to Qwen2Tokenizer
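The workaround can be scripted as below (a hedged sketch; the helper name is hypothetical and the model directory path is up to you):

```python
import json
from pathlib import Path

def fix_tokenizer_class(model_dir: str) -> bool:
    """Rewrite tokenizer_class from TokenizersBackend to Qwen2Tokenizer
    in the model's tokenizer_config.json. Returns True if changed."""
    cfg_path = Path(model_dir) / "tokenizer_config.json"
    cfg = json.loads(cfg_path.read_text(encoding="utf-8"))
    if cfg.get("tokenizer_class") == "TokenizersBackend":
        cfg["tokenizer_class"] = "Qwen2Tokenizer"
        cfg_path.write_text(json.dumps(cfg, indent=2), encoding="utf-8")
        return True
    return False
```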

@michaelw9999 michaelw9999 requested a review from CISC as a code owner March 13, 2026 12:00
Copilot AI review requested due to automatic review settings March 13, 2026 12:00
@github-actions github-actions bot added the python python script changes label Mar 13, 2026

Copilot AI left a comment


Pull request overview

Fixes Qwen3.5 / Qwen3.5-MoE NVFP4 HF→GGUF conversion failures by improving tensor name mapping and applying Qwen3.5 linear-attention-specific reordering during NVFP4 repacking.

Changes:

  • Extend tensor name mapping to handle model.language_model.* / language_model.* wrapper prefixes.
  • Add Qwen3.5 NVFP4 linear-attention weight transforms (row/column reordering) during NVFP4 repack.
  • Skip writing NVFP4 auxiliary tensors and already-repacked tensors during the prepare/write loop.


def _nvfp4_scale2_is_trivial(scale2: Tensor) -> bool:
    return scale2.numel() <= 1 and abs(float(scale2.float().sum()) - 1.0) < 1e-6

def _transform_nvfp4_weight(self, raw_weight_name: str, weight: Tensor, scale: Tensor, bid: int | None) -> tuple[str, Tensor, Tensor]:

Comment on lines +744 to +746:

    bid_m = re.search(r'\.layers\.(\d+)\.', name)
    bid = int(bid_m.group(1)) if bid_m else None
    new_name, weight, scale = self._transform_nvfp4_weight(name, weight, scale, bid)
Comment on lines +685 to +686:

    if transformed_components is not None:
        weight, scale = transformed_components
@michaelw9999 michaelw9999 changed the title ggml: Fix Qwen3.5/Qwen3.5 Moe NVFP4 Conversions convert: Fix Qwen3.5/Qwen3.5 Moe NVFP4 Conversions Mar 13, 2026
@arthurcavalcant

This comment was marked as resolved.

@michaelw9999
Contributor Author

@arthurcavalcant I'll post an update for that shortly.

@michaelw9999 michaelw9999 force-pushed the nvfp4-fix-qwen-conversions branch from 3e7ded9 to 43b0892 Compare March 13, 2026 18:42
@michaelw9999
Contributor Author

@arthurcavalcant I've fixed that and you should be able to convert and run that model now.

