llama: Wire up Qwen3.5/Qwen3.5MoE tensors for NVFP4 support #20506
Merged
CISC merged 2 commits into ggml-org:master, Mar 14, 2026
Conversation
Pull request overview
This PR wires Qwen3.5 and Qwen3.5MoE tensor scale metadata into the model build path so NVFP4 GGUFs load correctly (including linear-attention/recurrent and MoE/shared-expert FFN scale handling).
Changes:
- Pass per-tensor scale tensors into `build_lora_mm` for attention and linear-attention (SSM) projections.
- Pass per-tensor and per-expert scale tensors into FFN / MoE FFN builders (including shared experts).
- Extend `llama_layer` and tensor loading to create optional scale tensors for the newly wired weights.
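
To make the pattern concrete, here is a minimal, self-contained sketch of the wiring described above. llama.cpp's real `build_lora_mm` operates on ggml graph tensors; plain structs stand in for `ggml_tensor` here so the snippet compiles on its own, and all names are illustrative rather than the actual diff.

```cpp
// Self-contained sketch of the "weight plus optional scale" wiring pattern.
// In llama.cpp this happens on ggml graphs; simple structs stand in here.
#include <cstdio>
#include <vector>

struct tensor {
    std::vector<float> data;
};

// Stand-in for a weight with an optional NVFP4 per-tensor scale.
// The PR gives llama_layer members of this shape for the newly wired
// projections (member names here are illustrative, not the real ones).
struct scaled_weight {
    tensor   w;
    tensor * scale = nullptr; // nullptr for non-NVFP4 models
};

// Stand-in for the build_lora_mm call site: apply the scale only when
// one was loaded — the core of "wiring up" the scale tensors.
std::vector<float> mul_mat(const scaled_weight & sw, const std::vector<float> & x) {
    std::vector<float> out(x.size());
    const float s = sw.scale ? sw.scale->data[0] : 1.0f;
    for (size_t i = 0; i < x.size(); ++i) {
        out[i] = sw.w.data[i] * s * x[i]; // elementwise stand-in for a matmul
    }
    return out;
}

int main() {
    tensor scale{{0.5f}};
    scaled_weight wq{{{2.0f, 4.0f}}, &scale};
    const auto y = mul_mat(wq, {1.0f, 1.0f});
    std::printf("%.1f %.1f\n", y[0], y[1]); // prints: 1.0 2.0
}
```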
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| src/models/qwen35moe.cpp | Threads newly loaded scale tensors through Qwen3.5MoE attention, linear-attention, and MoE/shared-expert FFN paths. |
| src/models/qwen35.cpp | Threads newly loaded scale tensors through Qwen3.5 attention, linear-attention, and dense FFN paths. |
| src/llama-model.h | Adds layer members to store new scale tensors (QKV mixed, gate, SSM, shared-expert FFN scales). |
| src/llama-model.cpp | Creates optional scale tensors for the new layer scale members during tensor loading. |
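
The src/llama-model.h row above corresponds to new optional members on `llama_layer`. A declaration-only sketch of that shape (the member names are guesses at the pattern, not the identifiers from the actual diff):

```cpp
// Hypothetical sketch of the kind of members src/llama-model.h gains;
// names below are illustrative, not copied from the PR.
struct ggml_tensor; // opaque, as in the ggml headers

struct llama_layer_scales_sketch {
    // Optional per-tensor scales: nullptr when the GGUF carries none.
    ggml_tensor * wqkv_scale           = nullptr; // QKV mixed projection
    ggml_tensor * ffn_gate_scale       = nullptr; // dense FFN gate
    ggml_tensor * ssm_in_scale         = nullptr; // linear-attention (SSM) input
    ggml_tensor * ffn_gate_shexp_scale = nullptr; // shared-expert FFN gate
};
```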
CISC approved these changes on Mar 14, 2026.
PR #20505 fixes the conversion errors for making Qwen3.5 NVFP4 GGUF files and properly reorders the Qwen3.5 linear attention layers, but without this update, those models will not load.
This update wires up the Qwen3.5 tensors so they are properly loaded from Qwen3.5 NVFP4 GGUF files, following the same design intent via `build_lora_mm`. It links up the following (see the loading sketch after this list):
- recurrent / linear-attention tensors
- FFN tensors that loaded but did not apply their scales
- MoE shared-expert FFN scales
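
On the loading side, the new scale tensors follow llama.cpp's existing optional-tensor convention: they are requested as not-required, so GGUFs without scale metadata (non-NVFP4 models) continue to load unchanged. A self-contained sketch of that convention, assuming an illustrative `weight_scale` GGUF key and a simplified `create_tensor` helper (neither is the verbatim llama.cpp API):

```cpp
// Sketch of optional-tensor loading: required weights fail loudly if
// missing, while the NVFP4 scale tensors are flagged as not-required.
#include <cstdio>
#include <cstdlib>
#include <map>
#include <string>

struct tensor { float v; };

enum tensor_flags { TENSOR_REQUIRED = 0, TENSOR_NOT_REQUIRED = 1 };

// Pretend GGUF contents; the "weight_scale" key is an assumed name.
std::map<std::string, tensor> gguf = {
    {"blk.0.attn_q.weight", {2.0f}},
    // An NVFP4 file would also carry: {"blk.0.attn_q.weight_scale", {0.5f}},
};

tensor * create_tensor(const std::string & name, int flags) {
    auto it = gguf.find(name);
    if (it == gguf.end()) {
        if (flags & TENSOR_NOT_REQUIRED) {
            return nullptr; // optional tensor absent: fall back to no scaling
        }
        std::fprintf(stderr, "missing required tensor: %s\n", name.c_str());
        std::exit(1);
    }
    return &it->second;
}

int main() {
    tensor * wq       = create_tensor("blk.0.attn_q.weight",       TENSOR_REQUIRED);
    tensor * wq_scale = create_tensor("blk.0.attn_q.weight_scale", TENSOR_NOT_REQUIRED);
    std::printf("wq=%.1f, scale %s\n", wq->v, wq_scale ? "loaded" : "absent");
}
```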