[FIX_FOR_VLLM_LATEST] Quick fix for PR30684 #742
Merged
adobrzyn merged 8 commits into vllm-project:main on Dec 23, 2025
Conversation
Signed-off-by: Iryna Boiko <iboiko@habana.ai>
🚧 CI Blocked: The main CI workflow was not started for the following reason:
libinta reviewed on Dec 22, 2025
For CI pass
Signed-off-by: Iryna Boiko <iboiko@habana.ai>
✅ CI Passed: All checks passed successfully against the following vllm commit:
adobrzyn approved these changes on Dec 23, 2025
iboiko-habana added a commit to iboiko-habana/vllm-gaudi that referenced this pull request on Dec 23, 2025
1. Quick fix for upstream changes: [PR30684](vllm-project/vllm#30684)
2. Fix for upstream changes: vllm-project/vllm#28891 (port: [PR751](vllm-project#751))
3. Fix for the vllm-project/vllm#31036 issue: failed test case `run_qwen3_compressed_tensor_dynamic_scaling_test`

   ```
   (EngineCore_DP0 pid=5792)   File "/root/logs/vllm/vllm/model_executor/layers/fused_moe/layer.py", line 1487, in ensure_moe_quant_config_init
   (EngineCore_DP0 pid=5792)     self.quant_method.get_fused_moe_quant_config(self)
   (EngineCore_DP0 pid=5792)   File "/root/logs/vllm/vllm/model_executor/layers/quantization/fp8.py", line 1225, in get_fused_moe_quant_config
   (EngineCore_DP0 pid=5792)     w1_scale=layer.w13_weight_scale,
   (EngineCore_DP0 pid=5792)              ^^^^^^^^^^^^^^^^^^^^^^
   (EngineCore_DP0 pid=5792)   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1964, in __getattr__
   (EngineCore_DP0 pid=5792)     raise AttributeError(
   (EngineCore_DP0 pid=5792) AttributeError: 'FusedMoE' object has no attribute 'w13_weight_scale'. Did you mean: 'w13_weight_scale_inv'
   ```

   This issue was already present, but it went undetected because marlin was disabled. After the MoE refactor in vllm-project/vllm#31036, the parameter `self.use_marlin` was replaced by `self.fp8_backend`, and `self.fp8_backend` is now disabled.

---------

Signed-off-by: Iryna Boiko <iboiko@habana.ai>
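For context, here is a minimal sketch of how the failing scale lookup could be guarded. This is an illustration, not the actual patch in this PR: the helper `_get_weight_scale` and its fallback logic are assumptions. The point it shows is the one from the traceback: some fp8 checkpoints register inverse scales under an `_inv` suffix (e.g. `w13_weight_scale_inv`), so a bare `layer.w13_weight_scale` raises `AttributeError`.

```python
# Hypothetical sketch (not the fix merged here) of tolerating both
# parameter names seen in the traceback above.
import torch


def _get_weight_scale(layer: torch.nn.Module, name: str) -> torch.Tensor:
    """Return `layer.<name>`, falling back to the `<name>_inv` variant.

    Note: an `_inv` tensor stores 1/scale, so a real fix would also have
    to reconcile the semantics at the call site, not just the lookup.
    """
    for candidate in (name, f"{name}_inv"):
        if hasattr(layer, candidate):
            return getattr(layer, candidate)
    raise AttributeError(
        f"{type(layer).__name__} has neither {name!r} nor {name + '_inv'!r}"
    )


# Hypothetical call site, mirroring fp8.py's get_fused_moe_quant_config:
# w1_scale = _get_weight_scale(layer, "w13_weight_scale")
# w2_scale = _get_weight_scale(layer, "w2_weight_scale")
```

Since the PR description says the failure only surfaced once the marlin/`fp8_backend` path was disabled, the practical alternative to such a shim is what the description implies: keeping the backend selection consistent with the attribute names the checkpoint actually registers.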
slokesha pushed a commit to libinta/vllm-gaudi that referenced this pull request on Feb 9, 2026 (same commit message as above, with an added Signed-off-by: slokesha <slokeshappa@habana.ai>)
rajanintel24 pushed a commit to rajanintel24/vllm-gaudi that referenced this pull request on Feb 11, 2026 (same commit message as above)