
[FIX_FOR_VLLM_LATEST] Quick fix for PR30684 #742

Merged
adobrzyn merged 8 commits into vllm-project:main from iboiko-habana:pr30684
Dec 23, 2025

Conversation

@iboiko-habana (Collaborator) commented Dec 19, 2025

  1. Quick fix for upstream changes: PR30684
  2. Fix for upstream changes: [MoE Refactor][5/N] Isolate zero expert to LongCatFlash vllm#28891 (Port: PR751)
  3. Fix for [MoE Refactor][4/N] Marlin Fp8 Mk vllm#31036
    Issue: failed test case run_qwen3_compressed_tensor_dynamic_scaling_test:

```
(EngineCore_DP0 pid=5792)   File "/root/logs/vllm/vllm/model_executor/layers/fused_moe/layer.py", line 1487, in ensure_moe_quant_config_init
(EngineCore_DP0 pid=5792)     self.quant_method.get_fused_moe_quant_config(self)
(EngineCore_DP0 pid=5792)   File "/root/logs/vllm/vllm/model_executor/layers/quantization/fp8.py", line 1225, in get_fused_moe_quant_config
(EngineCore_DP0 pid=5792)     w1_scale=layer.w13_weight_scale,
(EngineCore_DP0 pid=5792)              ^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=5792)   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1964, in __getattr__
(EngineCore_DP0 pid=5792)     raise AttributeError(
(EngineCore_DP0 pid=5792) AttributeError: 'FusedMoE' object has no attribute 'w13_weight_scale'. Did you mean: 'w13_weight_scale_inv'
```
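For context on the failure mode above: `torch.nn.Module.__getattr__` only resolves names that were registered as parameters, buffers, or submodules, so asking a layer for `w13_weight_scale` when it only registered `w13_weight_scale_inv` raises `AttributeError`. A simplified sketch of that lookup pattern (for illustration only, not torch's actual source):

```python
# Simplified sketch of nn.Module-style attribute resolution: __getattr__ is
# only invoked after normal lookup fails, and it searches only the registered
# containers, so an unregistered name raises AttributeError.
class MiniModule:
    def __init__(self):
        # Registered containers, loosely mirroring torch.nn.Module's
        # _parameters and _buffers dicts.
        object.__setattr__(self, "_parameters", {})
        object.__setattr__(self, "_buffers", {})

    def register_parameter(self, name, value):
        self._parameters[name] = value

    def __getattr__(self, name):
        # Called only when normal attribute lookup on the instance fails.
        for container in (self._parameters, self._buffers):
            if name in container:
                return container[name]
        raise AttributeError(
            f"{type(self).__name__!r} object has no attribute {name!r}"
        )

m = MiniModule()
m.register_parameter("w13_weight_scale_inv", 1.0)
m.w13_weight_scale_inv   # resolves through __getattr__
# m.w13_weight_scale     # would raise AttributeError, as in the traceback
```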

This issue was already present, but it went undetected because Marlin was disabled. After the MoE refactor in https://github.com/vllm-project/vllm/pull/31036, the parameter `self.use_marlin` was replaced by `self.fp8_backend`, and `self.fp8_backend` is now disabled.
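One defensive way to tolerate both scale-attribute naming conventions is to try the base name and then its `_inv` variant. This is only an illustrative sketch of that idea, not the actual patch in this PR; `get_weight_scale` is a hypothetical helper:

```python
# Hypothetical helper (not the actual vllm-gaudi patch): resolve whichever
# weight-scale attribute a layer actually registered. Some FP8 checkpoints
# register inverse scales as `w13_weight_scale_inv`, while others register
# `w13_weight_scale`.
def get_weight_scale(layer, base_name="w13_weight_scale"):
    """Return the first of base_name / base_name + '_inv' present on layer."""
    for name in (base_name, base_name + "_inv"):
        scale = getattr(layer, name, None)
        if scale is not None:
            return scale
    raise AttributeError(
        f"{type(layer).__name__} has neither {base_name!r} "
        f"nor {base_name + '_inv'!r}"
    )
```

With a lookup like this, the quant-config path would pick up `w13_weight_scale_inv` instead of raising the `AttributeError` shown in the traceback.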

Signed-off-by: Iryna Boiko <iboiko@habana.ai>
@iboiko-habana changed the title from "Quick fix for PR30684" to "[FIX_FOR_VLLM_LATEST] Quick fix for PR30684" on Dec 19, 2025
@github-actions

🚧 CI Blocked

The main CI workflow was not started for the following reason:

Your branch is behind the base branch. Please merge or rebase to get the latest changes.

@github-actions

✅ CI Passed

All checks passed successfully against the following vllm commit:
85aff45e24de7af96d30baa1d7d0fc7aec43c28a

@adobrzyn adobrzyn merged commit ac9cb19 into vllm-project:main Dec 23, 2025
50 checks passed
iboiko-habana added a commit to iboiko-habana/vllm-gaudi that referenced this pull request Dec 23, 2025
slokesha pushed a commit to libinta/vllm-gaudi that referenced this pull request Feb 9, 2026
Signed-off-by: slokesha <slokeshappa@habana.ai>
rajanintel24 pushed a commit to rajanintel24/vllm-gaudi that referenced this pull request Feb 11, 2026