[Bug] NVFP4 MoE kernels fail on RTX Blackwell (SM12.0) - device capability family check missing SM120 #33416

## Bug Description

vLLM v0.15.0 fails to run NVFP4-quantized MoE models on RTX Blackwell GPUs (compute capability 12.0, e.g., RTX PRO 6000 Blackwell Workstation Edition). The NVFP4 MoE backend selection code checks only for SM9.0 (Hopper) and the SM10.x family (data center Blackwell, e.g., B100/B200). It does not check for SM12.0 (RTX Blackwell).

## Error Message

```
ValueError: NvFp4 MoE backend 'FLASHINFER_CUTLASS' does not support the 
deployment configuration since kernel does not support current device.
```

## Root Cause

The device capability checks in the NVFP4 MoE backend selection code use `is_device_capability_family(100)`, which matches only SM10.x:

```python
from vllm.platforms import current_platform

# Current code: matches only family 100 (SM10.x)
current_platform.is_device_capability_family(100)
```

RTX Blackwell GPUs have compute capability 12.0 (SM120), which returns:
- `is_device_capability_family(100)` → **False** (the family is the major version: 120 // 10 = 12, but 100 // 10 = 10)
- `has_device_capability(100)` → **True** (120 >= 100)
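The difference between the two predicates can be modeled in a few lines. This is a minimal sketch, not vLLM's implementation: it assumes capabilities are encoded as `major * 10 + minor` integers (so SM12.0 is 120) and that the family check compares major versions via `// 10`, as the arithmetic above suggests. The names `to_int`, `has_capability`, and `is_capability_family` are hypothetical stand-ins.

```python
def to_int(major: int, minor: int) -> int:
    """Encode a compute capability as major * 10 + minor, e.g. (12, 0) -> 120."""
    return major * 10 + minor


def has_capability(current: int, required: int) -> bool:
    """Threshold check (hypothetical model): true for any device at or above `required`."""
    return current >= required


def is_capability_family(current: int, family: int) -> bool:
    """Family check (hypothetical model): true only if the major versions match."""
    return current // 10 == family // 10


sm120 = to_int(12, 0)
print(has_capability(sm120, 100))                # True: 120 >= 100
print(is_capability_family(sm120, 100))          # False: 12 != 10
print(is_capability_family(to_int(10, 3), 100))  # True: SM10.3 is in family 100
```

The threshold check is too broad for this purpose (it admits every later architecture), while the family check is exact, which is why SM12.0 falls through it.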

RTX Blackwell (SM12.0) shares the same native FP4/FP8 tensor core capabilities as data center Blackwell (SM10.0), so these kernels should work on both families.

## Affected Files

- `vllm/model_executor/layers/fused_moe/flashinfer_cutlass_moe.py` - `_supports_current_device()`, `_supports_quant_scheme()`
- `vllm/model_executor/layers/fused_moe/flashinfer_cutedsl_moe.py` - `_supports_current_device()`
- `vllm/model_executor/layers/fused_moe/flashinfer_trtllm_moe.py` - `_supports_current_device()`
- `vllm/model_executor/layers/quantization/utils/flashinfer_fp4_moe.py` - `_supports_current_device()`, `is_flashinfer_fp4_cutedsl_moe_available()`
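Before patching, it may help to confirm every call site of the family-100 check in the files above. The following is a sketch that assumes a local vLLM checkout with the file paths exactly as listed; `find_family_checks` is a hypothetical helper, and its regex only catches calls written with a literal `100`.

```python
import re
import sys
from pathlib import Path

# Files listed above, relative to the root of a vLLM checkout.
AFFECTED_FILES = [
    "vllm/model_executor/layers/fused_moe/flashinfer_cutlass_moe.py",
    "vllm/model_executor/layers/fused_moe/flashinfer_cutedsl_moe.py",
    "vllm/model_executor/layers/fused_moe/flashinfer_trtllm_moe.py",
    "vllm/model_executor/layers/quantization/utils/flashinfer_fp4_moe.py",
]

# Matches is_device_capability_family(100), tolerating inner whitespace.
FAMILY_100 = re.compile(r"is_device_capability_family\(\s*100\s*\)")


def find_family_checks(root: Path, rel_paths=AFFECTED_FILES):
    """Return (relative path, line number, stripped line) for each family-100 check."""
    hits = []
    for rel in rel_paths:
        path = root / rel
        if not path.is_file():
            continue  # file may be moved or renamed in this checkout
        for lineno, line in enumerate(path.read_text().splitlines(), start=1):
            if FAMILY_100.search(line):
                hits.append((rel, lineno, line.strip()))
    return hits


if __name__ == "__main__":
    root = Path(sys.argv[1]) if len(sys.argv) > 1 else Path(".")
    for rel, lineno, text in find_family_checks(root):
        print(f"{rel}:{lineno}: {text}")
```

Run it with the checkout root as its only argument; each printed location is a candidate for the SM12.x addition.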

## Proposed Fix

Add `is_device_capability_family(120)` checks alongside the existing `family(100)` checks:

```python
from vllm.platforms import current_platform

# Fix: accept both the SM10.x and SM12.x families
(
    current_platform.is_device_capability_family(100)
    or current_platform.is_device_capability_family(120)
)
```
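Because the same two-family condition would be repeated across four files, one option is to factor it into a single predicate. This is a minimal sketch under the same `major * 10 + minor` encoding as above; `SUPPORTED_FP4_FAMILIES` and `supports_nvfp4_moe_family` are hypothetical names, and a real patch would query `current_platform` rather than take an integer. It covers only the family part of the check; the separate SM9.0 handling is unchanged.

```python
# Hypothetical: families whose NVFP4 MoE kernels are accepted (SM10.x and SM12.x).
SUPPORTED_FP4_FAMILIES = (100, 120)


def supports_nvfp4_moe_family(capability: int) -> bool:
    """True if `capability` (major * 10 + minor) belongs to an accepted family."""
    return any(capability // 10 == family // 10 for family in SUPPORTED_FP4_FAMILIES)


for cap in (90, 100, 103, 120, 121):
    print(cap, supports_nvfp4_moe_family(cap))
# 90 False, 100 True, 103 True, 120 True, 121 True
```

Keeping the family list in one place means a future family needs a one-line change, and the check stays an exact family match rather than widening to `has_device_capability(100)`, which would also admit untested later architectures.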

## Environment

- **GPU**: NVIDIA RTX PRO 6000 Blackwell Workstation Edition (SM12.0)
- **vLLM Version**: v0.15.0
- **Model**: MiniMax-M2.1-NVFP4
- **CUDA**: 13.0.2
- **Driver**: 580.126.09
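To confirm which family a GPU reports, PyTorch's `torch.cuda.get_device_capability()` returns a `(major, minor)` tuple; for an SM12.0 device this should be `(12, 0)`. The snippet below is a sketch that degrades gracefully when PyTorch or a GPU is unavailable; `describe_family` and its labels are hypothetical, mirroring the terminology of this report.

```python
def describe_family(major: int, minor: int) -> str:
    """Map a (major, minor) capability to the labels used in this report."""
    labels = {
        9: "Hopper (SM9.x)",
        10: "data center Blackwell (SM10.x)",
        12: "RTX Blackwell (SM12.x)",
    }
    return f"SM{major}.{minor}: {labels.get(major, 'other')}"


def main() -> None:
    try:
        import torch
    except ImportError:
        print("PyTorch not installed")
        return
    if not torch.cuda.is_available():
        print("No CUDA device visible")
        return
    major, minor = torch.cuda.get_device_capability()
    print(describe_family(major, minor))


if __name__ == "__main__":
    main()
```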

## Steps to Reproduce

1. Run vLLM v0.15.0 on an RTX Blackwell GPU (SM12.0)
2. Load any NVFP4 quantized MoE model (e.g., MiniMax-M2.1-NVFP4)
3. Observe the error during model initialization

## Additional Context

This worked in v0.14.0; the stricter device capability family checks that exclude SM12.0 were introduced in v0.15.0 (commit 42135d689).