Checklist
Describe the bug
I'm running qwen3-vl on an RTX 5090. If I don't explicitly specify the attention_backend parameter, SGLang automatically selects 'trtllm_mha'.
--- python/sglang/srt/server_args.py
@@ def _handle_attention_backend_compatibility(self):
        if not use_mla_backend:
            # MHA architecture
            if (
                is_hopper_with_cuda_12_3()
                and is_no_spec_infer_or_topk_one(self)
                and is_fa3_default_architecture(self.model_config.hf_config)
            ):
                self.attention_backend = "fa3"
            elif is_blackwell() and is_no_spec_infer_or_topk_one(self):
                self.attention_backend = "trtllm_mha"  # !!! auto-selects trtllm_mha
However, later during the SM version check, a ValueError is raised.
--- python/sglang/srt/server_args.py
@@ def _handle_attention_backend_compatibility(self):
        if (
            self.attention_backend == "trtllm_mha"
            or self.decode_attention_backend == "trtllm_mha"
            or self.prefill_attention_backend == "trtllm_mha"
        ):
            if not is_sm100_supported():
                raise ValueError(
                    "TRTLLM MHA backend is only supported on Blackwell GPUs (SM100). Please use a different backend."
                )
I manually checked and confirmed that the GPU's compute capability is sm120. Does sm120 not support trtllm_mha? If so, shouldn't the auto-selection logic apply the same SM version check, so that trtllm_mha is never chosen automatically on unsupported GPUs?
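To illustrate the mismatch, here is a minimal standalone sketch (not SGLang's actual code; the capability logic and the fallback backend name are my assumptions): the auto-selection branch treats all Blackwell-generation GPUs alike, while the later check requires exactly SM 10.0, so sm120 passes the first predicate but fails the second.

```python
def is_blackwell(major: int, minor: int) -> bool:
    # Assumed: any Blackwell-generation GPU, e.g. sm100 (B200) or sm120 (RTX 5090)
    return major in (10, 12)

def is_sm100_supported(major: int, minor: int) -> bool:
    # Assumed: the stricter predicate that the trtllm_mha check actually enforces
    return (major, minor) == (10, 0)

def select_backend(major: int, minor: int) -> str:
    # Guarding the auto-selection with the same predicate as the later check
    # avoids picking a backend that is then rejected with a ValueError.
    if is_blackwell(major, minor) and is_sm100_supported(major, minor):
        return "trtllm_mha"
    return "triton"  # hypothetical fallback backend

print(select_backend(12, 0))  # sm120 (RTX 5090) -> triton
print(select_backend(10, 0))  # sm100 (B200)     -> trtllm_mha
```

Under these assumptions, adding the `is_sm100_supported()` guard to the auto-selection branch would make an RTX 5090 fall through to a fallback backend instead of crashing at startup.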
Reproduction
python3 -m sglang.launch_server --model-path Qwen/Qwen3-VL-4B-Instruct-FP8 --enable-multimodal --cuda-graph-max-bs 128 --context-length 2560 --page-size 16 --stream-interval 300 --mem-fraction-static 0.7 --port 30260 --base-gpu-id 2 --kv-cache-dtype fp8_e4m3 --fp8-gemm-backend=cutlass
[2025-12-10 17:42:05] WARNING server_args.py:1391: Attention backend not explicitly specified. Use trtllm_mha backend by default.
Traceback (most recent call last):
File "", line 198, in _run_module_as_main
File "", line 88, in _run_code
File "/mnt/sufeng/sglang/python/sglang/launch_server.py", line 25, in
server_args = prepare_server_args(sys.argv[1:])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/mnt/sufeng/sglang/python/sglang/srt/server_args.py", line 4463, in prepare_server_args
return ServerArgs.from_cli_args(raw_args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/mnt/sufeng/sglang/python/sglang/srt/server_args.py", line 4012, in from_cli_args
return cls(**{attr: getattr(args, attr) for attr in attrs})
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "", line 293, in init
File "/mnt/sufeng/sglang/python/sglang/srt/server_args.py", line 648, in post_init
self._handle_attention_backend_compatibility()
File "/mnt/sufeng/sglang/python/sglang/srt/server_args.py", line 1456, in _handle_attention_backend_compatibility
raise ValueError(
ValueError: TRTLLM MHA backend is only supported on Blackwell GPUs (SM100). Please use a different backend.
Environment
(root) root@iZbp1egv8tehlc78k9u6y7Z:/mnt/sufeng/sglang# nvidia-smi
Wed Dec 10 17:42:56 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 570.124.04 Driver Version: 570.124.04 CUDA Version: 12.8 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA GeForce RTX 5090 On | 00000000:08:00.0 Off | N/A |
| 0% 29C P8 7W / 575W | 1MiB / 32607MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 1 NVIDIA GeForce RTX 5090 On | 00000000:0C:00.0 Off | N/A |
| 0% 28C P8 21W / 575W | 1MiB / 32607MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 2 NVIDIA GeForce RTX 5090 On | 00000000:7E:00.0 Off | N/A |
| 0% 28C P8 17W / 575W | 1MiB / 32607MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 3 NVIDIA GeForce RTX 5090 On | 00000000:7F:00.0 Off | N/A |
| 0% 28C P8 11W / 575W | 1MiB / 32607MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 4 NVIDIA GeForce RTX 5090 On | 00000001:08:00.0 Off | N/A |
| 0% 27C P8 27W / 575W | 1MiB / 32607MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 5 NVIDIA GeForce RTX 5090 On | 00000001:0C:00.0 Off | N/A |
| 0% 27C P8 19W / 575W | 1MiB / 32607MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 6 NVIDIA GeForce RTX 5090 On | 00000001:81:00.0 Off | N/A |
| 0% 28C P8 13W / 575W | 1MiB / 32607MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 7 NVIDIA GeForce RTX 5090 On | 00000001:82:00.0 Off | N/A |
| 0% 27C P8 21W / 575W | 1MiB / 32607MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+