[Bug] RTX 5090, the attention_backend is automatically set to 'trtllm_mha', but a ValueError is raised during SM version detection. #14814

@sufeng-buaa

Description

Checklist

  • I searched related issues but found no solution.
  • The bug persists in the latest version.
  • Issues without environment info and a minimal reproducible demo are hard to resolve and may receive no feedback.
  • If this is not a bug report but a general question, please start a discussion at https://github.com/sgl-project/sglang/discussions. Otherwise, it will be closed.
  • Please use English. Otherwise, it will be closed.

Describe the bug

I'm running Qwen3-VL on an RTX 5090. If I don't explicitly specify the attention_backend parameter, SGLang automatically selects 'trtllm_mha'.

--- python/sglang/srt/server_args.py
@@ def _handle_attention_backend_compatibility(self):
            if not use_mla_backend:
                # MHA architecture
                if (
                    is_hopper_with_cuda_12_3()
                    and is_no_spec_infer_or_topk_one(self)
                    and is_fa3_default_architecture(self.model_config.hf_config)
                ):
                    self.attention_backend = "fa3"
                elif is_blackwell() and is_no_spec_infer_or_topk_one(self):
                    self.attention_backend = "trtllm_mha"   # !!! auto-selects trtllm_mha
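
For context on why this path is taken: both datacenter Blackwell (B200, SM100) and consumer Blackwell (RTX 5090, SM120) belong to the same architecture family, so a family-level check like is_blackwell() cannot distinguish them. A minimal sketch of the mismatch, using hypothetical helper names (not sglang's actual implementation):

```python
# Sketch: a family check can pass while an exact-SM check fails.
# Helper names are illustrative, not sglang's real code.

def sm_version(capability):
    """Map a (major, minor) compute capability to an SM number, e.g. (12, 0) -> 120."""
    major, minor = capability
    return major * 10 + minor

def looks_like_blackwell(capability):
    # Both datacenter Blackwell (SM100) and consumer Blackwell (SM120)
    # are part of the same GPU generation.
    return capability[0] in (10, 12)

def is_sm100(capability):
    # The TRTLLM MHA kernels target SM100 specifically.
    return sm_version(capability) == 100

rtx_5090 = (12, 0)  # what torch.cuda.get_device_capability() reports on an RTX 5090
print(looks_like_blackwell(rtx_5090))  # True  -> trtllm_mha gets auto-selected
print(is_sm100(rtx_5090))              # False -> the later check raises ValueError
```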

However, later during the SM version check, a ValueError is raised.

--- python/sglang/srt/server_args.py
@@ def _handle_attention_backend_compatibility(self):
        if (
            self.attention_backend == "trtllm_mha"
            or self.decode_attention_backend == "trtllm_mha"
            or self.prefill_attention_backend == "trtllm_mha"
        ):
            if not is_sm100_supported():
                raise ValueError(
                    "TRTLLM MHA backend is only supported on Blackwell GPUs (SM100). Please use a different backend."
                )

I manually checked and confirmed the GPU reports sm120 (compute capability 12.0). Does sm120 not support trtllm_mha? If that's the case, shouldn't the SM version be checked before auto-selection, so that trtllm_mha is never chosen on sm120 in the first place?
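
A possible shape for such a guard, as a sketch only (hypothetical helper, not sglang's actual selection code; "flashinfer" is used here merely as an example fallback backend):

```python
# Sketch of the suggested guard: gate auto-selection on the exact SM version
# rather than on the Blackwell family, so SM120 never selects trtllm_mha.
def pick_mha_backend(capability, spec_infer_ok=True):
    """Auto-select an attention backend from a (major, minor) capability tuple."""
    major, minor = capability
    sm = major * 10 + minor
    if sm == 100 and spec_infer_ok:
        # trtllm_mha is validated against SM100 later, so only select it here
        # when that validation is guaranteed to pass.
        return "trtllm_mha"
    # SM120 (e.g. RTX 5090) falls through to a generic backend instead of
    # selecting trtllm_mha and failing during post-init validation.
    return "flashinfer"

print(pick_mha_backend((10, 0)))  # trtllm_mha
print(pick_mha_backend((12, 0)))  # flashinfer
```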

Reproduction

python3 -m sglang.launch_server --model-path Qwen/Qwen3-VL-4B-Instruct-FP8 --enable-multimodal --cuda-graph-max-bs 128 --context-length 2560 --page-size 16 --stream-interval 300 --mem-fraction-static 0.7 --port 30260 --base-gpu-id 2 --kv-cache-dtype fp8_e4m3 --fp8-gemm-backend=cutlass

[2025-12-10 17:42:05] WARNING server_args.py:1391: Attention backend not explicitly specified. Use trtllm_mha backend by default.
Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "/mnt/sufeng/sglang/python/sglang/launch_server.py", line 25, in <module>
server_args = prepare_server_args(sys.argv[1:])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/mnt/sufeng/sglang/python/sglang/srt/server_args.py", line 4463, in prepare_server_args
return ServerArgs.from_cli_args(raw_args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/mnt/sufeng/sglang/python/sglang/srt/server_args.py", line 4012, in from_cli_args
return cls(**{attr: getattr(args, attr) for attr in attrs})
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "<string>", line 293, in __init__
File "/mnt/sufeng/sglang/python/sglang/srt/server_args.py", line 648, in __post_init__
self._handle_attention_backend_compatibility()
File "/mnt/sufeng/sglang/python/sglang/srt/server_args.py", line 1456, in _handle_attention_backend_compatibility
raise ValueError(
ValueError: TRTLLM MHA backend is only supported on Blackwell GPUs (SM100). Please use a different backend.

Environment

(root) root@iZbp1egv8tehlc78k9u6y7Z:/mnt/sufeng/sglang# nvidia-smi
Wed Dec 10 17:42:56 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 570.124.04 Driver Version: 570.124.04 CUDA Version: 12.8 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA GeForce RTX 5090 On | 00000000:08:00.0 Off | N/A |
| 0% 29C P8 7W / 575W | 1MiB / 32607MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 1 NVIDIA GeForce RTX 5090 On | 00000000:0C:00.0 Off | N/A |
| 0% 28C P8 21W / 575W | 1MiB / 32607MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 2 NVIDIA GeForce RTX 5090 On | 00000000:7E:00.0 Off | N/A |
| 0% 28C P8 17W / 575W | 1MiB / 32607MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 3 NVIDIA GeForce RTX 5090 On | 00000000:7F:00.0 Off | N/A |
| 0% 28C P8 11W / 575W | 1MiB / 32607MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 4 NVIDIA GeForce RTX 5090 On | 00000001:08:00.0 Off | N/A |
| 0% 27C P8 27W / 575W | 1MiB / 32607MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 5 NVIDIA GeForce RTX 5090 On | 00000001:0C:00.0 Off | N/A |
| 0% 27C P8 19W / 575W | 1MiB / 32607MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 6 NVIDIA GeForce RTX 5090 On | 00000001:81:00.0 Off | N/A |
| 0% 28C P8 13W / 575W | 1MiB / 32607MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 7 NVIDIA GeForce RTX 5090 On | 00000001:82:00.0 Off | N/A |
| 0% 27C P8 21W / 575W | 1MiB / 32607MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+
