[Bug] Error loading Qwen/Qwen3-30B-A3B-GPTQ-Int4 model

### Checklist

- [x] 1. I have searched related issues but cannot get the expected help.
- [x] 2. The bug has not been fixed in the latest version.
- [x] 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
- [x] 4. If the issue you raised is not a bug but a question, please raise a discussion at https://github.com/sgl-project/sglang/discussions/new/choose Otherwise, it will be closed.
- [x] 5. Please use English, otherwise it will be closed.

### Describe the bug

I'm using the docker image `lmsysorg/sglang:v0.4.8-cu126` 

Ran into this issue:

```
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/sgl-workspace/sglang/python/sglang/launch_server.py", line 14, in <module>
    launch_server(server_args)
  File "/sgl-workspace/sglang/python/sglang/srt/entrypoints/http_server.py", line 834, in launch_server
    tokenizer_manager, template_manager, scheduler_info = _launch_subprocesses(
  File "/sgl-workspace/sglang/python/sglang/srt/entrypoints/engine.py", line 754, in _launch_subprocesses
    tokenizer_manager = TokenizerManager(server_args, port_args)
  File "/sgl-workspace/sglang/python/sglang/srt/managers/tokenizer_manager.py", line 196, in __init__
    self.model_config = ModelConfig.from_server_args(server_args)
  File "/sgl-workspace/sglang/python/sglang/srt/configs/model_config.py", line 257, in from_server_args
    return ModelConfig(
  File "/sgl-workspace/sglang/python/sglang/srt/configs/model_config.py", line 243, in __init__
    self._verify_quantization()
  File "/sgl-workspace/sglang/python/sglang/srt/configs/model_config.py", line 395, in _verify_quantization
    quantization_override = method.override_quantization_method(
  File "/sgl-workspace/sglang/python/sglang/srt/layers/quantization/gptq.py", line 284, in override_quantization_method
    can_convert = cls.is_gptq_marlin_compatible(hf_quant_cfg)
  File "/sgl-workspace/sglang/python/sglang/srt/layers/quantization/gptq.py", line 347, in is_gptq_marlin_compatible
    return check_marlin_supported(
NameError: name 'check_marlin_supported' is not defined. Did you mean: 'check_marlin_format'?
```

### Reproduction

Start sglang docker image with the following command

```
command: >
        python3 -m sglang.launch_server
      --model-path Qwen/Qwen3-30B-A3B-GPTQ-Int4
      --served-model-name qwen-3-30b-a3b
      --sleep-on-idle
      --json-model-override-args '{"rope_scaling":{"rope_type":"yarn","factor":4.0,"original_max_position_embeddings":32768}}'
      --context-length 131072
      --chat-template /root/templates/qwen3_nonthinking.jinja
      --host 0.0.0.0
      --enable-p2p-check
      --tensor-parallel-size 2
      --port 80
      --tool-call-parser qwen25
```

### Environment

Docker image `lmsysorg/sglang:v0.4.8-cu126`

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Bug] Error loading Qwen/Qwen3-30B-A3B-GPTQ-Int4 model #7583

Checklist

Describe the bug

Reproduction

Environment

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

[Bug] Error loading Qwen/Qwen3-30B-A3B-GPTQ-Int4 model #7583

Description

Checklist

Describe the bug

Reproduction

Environment

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions