Checklist
Describe the bug
[2025-04-23 16:14:34 TP0] Attention backend not set. Use triton backend by default.
[2025-04-23 16:14:34 TP0] Init torch distributed begin.
[W423 16:14:34.122823038 HIPAllocatorConfig.h:29] Warning: expandable_segments not supported on this platform (function operator())
[2025-04-23 16:14:35 TP0] Init torch distributed ends. mem usage=0.00 GB
[2025-04-23 16:14:35 TP0] Load weight begin. avail mem=17.88 GB
[2025-04-23 16:14:35 TP0] sgl-kernel is not available on Non-NV platforms. Fallback to other kernel libraries.
[2025-04-23 16:14:35 TP0] sgl-kernel is not available on Non-NV platforms. Fallback to other kernel libraries.
[2025-04-23 16:14:35 TP0] The following error message 'operation scheduled before its operands' can be ignored.
/root/miniconda3/envs/xinf/lib/python3.10/site-packages/torch/utils/_device.py:104: UserWarning: expandable_segments not supported on this platform (Triggered internally at /pytorch/c10/hip/HIPAllocatorConfig.h:29.)
return func(*args, **kwargs)
Loading safetensors checkpoint shards: 0% Completed | 0/3 [00:00<?, ?it/s]
Loading safetensors checkpoint shards: 33% Completed | 1/3 [00:04<00:09, 4.56s/it]
Loading safetensors checkpoint shards: 67% Completed | 2/3 [00:09<00:04, 4.66s/it]
Loading safetensors checkpoint shards: 100% Completed | 3/3 [00:11<00:00, 3.46s/it]
Loading safetensors checkpoint shards: 100% Completed | 3/3 [00:11<00:00, 3.78s/it]
[2025-04-23 16:14:47 TP0] Load weight end. type=Qwen2ForCausalLM, dtype=torch.float16, avail mem=7.96 GB, mem usage=9.92 GB.
[2025-04-23 16:14:47 TP0] KV Cache is allocated. #tokens: 13200, K size: 1.21 GB, V size: 1.21 GB
[2025-04-23 16:14:47 TP0] Memory pool end. avail mem=4.70 GB
[2025-04-23 16:14:47 TP0] Capture cuda graph begin. This can take up to several minutes. avail mem=4.70 GB
Capturing batches (avail_mem=4.70 GB): 0%| | 0/4 [00:00<?, ?it/s]
[2025-04-23 16:14:48 TP0] Scheduler hit an exception: Traceback (most recent call last):
File "/usr/local/sglang/python/sglang/srt/managers/scheduler.py", line 2001, in run_scheduler_process
scheduler = Scheduler(server_args, port_args, gpu_id, tp_rank, dp_rank)
File "/usr/local/sglang/python/sglang/srt/managers/scheduler.py", line 261, in __init__
self.tp_worker = TpWorkerClass(
File "/usr/local/sglang/python/sglang/srt/managers/tp_worker_overlap_thread.py", line 63, in __init__
self.worker = TpModelWorker(server_args, gpu_id, tp_rank, dp_rank, nccl_port)
File "/usr/local/sglang/python/sglang/srt/managers/tp_worker.py", line 75, in __init__
self.model_runner = ModelRunner(
File "/usr/local/sglang/python/sglang/srt/model_executor/model_runner.py", line 181, in __init__
self.initialize(min_per_gpu_memory)
File "/usr/local/sglang/python/sglang/srt/model_executor/model_runner.py", line 219, in initialize
self.init_cuda_graphs()
File "/usr/local/sglang/python/sglang/srt/model_executor/model_runner.py", line 980, in init_cuda_graphs
self.cuda_graph_runner = CudaGraphRunner(self)
File "/usr/local/sglang/python/sglang/srt/model_executor/cuda_graph_runner.py", line 276, in __init__
self.capture()
File "/usr/local/sglang/python/sglang/srt/model_executor/cuda_graph_runner.py", line 360, in capture
) = self.capture_one_batch_size(bs, forward)
File "/usr/local/sglang/python/sglang/srt/model_executor/cuda_graph_runner.py", line 452, in capture_one_batch_size
run_once()
File "/usr/local/sglang/python/sglang/srt/model_executor/cuda_graph_runner.py", line 445, in run_once
logits_output = forward(input_ids, forward_batch.positions, forward_batch)
File "/root/miniconda3/envs/xinf/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "/usr/local/sglang/python/sglang/srt/models/qwen2.py", line 383, in forward
hidden_states = self.model(input_ids, positions, forward_batch, input_embeds)
File "/root/miniconda3/envs/xinf/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/miniconda3/envs/xinf/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/sglang/python/sglang/srt/models/qwen2.py", line 291, in forward
hidden_states, residual = layer(
File "/root/miniconda3/envs/xinf/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/miniconda3/envs/xinf/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/sglang/python/sglang/srt/models/qwen2.py", line 224, in forward
hidden_states = self.self_attn(
File "/root/miniconda3/envs/xinf/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/miniconda3/envs/xinf/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/sglang/python/sglang/srt/models/qwen2.py", line 167, in forward
qkv, _ = self.qkv_proj(hidden_states)
File "/root/miniconda3/envs/xinf/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/miniconda3/envs/xinf/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/sglang/python/sglang/srt/layers/linear.py", line 445, in forward
output_parallel = self.quant_method.apply(self, input, bias)
File "/usr/local/sglang/python/sglang/srt/layers/quantization/awq.py", line 195, in apply
out = awq_dequantize(qweight, scales, qzeros)
File "/root/miniconda3/envs/xinf/lib/python3.10/site-packages/sgl_kernel-0.0.9.post2-py3.10-linux-x86_64.egg/sgl_kernel/gemm.py", line 10, in awq_dequantize
return torch.ops.sgl_kernel.awq_dequantize.default(qweight, scales, qzeros)
File "/root/miniconda3/envs/xinf/lib/python3.10/site-packages/torch/_ops.py", line 1232, in __getattr__
raise AttributeError(
AttributeError: '_OpNamespace' 'sgl_kernel' object has no attribute 'awq_dequantize'
Reproduction
Serve Qwen2.5-Instruct 14B with int4 AWQ quantization on ROCm (AMD Radeon RX 7900 XT). Weight loading succeeds, but the server crashes during CUDA graph capture with the AttributeError above.
Environment
Python: 3.10.14 (main, May 6 2024, 19:42:50) [GCC 11.2.0]
ROCM available: True
GPU 0: AMD Radeon RX 7900 XT
GPU 0 Compute Capability: 11.0
ROCM_HOME: /opt/rocm
HIPCC: HIP version: 6.4.43482-0f2d60242
ROCM Driver Version:
PyTorch: 2.6.0+rocm6.4.0.git2fb0ac2b
sglang: 0.4.5.post3
sgl_kernel: 0.0.9.post2
flashinfer: Module Not Found
triton: 3.2.0
transformers: 4.51.1
torchao: 0.11.0.dev20250418+rocm6.3
numpy: 1.26.4
aiohttp: 3.11.14
fastapi: 0.115.12
hf_transfer: 0.1.9
huggingface_hub: 0.30.2
interegular: 0.3.3
modelscope: 1.24.0
orjson: 3.10.16
outlines: 0.1.11
packaging: 24.2
psutil: 7.0.0
pydantic: 2.10.6
multipart: 1.2.1
zmq: Module Not Found
uvicorn: 0.34.0
uvloop: 0.21.0
vllm: 0.8.5.dev134+gd6da9322c.d20250422.rocm640
xgrammar: 0.1.18
openai: 1.68.2
tiktoken: 0.9.0
anthropic: 0.49.0
litellm: 1.63.14
decord: 0.6.0
AMD Topology:
Hypervisor vendor: Microsoft
ulimit soft: 1024