
[Bug] After updating from 0.4.5.post2 to 0.4.5.post3, the following error is reported: AttributeError: '_OpNamespace' 'sgl_kernel' object has no attribute 'awq_dequantize' #5668


Description

@githust66

Checklist

  • 1. I have searched related issues but cannot get the expected help.
  • 2. The bug has not been fixed in the latest version.
  • 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
  • 4. If the issue you raised is not a bug but a question, please raise a discussion at https://github.com/sgl-project/sglang/discussions/new/choose. Otherwise, it will be closed.
  • 5. Please use English, otherwise it will be closed.

Describe the bug

[2025-04-23 16:14:34 TP0] Attention backend not set. Use triton backend by default.
[2025-04-23 16:14:34 TP0] Init torch distributed begin.
[W423 16:14:34.122823038 HIPAllocatorConfig.h:29] Warning: expandable_segments not supported on this platform (function operator())
[2025-04-23 16:14:35 TP0] Init torch distributed ends. mem usage=0.00 GB
[2025-04-23 16:14:35 TP0] Load weight begin. avail mem=17.88 GB
[2025-04-23 16:14:35 TP0] sgl-kernel is not available on Non-NV platforms. Fallback to other kernel libraries.
[2025-04-23 16:14:35 TP0] sgl-kernel is not available on Non-NV platforms. Fallback to other kernel libraries.
[2025-04-23 16:14:35 TP0] The following error message 'operation scheduled before its operands' can be ignored.
/root/miniconda3/envs/xinf/lib/python3.10/site-packages/torch/utils/_device.py:104: UserWarning: expandable_segments not supported on this platform (Triggered internally at /pytorch/c10/hip/HIPAllocatorConfig.h:29.)
return func(*args, **kwargs)
Loading safetensors checkpoint shards: 0% Completed | 0/3 [00:00<?, ?it/s]
Loading safetensors checkpoint shards: 33% Completed | 1/3 [00:04<00:09, 4.56s/it]
Loading safetensors checkpoint shards: 67% Completed | 2/3 [00:09<00:04, 4.66s/it]
Loading safetensors checkpoint shards: 100% Completed | 3/3 [00:11<00:00, 3.46s/it]
Loading safetensors checkpoint shards: 100% Completed | 3/3 [00:11<00:00, 3.78s/it]

[2025-04-23 16:14:47 TP0] Load weight end. type=Qwen2ForCausalLM, dtype=torch.float16, avail mem=7.96 GB, mem usage=9.92 GB.
[2025-04-23 16:14:47 TP0] KV Cache is allocated. #tokens: 13200, K size: 1.21 GB, V size: 1.21 GB
[2025-04-23 16:14:47 TP0] Memory pool end. avail mem=4.70 GB
[2025-04-23 16:14:47 TP0] Capture cuda graph begin. This can take up to several minutes. avail mem=4.70 GB
Capturing batches (avail_mem=4.70 GB): 0%| | 0/4 [00:00<?, ?it/s]
[2025-04-23 16:14:48 TP0] Scheduler hit an exception: Traceback (most recent call last):
File "/usr/local/sglang/python/sglang/srt/managers/scheduler.py", line 2001, in run_scheduler_process
scheduler = Scheduler(server_args, port_args, gpu_id, tp_rank, dp_rank)
File "/usr/local/sglang/python/sglang/srt/managers/scheduler.py", line 261, in init
self.tp_worker = TpWorkerClass(
File "/usr/local/sglang/python/sglang/srt/managers/tp_worker_overlap_thread.py", line 63, in init
self.worker = TpModelWorker(server_args, gpu_id, tp_rank, dp_rank, nccl_port)
File "/usr/local/sglang/python/sglang/srt/managers/tp_worker.py", line 75, in init
self.model_runner = ModelRunner(
File "/usr/local/sglang/python/sglang/srt/model_executor/model_runner.py", line 181, in init
self.initialize(min_per_gpu_memory)
File "/usr/local/sglang/python/sglang/srt/model_executor/model_runner.py", line 219, in initialize
self.init_cuda_graphs()
File "/usr/local/sglang/python/sglang/srt/model_executor/model_runner.py", line 980, in init_cuda_graphs
self.cuda_graph_runner = CudaGraphRunner(self)
File "/usr/local/sglang/python/sglang/srt/model_executor/cuda_graph_runner.py", line 276, in init
self.capture()
File "/usr/local/sglang/python/sglang/srt/model_executor/cuda_graph_runner.py", line 360, in capture
) = self.capture_one_batch_size(bs, forward)
File "/usr/local/sglang/python/sglang/srt/model_executor/cuda_graph_runner.py", line 452, in capture_one_batch_size
run_once()
File "/usr/local/sglang/python/sglang/srt/model_executor/cuda_graph_runner.py", line 445, in run_once
logits_output = forward(input_ids, forward_batch.positions, forward_batch)
File "/root/miniconda3/envs/xinf/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "/usr/local/sglang/python/sglang/srt/models/qwen2.py", line 383, in forward
hidden_states = self.model(input_ids, positions, forward_batch, input_embeds)
File "/root/miniconda3/envs/xinf/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/miniconda3/envs/xinf/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/sglang/python/sglang/srt/models/qwen2.py", line 291, in forward
hidden_states, residual = layer(
File "/root/miniconda3/envs/xinf/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/miniconda3/envs/xinf/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/sglang/python/sglang/srt/models/qwen2.py", line 224, in forward
hidden_states = self.self_attn(
File "/root/miniconda3/envs/xinf/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/miniconda3/envs/xinf/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/sglang/python/sglang/srt/models/qwen2.py", line 167, in forward
qkv, _ = self.qkv_proj(hidden_states)
File "/root/miniconda3/envs/xinf/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/miniconda3/envs/xinf/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1750, in call_impl
return forward_call(*args, **kwargs)
File "/usr/local/sglang/python/sglang/srt/layers/linear.py", line 445, in forward
output_parallel = self.quant_method.apply(self, input_, bias)
File "/usr/local/sglang/python/sglang/srt/layers/quantization/awq.py", line 195, in apply
out = awq_dequantize(qweight, scales, qzeros)
File "/root/miniconda3/envs/xinf/lib/python3.10/site-packages/sgl_kernel-0.0.9.post2-py3.10-linux-x86_64.egg/sgl_kernel/gemm.py", line 10, in awq_dequantize
return torch.ops.sgl_kernel.awq_dequantize.default(qweight, scales, qzeros)
File "/root/miniconda3/envs/xinf/lib/python3.10/site-packages/torch/_ops.py", line 1232, in getattr
raise AttributeError(
AttributeError: '_OpNamespace' 'sgl_kernel' object has no attribute 'awq_dequantize'
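
Calling the op namespace directly reproduces the missing attribute; a minimal check (assuming sgl_kernel itself imports cleanly on this ROCm build):

```python
import torch
import sgl_kernel  # importing the package is what registers the torch.ops.sgl_kernel entries

# torch.ops resolves ops lazily through __getattr__ (the last frame above),
# so an op that was never registered shows up as a plain attribute miss.
print(hasattr(torch.ops.sgl_kernel, "awq_dequantize"))  # False on this install
```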

Reproduction

qwen2.5-instruct 14B, int4 AWQ quantization, served on ROCm (AMD Radeon RX 7900 XT). The error occurs during CUDA graph capture at server startup, after upgrading sglang from 0.4.5.post2 to 0.4.5.post3. A minimal script that isolates the failing call is sketched below.
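
A minimal sketch that hits the same code path without launching the full server. The tensor shapes are illustrative placeholders for a group-size-128 AWQ layout (an assumption, not values read from the model); the import path follows the gemm.py file shown in the traceback:

```python
import torch
from sgl_kernel.gemm import awq_dequantize  # same wrapper the AWQ quant method calls

# Illustrative AWQ-packed shapes (assumed placeholders):
# qweight packs eight int4 values per int32 along the output dimension.
in_features, out_features, group_size = 1024, 1024, 128
qweight = torch.zeros(in_features, out_features // 8, dtype=torch.int32, device="cuda")
scales = torch.zeros(in_features // group_size, out_features, dtype=torch.float16, device="cuda")
qzeros = torch.zeros(in_features // group_size, out_features // 8, dtype=torch.int32, device="cuda")

# On this install the call fails before touching the tensors, with the same
# AttributeError: '_OpNamespace' 'sgl_kernel' object has no attribute 'awq_dequantize'
out = awq_dequantize(qweight, scales, qzeros)
print(out.shape)
```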

Environment

Python: 3.10.14 (main, May 6 2024, 19:42:50) [GCC 11.2.0]
ROCM available: True
GPU 0: AMD Radeon RX 7900 XT
GPU 0 Compute Capability: 11.0
ROCM_HOME: /opt/rocm
HIPCC: HIP version: 6.4.43482-0f2d60242
ROCM Driver Version:
PyTorch: 2.6.0+rocm6.4.0.git2fb0ac2b
sglang: 0.4.5.post3
sgl_kernel: 0.0.9.post2
flashinfer: Module Not Found
triton: 3.2.0
transformers: 4.51.1
torchao: 0.11.0.dev20250418+rocm6.3
numpy: 1.26.4
aiohttp: 3.11.14
fastapi: 0.115.12
hf_transfer: 0.1.9
huggingface_hub: 0.30.2
interegular: 0.3.3
modelscope: 1.24.0
orjson: 3.10.16
outlines: 0.1.11
packaging: 24.2
psutil: 7.0.0
pydantic: 2.10.6
multipart: 1.2.1
zmq: Module Not Found
uvicorn: 0.34.0
uvloop: 0.21.0
vllm: 0.8.5.dev134+gd6da9322c.d20250422.rocm640
xgrammar: 0.1.18
openai: 1.68.2
tiktoken: 0.9.0
anthropic: 0.49.0
litellm: 1.63.14
decord: 0.6.0
AMD Topology:

Hypervisor vendor: Microsoft
ulimit soft: 1024
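
For completeness, a small check of which sgl_kernel build the interpreter actually loads (the egg path in the traceback carries no ROCm tag); reading the version via importlib.metadata is an assumption about how the egg registered itself:

```python
import importlib.metadata as metadata
import sgl_kernel

# Path of the loaded module; matches the linux_x86_64 egg seen in the traceback.
print(sgl_kernel.__file__)

# Version from packaging metadata, expected to report 0.0.9.post2.
print(metadata.version("sgl-kernel"))
```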
