
[PieceWise CUDA Graph] Support awq/gptq model in piecewise cudagraph#12518

Merged
ispobock merged 12 commits into main from try_to_fix_piecewise_cuda_graph_awq_model on Nov 11, 2025

@BBuf
Collaborator

@BBuf BBuf commented Nov 2, 2025

python3 -m sglang.launch_server --model-path Qwen/QwQ-32B-AWQ --tp 4 --host 0.0.0.0 --enable-piecewise-cuda-graph

100%|███████████████████████████████████████████████████████████████████████| 1319/1319 [00:45<00:00, 28.97it/s]
Accuracy: 0.680
Invalid: 0.000
Latency: 45.628 s
Output throughput: 9831.561 token/s

python3 -m sglang.launch_server --model-path Qwen/Qwen2.5-Coder-7B-Instruct-GPTQ-Int4 --tp 4 --host 0.0.0.0 --enable-piecewise-cuda-graph

100%|██████████████████████████████████████████████████████████████████████| 1319/1319 [00:09<00:00, 135.12it/s]
Accuracy: 0.733
Invalid: 0.005
Latency: 9.817 s
Output throughput: 17887.372 token/s
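The summary numbers above can be cross-checked with a little arithmetic. The sketch below parses the summary lines and backs out approximate totals. It assumes that "Output throughput" is the total number of generated tokens divided by the wall-clock "Latency". That is an assumption about the benchmark script and is not stated in this PR. For the AWQ run, 1319 / 45.628 ≈ 28.9 questions/s, which is consistent with the 28.97 it/s shown in the progress bar.

```python
import re

# Summary lines copied verbatim from the AWQ and GPTQ runs above.
AWQ_LOG = """Accuracy: 0.680
Invalid: 0.000
Latency: 45.628 s
Output throughput: 9831.561 token/s"""

GPTQ_LOG = """Accuracy: 0.733
Invalid: 0.005
Latency: 9.817 s
Output throughput: 17887.372 token/s"""

# One "Name: number [unit]" field per line; the unit suffix is ignored.
SUMMARY_LINE = re.compile(
    r"^(Accuracy|Invalid|Latency|Output throughput):\s*([0-9]+(?:\.[0-9]+)?)",
    re.MULTILINE,
)


def parse_summary(log: str) -> dict:
    """Map each summary field name to its numeric value (units dropped)."""
    return {name: float(value) for name, value in SUMMARY_LINE.findall(log)}


def derive(summary: dict, num_questions: int = 1319) -> dict:
    """Back out totals, ASSUMING throughput = total output tokens / latency."""
    latency_s = summary["Latency"]
    total_tokens = summary["Output throughput"] * latency_s
    return {
        "approx_output_tokens": total_tokens,
        "questions_per_s": num_questions / latency_s,
        "tokens_per_question": total_tokens / num_questions,
    }


if __name__ == "__main__":
    for label, log in (("AWQ", AWQ_LOG), ("GPTQ", GPTQ_LOG)):
        stats = derive(parse_summary(log))
        print(label, {k: round(v, 1) for k, v in stats.items()})
```

The two runs use models of different sizes (32B vs 7B), so these derived figures describe each run on its own. They are not an AWQ-versus-GPTQ comparison.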

@BBuf BBuf changed the title from "try to support awq/gptq model in piecewise cudagraph" to "[PieceWise CUDA Graph] Support awq/gptq model in piecewise cudagraph" Nov 3, 2025
@BBuf BBuf marked this pull request as ready for review November 3, 2025 03:31
Collaborator

@Oasis-Git Oasis-Git left a comment


LGTM except for some minor modifications.

Comment thread (outdated): python/sglang/srt/layers/quantization/marlin_utils.py
@FlamingoPg FlamingoPg self-assigned this Nov 6, 2025
@FlamingoPg
Collaborator

@BBuf I see there are some fake registrations in the code. Are these for torch.compile?

@BBuf
Collaborator Author

BBuf commented Nov 6, 2025

Right: they register fake implementations of the quantized kernels so that torch.compile can trace them during piecewise CUDA Graph compilation.
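Some background on fake registration: when torch.compile traces a model, it runs custom ops on metadata-only "fake" tensors. Each custom kernel therefore needs a companion function that computes the output's shape and dtype without executing the kernel. The stdlib-only toy below models that idea. It does not use PyTorch's API (the real entry point is `torch.library.register_fake`). The op name `toy::awq_gemm`, the `Meta` class, and the assumed AWQ packing layout (`qweight` of shape `[K, N // 8]` int32) are illustrative assumptions, not sglang's code.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


# Toy model only, NOT torch.library: a "fake" implementation sees metadata
# (shape, dtype) instead of real data and returns the output's metadata.
@dataclass(frozen=True)
class Meta:
    shape: Tuple[int, ...]
    dtype: str


FAKE_IMPLS: Dict[str, Callable[..., Meta]] = {}


def register_fake(op_name: str) -> Callable:
    """Decorator that records a shape-only implementation for op_name."""
    def decorator(fn: Callable[..., Meta]) -> Callable[..., Meta]:
        FAKE_IMPLS[op_name] = fn
        return fn
    return decorator


PACK_FACTOR = 32 // 4  # assumed: eight 4-bit weights packed into each int32


@register_fake("toy::awq_gemm")  # hypothetical op name
def awq_gemm_fake(x: Meta, qweight: Meta) -> Meta:
    # Assumed layout: x is [M, K]; qweight is [K, N // PACK_FACTOR] int32.
    m, k = x.shape
    k_w, packed_n = qweight.shape
    if k != k_w:
        raise ValueError(f"inner dimension mismatch: {k} vs {k_w}")
    # Output keeps the activation dtype and unpacks N from the packed columns.
    return Meta(shape=(m, packed_n * PACK_FACTOR), dtype=x.dtype)


def trace_op(op_name: str, *inputs: Meta) -> Meta:
    """Propagate metadata through op_name without running any kernel."""
    fake = FAKE_IMPLS.get(op_name)
    if fake is None:
        # Roughly what a tracer faces with an unregistered custom kernel.
        raise RuntimeError(f"no fake implementation registered for {op_name!r}")
    return fake(*inputs)


if __name__ == "__main__":
    out = trace_op("toy::awq_gemm", Meta((16, 4096), "float16"), Meta((4096, 512), "int32"))
    print(out)
```

The fake function does no real computation, which is what lets it run during tracing. In a real setup it must still agree exactly with the shape and dtype the actual kernel produces.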

@ispobock ispobock requested a review from Fridge003 as a code owner November 9, 2025 12:26
Comment thread test/srt/test_piecewise_cuda_graph.py
BBuf and others added 3 commits November 10, 2025 16:07
- Resolved conflicts in piecewise_cuda_graph_runner.py by combining:
  - Main branch's enable_piecewise_cuda_graph() context and init_forward_metadata() call
  - Current branch's quant_config parameter support for AWQ/GPTQ models

- Resolved conflicts in test_piecewise_cuda_graph.py by keeping:
  - Main branch's TestPiecewiseCudaGraphDeepSeek test
  - Current branch's TestPiecewiseCudaGraphAWQ and TestPiecewiseCudaGraphGPTQ tests
@ispobock ispobock merged commit 9caca6a into main Nov 11, 2025
117 of 126 checks passed
@ispobock ispobock deleted the try_to_fix_piecewise_cuda_graph_awq_model branch November 11, 2025 03:56