
[Feature, Hardware] Enable SGLang on AMD GPUs via PyTorch for ROCm#1420

Merged
merrymercy merged 8 commits into sgl-project:main from HaiShaw:main
Sep 17, 2024

Conversation

@HaiShaw (Collaborator) commented Sep 14, 2024

Motivation

  • Enable SGLang on AMD GPUs

Modifications

  • Bypass the FlashInfer backend until it is available on AMD/ROCm

  • Add a proper fix for AMD FP8 e4m3fnuz to support Fused_MoE

  • Depends on vLLM>=0.5.5; I modified pyproject.toml to confirm it also works up to 0.6.0.

  • Misc.

  • TODO: follow-up to address one error (below) when cuda-graph is enabled.

File "/sglang/python/sglang/srt/layers/sampler.py", line 164, in top_k_top_p_min_p_sampling_from_probs_torch
    min_p_thresholds = probs_sort[:, 0] * min_ps
TypeError: unsupported operand type(s) for *: 'Tensor' and 'NoneType' (where min_ps is None)
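The failure is an unguarded multiply when no min-p values were supplied. A minimal sketch of the defensive guard (apply_min_p is a hypothetical helper for illustration, not SGLang's actual code):

```python
import torch

def apply_min_p(probs_sort: torch.Tensor, min_ps) -> torch.Tensor:
    """Zero out tokens whose probability falls below min_p * top_prob.

    Hypothetical helper illustrating the guard. probs_sort is assumed to be
    sorted in descending order along dim 1; min_ps holds one value per row,
    or None when no min-p filtering was requested.
    """
    if min_ps is None:
        # Unguarded code multiplied by None here, raising the TypeError above.
        return probs_sort
    min_p_thresholds = probs_sort[:, 0] * min_ps  # per-row threshold
    return probs_sort.masked_fill(probs_sort < min_p_thresholds.unsqueeze(1), 0.0)

probs = torch.tensor([[0.5, 0.3, 0.2]])
print(apply_min_p(probs, None))                 # unchanged, no error
print(apply_min_p(probs, torch.tensor([0.5])))  # 0.2 < 0.25 is zeroed
```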

How to run

  • An example on one MI3xx node (not a performance benchmark):
root@x:/sglang# VLLM_MOE_PADDING=0 python -m sglang.bench_latency --batch-size 32 --input 1024 --output 8 --model dummy_half_grok1/ --tokenizer-path Xenova/grok-1-tokenizer --load-format dummy --tp 8 --quant fp8  --attention-backend triton --sampling-backend  pytorch --disable-cuda-graph
Warmup ...
Prefill. latency: 25.79838 s, throughput:   1270.16 token/s
Decode.  latency: 0.53607 s, throughput:     59.69 token/s
Decode.  latency: 0.31325 s, throughput:    102.15 token/s
Decode.  latency: 0.04105 s, throughput:    779.47 token/s
Decode.  latency: 0.04075 s, throughput:    785.31 token/s
Decode.  median latency: 0.17715 s, median throughput:    180.64 token/s
Total. latency: 26.730 s, throughput:   1230.70 token/s
Benchmark ...
Prefill. latency: 1.11868 s, throughput:  29291.72 token/s
Decode.  latency: 0.02593 s, throughput:   1234.10 token/s
Decode.  latency: 0.02575 s, throughput:   1242.87 token/s
Decode.  latency: 0.02559 s, throughput:   1250.46 token/s
Decode.  latency: 0.02646 s, throughput:   1209.23 token/s
Decode.  latency: 0.02574 s, throughput:   1243.42 token/s
Decode.  median latency: 0.02574 s, median throughput:   1243.15 token/s
Total. latency:  1.325 s, throughput:  24925.60 token/s
root@x:/sglang#

Checklist

  • [x] Format your code according to the Contributor Guide.
  • [x] Add unit tests as outlined in the Contributor Guide.
  • [x] Update documentation as needed, including docstrings or example tutorials.

@HaiShaw HaiShaw changed the title Enable SGLang on AMD GPUs via PyTorch for ROCm (#1419) Enable SGLang on AMD GPUs via PyTorch for ROCm Sep 14, 2024
@Ying1123 Ying1123 mentioned this pull request Sep 14, 2024
@merrymercy (Contributor) left a comment


  1. We deprecated --disable-flashinfer and --disable-flashinfer-sampling. Please use --attention-backend triton --sampling-backend pytorch instead.
  2. Alternatively, you can set these backends automatically here https://github.com/HaiShaw/sglang/blob/8715deff22727382ce6f74768213e3e72f413f71/python/sglang/srt/server_args.py#L155
if is_hip():
    self.attention_backend = "triton"
    self.sampling_backend = "pytorch"
  3. The sampler + cuda-graph issue has been fixed by #1392. You can rebase and try it again:
     python -m sglang.bench_latency --model-path TinyLlama/TinyLlama-1.1B-Chat-v0.4 --attention-backend triton --sampling-backend pytorch
     This runs correctly.
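The is_hip() check in the snippet above can be as simple as inspecting which backend PyTorch was built for; a minimal sketch (an assumption for illustration, not necessarily the exact helper in SGLang/vLLM):

```python
import torch

def is_hip() -> bool:
    # torch.version.hip is a version string on ROCm (HIP) builds of PyTorch
    # and None on CUDA or CPU-only builds.
    return torch.version.hip is not None

print(is_hip())  # False on a CUDA or CPU-only build, True on ROCm
```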

@HaiShaw HaiShaw requested a review from merrymercy September 16, 2024 05:21
@Ying1123 Ying1123 changed the title Enable SGLang on AMD GPUs via PyTorch for ROCm [Feature, Hardware] Enable SGLang on AMD GPUs via PyTorch for ROCm Sep 16, 2024
@merrymercy merrymercy enabled auto-merge (squash) September 17, 2024 07:31
@merrymercy merrymercy merged commit 3a6e041 into sgl-project:main Sep 17, 2024
@linqingxu

On ROCm 6.1.2:
Memory access fault by GPU node-1 (Agent handle: 0x5629173ccc50) on address 0x7fcbccfeb000. Reason: Page not present or supervisor privilege.
Aborted (core dumped)
/usr/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '

@merrymercy (Contributor) commented

@linqingxu try --disable-cuda-graph for now? We are working on more fixes.

