Enable torch.compile for triton backend #1422

Merged
merrymercy merged 10 commits into main from triton-torch-compile on Sep 14, 2024
@merrymercy (Contributor) commented Sep 14, 2024

w/ torch.compile

python3 -m sglang.bench_latency --model meta-llama/Meta-Llama-3-8B --batch-size 1 --input 128 --output 8 --attention-backend triton --enable-torch-compile
Decode.  median latency: 0.00616 s, median throughput:    162.40 token/s

w/o torch.compile

python3 -m sglang.bench_latency --model meta-llama/Meta-Llama-3-8B --batch-size 1 --input 128 --output 8 --attention-backend triton
Decode.  median latency: 0.00704 s, median throughput:    142.05 token/s
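
For reference, the relative gain implied by the two decode latencies above can be worked out with a few lines of Python. This is only a sketch of the arithmetic: the helper names `speedup` and `latency_reduction` are made up for illustration and are not part of sglang, and the inputs are the median decode latencies reported in this comment.

```python
# Hypothetical helpers for illustration only; not part of sglang.

def speedup(baseline_latency_s: float, compiled_latency_s: float) -> float:
    """Return how many times faster the compiled run is than the baseline."""
    return baseline_latency_s / compiled_latency_s


def latency_reduction(baseline_latency_s: float, compiled_latency_s: float) -> float:
    """Return the fraction of per-token latency removed by the compiled run."""
    return 1.0 - compiled_latency_s / baseline_latency_s


# Median decode latencies (seconds) reported above.
baseline = 0.00704  # triton backend, without --enable-torch-compile
compiled = 0.00616  # triton backend, with --enable-torch-compile

print(f"speedup: {speedup(baseline, compiled):.3f}x")
print(f"latency reduction: {latency_reduction(baseline, compiled):.1%}")
```

With these inputs the decode step is roughly 1.14x faster, about a 12.5% cut in median latency. That holds for this one configuration (batch size 1, 128 input tokens, 8 output tokens); other settings may behave differently.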

@hnyls2002 (Collaborator)

w/ torch.compile

python3 -m sglang.bench_latency --model meta-llama/Meta-Llama-3-8B --batch-size 1 --input 128 --output 8 --attention-backend triton --enable-torch-compile
Decode.  median latency: 0.00616 s, median throughput:    162.40 token/s

w/o torch.compile

python3 -m sglang.bench_latency --model meta-llama/Meta-Llama-3-8B --batch-size 1 --input 128 --output 8 --attention-backend triton
Decode.  median latency: 0.00704 s, median throughput:    142.05 token/s

The result is really impressive.

Comment thread python/sglang/srt/layers/attention_backend.py
@merrymercy merged commit 9463bc1 into main on Sep 14, 2024
@merrymercy deleted the triton-torch-compile branch on September 14, 2024 22:38
@merrymercy restored the triton-torch-compile branch on September 15, 2024 03:07
@merrymercy deleted the triton-torch-compile branch on September 15, 2024 14:28
@merrymercy mentioned this pull request on Sep 19, 2024 (3 tasks)
timethink pushed a commit to timethink/sglang that referenced this pull request on Mar 9, 2025