Sequence Parallel Decode Attn Kernel #5

Merged
ZYHowell merged 14 commits into main from pr-sp-decode-kernel on Aug 8, 2024

Conversation

@ivanium (Owner) commented on Aug 1, 2024

This PR implements the SP decode kernel:

  • Initialize the flashinfer wrapper with the actual seq_lens.
  • The kernel replicates the Q tensors of the decoding batch across SP workers, gathers the output o and s tensors at the end, and merges their states (see the sketch after this list).
  • Fixed a bug in prefill communication that could lead to a deadlock due to an incorrect send/recv order.
  • Incorporate the KV cache store logic. This needs out_cache_loc support here.
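
To illustrate the gather-and-merge step, here is a minimal sketch (not the kernel code in this PR) of how partial decode-attention results from SP workers can be combined. The assumption is that each worker attends over only its local KV shard and produces a partial output o_i together with the log-sum-exp s_i of its local attention scores; weighting each o_i by exp(s_i - logsumexp(s)) then recovers attention over the full sequence. The function name and tensor layout below are hypothetical and chosen only for illustration.

```python
# Hypothetical sketch of merging gathered SP attention states (not the PR's kernel).
import torch

def merge_sp_attention_states(o_parts: torch.Tensor, s_parts: torch.Tensor):
    """Merge partial decode-attention outputs gathered from SP workers.

    o_parts: [num_sp, batch, num_heads, head_dim]  partial outputs per worker
    s_parts: [num_sp, batch, num_heads]            per-worker log-sum-exp of scores
    """
    # The global log-sum-exp across workers is the true softmax denominator.
    s_global = torch.logsumexp(s_parts, dim=0)              # [batch, num_heads]
    # Each worker's contribution is reweighted by exp(s_i - s_global).
    weights = torch.exp(s_parts - s_global.unsqueeze(0))    # [num_sp, batch, num_heads]
    o = (weights.unsqueeze(-1) * o_parts).sum(dim=0)        # [batch, num_heads, head_dim]
    return o, s_global
```

With this formulation, the merge is associative, so the gathered o and s tensors can be combined in any order (or pairwise in a tree) and still yield the same result as attending over the full, unsharded KV cache.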

@ivanium requested a review from ZYHowell on Aug 1, 2024, 23:30
Comment thread: python/sglang/srt/layers/radix_attention.py
@ZYHowell merged commit 1695aed into main on Aug 8, 2024