Sequence Parallel system setup by ZYHowell · Pull Request #1 · ivanium/sglang

ZYHowell · 2024-07-20T21:51:09Z

This PR:

Add `sp_size` and `sp_rank` to the model runner args.
Get the local sequence indices of each request for the prefill stage. (This seems to be no longer needed with the SP layout.)
When preparing input_ids, reorder them as [req_1_sp_0, req_2_sp_0, ..., req_n_sp_0, req_1_sp_1, ...]. This way, switching to sequence parallelism only requires an AllGather, with no re-indexing.
Generate the corresponding position ids.
Fix LogitProcessorOutput for the above SP layout. (This is currently a workaround; the tensor transpose should be avoided later.)
Since this PR has no real SP attention kernel, tokens are reordered back to the normal layout before the attention kernel and then shifted back to the SP layout afterwards. Performance is therefore poor: this path exists only for correctness checks and should be removed later.
(Not tested) The KV cache is stored following the above SP layout.
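A minimal sketch of the input_ids reordering and position ids described above, written in plain Python rather than with the PR's actual tensors. It assumes (this PR does not spell it out) that each request is split into contiguous chunks and that the first `seq_len % sp_size` ranks get one extra token. `split_sizes`, `sp_layout_indices`, and `to_normal_layout` are hypothetical helpers. The gather index maps each SP-layout slot to a normal-layout token, the same loop yields per-request position ids, and scattering by that index restores the normal layout, as the temporary pre-attention path does.

```python
from typing import List, Sequence, Tuple


def split_sizes(seq_len: int, sp_size: int) -> List[int]:
    # Assumption: the first (seq_len % sp_size) ranks get one extra token.
    base, rem = divmod(seq_len, sp_size)
    return [base + (1 if r < rem else 0) for r in range(sp_size)]


def sp_layout_indices(seq_lens: Sequence[int], sp_size: int) -> Tuple[List[int], List[int]]:
    """Return (gather_idx, positions) for the unpadded SP layout.

    gather_idx[k] is the normal-layout index of the k-th SP-layout token;
    positions[k] is that token's position id within its own request.
    """
    # Offset of each request in the normal layout [req_1, req_2, ...].
    starts, offset = [], 0
    for n in seq_lens:
        starts.append(offset)
        offset += n

    gather_idx: List[int] = []
    positions: List[int] = []
    for r in range(sp_size):                # outer loop: SP ranks
        for req, n in enumerate(seq_lens):  # inner loop: requests
            sizes = split_sizes(n, sp_size)
            local_start = sum(sizes[:r])    # first position owned by rank r
            for p in range(local_start, local_start + sizes[r]):
                gather_idx.append(starts[req] + p)
                positions.append(p)
    return gather_idx, positions


def to_normal_layout(sp_tokens: Sequence, gather_idx: Sequence[int]) -> list:
    # gather_idx is a permutation, so every slot gets overwritten.
    normal = list(sp_tokens)
    for k, i in enumerate(gather_idx):
        normal[i] = sp_tokens[k]
    return normal


input_ids = [10, 11, 12, 13, 14, 20, 21, 22]  # req_1 has 5 tokens, req_2 has 3
gather_idx, positions = sp_layout_indices([5, 3], sp_size=2)
sp_input_ids = [input_ids[i] for i in gather_idx]
print(sp_input_ids)  # [10, 11, 12, 20, 21, 13, 14, 22]
print(positions)     # [0, 1, 2, 0, 1, 3, 4, 2]
```

In this example rank 0 owns the first 5 SP-layout slots and rank 1 the last 3. Concatenating the ranks' local slices (an AllGather) therefore reproduces the full SP layout without any re-indexing.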

In general, there are three token layouts:

Normal layout: [req_1, req_2, ...]
Sequence parallel layout, unpadded: [req_1_sp_0, req_2_sp_0, ..., req_n_sp_0, req_1_sp_1, ...]
Sequence parallel layout, padded. When #tokens(req) % sp_size != 0, the first few SP ranks hold more tokens than the others. To make operations such as AllGather straightforward, the other SP ranks are padded. This layout looks like:

[
  req_1_sp_0, req_2_sp_0, ... req_n_sp_0,
  req_1_sp_1, req_2_sp_1, ... req_n_sp_1, padding_sp_1,
  req_1_sp_2, req_2_sp_2, ... req_n_sp_2, padding_sp_2,
  ...
]

It is written here as if it were a 2-D matrix, but it is actually 1-D. Within each SP rank, padding is added only at the end of that rank's chunk (not after every request).
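The padded layout can be sketched the same way, under the same hypothetical split rule (contiguous chunks, first `seq_len % sp_size` ranks get one extra token). Padding every rank's chunk to a common length matters because collectives such as `torch.distributed.all_gather_into_tensor` expect every rank to contribute a tensor of the same size. Tokens are represented as `(request, position)` pairs so the result is easy to read; `padded_sp_layout` and `padded_to_normal` are illustrative names, not functions from this PR.

```python
from typing import List, Optional, Sequence, Tuple

Token = Tuple[int, int]  # (request index, position within request)
PAD = None               # hypothetical padding marker


def split_sizes(seq_len: int, sp_size: int) -> List[int]:
    # Assumption: the first (seq_len % sp_size) ranks get one extra token.
    base, rem = divmod(seq_len, sp_size)
    return [base + (1 if r < rem else 0) for r in range(sp_size)]


def padded_sp_layout(seq_lens: Sequence[int], sp_size: int) -> Tuple[List[Optional[Token]], int]:
    """Build the padded SP layout as a flat 1-D list; return (layout, per-rank length)."""
    chunks: List[List[Optional[Token]]] = []
    for r in range(sp_size):
        chunk: List[Optional[Token]] = []
        for req, n in enumerate(seq_lens):
            sizes = split_sizes(n, sp_size)
            start = sum(sizes[:r])
            chunk.extend((req, p) for p in range(start, start + sizes[r]))
        chunks.append(chunk)
    # Rank 0 never holds fewer tokens than a later rank, so its length is the max.
    per_rank = len(chunks[0])
    flat: List[Optional[Token]] = []
    for chunk in chunks:
        # Padding goes only at the end of each rank's chunk, not after each request.
        flat.extend(chunk + [PAD] * (per_rank - len(chunk)))
    return flat, per_rank


def padded_to_normal(flat: Sequence[Optional[Token]], seq_lens: Sequence[int]) -> List[Token]:
    """Drop padding and restore the normal layout [req_1, req_2, ...]."""
    starts, offset = [], 0
    for n in seq_lens:
        starts.append(offset)
        offset += n
    normal: List[Token] = [(-1, -1)] * offset
    for tok in flat:
        if tok is None:  # skip PAD slots
            continue
        req, pos = tok
        normal[starts[req] + pos] = tok
    return normal


layout, per_rank = padded_sp_layout([5, 3], sp_size=2)
print(per_rank)    # 5
print(layout[5:])  # [(0, 3), (0, 4), (1, 2), None, None]
```

Here rank 0 holds 5 tokens and rank 1 holds 3, so rank 1's slot is padded with two entries at its end; `padded_to_normal` is the inverse step after an AllGather.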


ivanium

LGTM

ivanium · 2024-07-22T23:14:17Z

I will go ahead and merge it first.

ZYHowell added 17 commits July 19, 2024 13:27

add sp index

f498ad1

add clone for rope as it's in-place

4a807ec

add decode mask for sp

285348c

insert to prepare batch

5b2a048

add sp size and rank args

f8b8dbc

update sequence parallel layout

de61f42

minor bug fix to pass sp=1 test

073b9dc

give local indices to help with position ids; prepare for only record…

9599131

… local kv cache

minor fix

152666f

add sp layout to normal layout

50436b7

move sp layout transform tool to inputmetadata

8f8db37

add debug flatten to sp

fd49bf4

update name and doc string

1785ebf

bug fix for the sp=1 case

c8d850b

fix prefix lens None

8f46fee

fix debug mode indices

73af1b6

runnable but only first two decode tok cor

1afdae2

ZYHowell changed the title ~~[WIP] Sequence Parallel system setup~~ Sequence Parallel system setup Jul 22, 2024

ZYHowell added 2 commits July 22, 2024 14:50

fix early exit for decode with SP

2e41f46

format

dd2382d

ivanium approved these changes Jul 22, 2024


ivanium merged commit a11bc61 into main Jul 22, 2024

ivanium merged 19 commits into main from pr-sp-rope

2 participants

Conversation

ZYHowell commented Jul 20, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ivanium left a comment

Choose a reason for hiding this comment

Uh oh!

ivanium commented Jul 22, 2024

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

ZYHowell commented Jul 20, 2024 •

edited

Loading