Sequence Parallel system setup #1

Merged
ivanium merged 19 commits into main from pr-sp-rope on Jul 22, 2024

Collaborator

@ZYHowell ZYHowell commented Jul 20, 2024

This PR:

  • Adds sp_size and sp_rank to the model runner args.
  • Gets the local sequence indices of each request for the prefill stage. (This seems no longer needed after the SP layout...)
  • When preparing input_ids, reorders them as [req_0_sp_0, req_1_sp_0, ..., req_n_sp_0, req_0_sp_1, ...]. With this order, switching to sequence parallel needs only an AllGather and no re-indexing.
  • Generates the corresponding position ids.
  • Fixes LogitProcessorOutput for the above SP layout (just a workaround for now; the tensor transpose should be avoided later).
  • Since this PR has no real SP attention kernel, we reorder back to the original layout before the attention kernel and restore the SP layout afterwards, so performance is poor. This is only for correctness checking and should be removed later.
  • (Not tested) The KV cache is stored following the above SP layout.
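
The reordering described in the list above can be sketched in plain Python. This is only an illustration, not the code in this PR: the helper names (`shard_bounds`, `to_sp_layout`, `from_sp_layout`) are hypothetical, each request is assumed to be split into contiguous shards with the first `len % sp_size` ranks taking one extra token, and position ids are assumed to start at 0 (prefill without a cached prefix).

```python
from typing import List, Tuple


def shard_bounds(length: int, sp_size: int, sp_rank: int) -> Tuple[int, int]:
    """Return [start, end) of the contiguous shard owned by sp_rank.

    Assumption: the first `length % sp_size` ranks get one extra token.
    """
    base, extra = divmod(length, sp_size)
    start = sp_rank * base + min(sp_rank, extra)
    end = start + base + (1 if sp_rank < extra else 0)
    return start, end


def to_sp_layout(reqs: List[List[int]], sp_size: int) -> Tuple[List[int], List[int]]:
    """Flatten requests rank-major: all sp_0 shards, then all sp_1 shards, ..."""
    input_ids: List[int] = []
    positions: List[int] = []
    for sp_rank in range(sp_size):
        for req in reqs:
            start, end = shard_bounds(len(req), sp_size, sp_rank)
            input_ids.extend(req[start:end])
            # Position ids keep each token's index within its own request.
            positions.extend(range(start, end))
    return input_ids, positions


def from_sp_layout(sp_ids: List[int], req_lens: List[int], sp_size: int) -> List[List[int]]:
    """Undo to_sp_layout, rebuilding the normal [req_0, req_1, ...] layout."""
    reqs = [[0] * n for n in req_lens]
    cursor = 0
    for sp_rank in range(sp_size):
        for req, n in zip(reqs, req_lens):
            start, end = shard_bounds(n, sp_size, sp_rank)
            req[start:end] = sp_ids[cursor:cursor + (end - start)]
            cursor += end - start
    return reqs


reqs = [[10, 11, 12, 13], [20, 21]]
ids, pos = to_sp_layout(reqs, sp_size=2)
print(ids)  # [10, 11, 20, 12, 13, 21]
print(pos)  # [0, 1, 0, 2, 3, 1]
print(from_sp_layout(ids, [4, 2], sp_size=2))  # [[10, 11, 12, 13], [20, 21]]
```

Because each rank's tokens are contiguous in the flattened buffer, concatenating the per-rank buffers in rank order (which is what an AllGather produces) already yields this layout. `from_sp_layout` corresponds to the temporary "reorder back to the original layout" step used for the correctness check.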

Generally, we have 3 layouts for tokens:

  1. Normal layout, which is [req_1, req_2, ...].
  2. Sequence Parallel layout without padding, which is [req_1_sp_0, req_2_sp_0, ... req_n_sp_0, req_1_sp_1, ...].
  3. Sequence Parallel layout, padded. When #tokens(req) % sp_size != 0, the first few sequence parallel ranks hold more tokens. To keep operations like AllGather simple, the remaining sequence parallel ranks are padded to the same length. This layout looks like:
[
  req_1_sp_0, req_2_sp_0, ... req_n_sp_0,
  req_1_sp_1, req_2_sp_1, ... req_n_sp_1, padding_sp_1,
  req_1_sp_2, req_2_sp_2, ... req_n_sp_2, padding_sp_2,
  ...
]

Here we write it as if it's a 2D matrix, but it's actually 1-D. For each SP rank, padding is only added at the end (instead of at every req's end).
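
A minimal sketch of the padded layout, under stated assumptions: the function name `to_padded_sp_layout` and the `pad_id` value are hypothetical, and shards are assumed contiguous with the first `len % sp_size` ranks taking one extra token. Padding is appended once per SP rank, after that rank's last request shard, so that every rank contributes the same number of entries.

```python
from typing import List


def to_padded_sp_layout(reqs: List[List[int]], sp_size: int, pad_id: int = -1) -> List[int]:
    """Build the padded SP layout as a flat 1-D list."""
    per_rank: List[List[int]] = []
    for sp_rank in range(sp_size):
        tokens: List[int] = []
        for req in reqs:
            # Assumption: contiguous shards, extra tokens go to the lowest ranks.
            base, extra = divmod(len(req), sp_size)
            start = sp_rank * base + min(sp_rank, extra)
            end = start + base + (1 if sp_rank < extra else 0)
            tokens.extend(req[start:end])
        per_rank.append(tokens)

    # Every rank is padded up to the longest rank's token count.
    width = max(len(tokens) for tokens in per_rank)
    flat: List[int] = []
    for tokens in per_rank:
        flat.extend(tokens)
        flat.extend([pad_id] * (width - len(tokens)))  # padding only at the rank's end
    return flat


flat = to_padded_sp_layout([[10, 11, 12], [20, 21, 22, 23, 24]], sp_size=2)
print(flat)  # [10, 11, 20, 21, 22, 12, 23, 24, -1, -1]
```

In this example rank 0 holds 5 tokens and rank 1 holds 3, so rank 1 receives two padding entries at its end and none between its two request shards.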

@ZYHowell ZYHowell changed the title [WIP] Sequence Parallel system setup Sequence Parallel system setup Jul 22, 2024
Owner

@ivanium ivanium left a comment

LGTM

Owner

ivanium commented Jul 22, 2024

I will go ahead and merge it first.

@ivanium ivanium merged commit a11bc61 into main Jul 22, 2024