[Feature] Add initial support for sequence parallelism#1436
Conversation
Force-pushed from c263cb3 to 71c8afe.
From the code, I see that in the prefill stage, after attention, the output has shape [padded_total_num_tokens, q_head_num // SP_SIZE, head_dim], and it is then fed through RowSeqParallelLinear, which needs an all-reduce. The input of qkv_proj_linear is [padded_total_num_tokens, q_head_num, head_dim], which is not split by sp_size. I want to know why ring attention is not used here; ring attention seems better in both computation and communication.
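For context, here is a minimal sketch of the row-parallel output projection being discussed, assuming a Megatron-style layout where each SP rank holds the weight slice for its head shard and an all-reduce sums the partial results. It is illustrative only, not the PR's actual `RowSeqParallelLinear`:

```python
import torch
import torch.distributed as dist

def row_parallel_o_proj(attn_out_shard: torch.Tensor, w_shard: torch.Tensor) -> torch.Tensor:
    """Illustrative sketch, not the PR's RowSeqParallelLinear.

    attn_out_shard: [padded_total_num_tokens, (q_head_num // SP_SIZE) * head_dim]
    w_shard:        [(q_head_num // SP_SIZE) * head_dim, hidden_size]
    """
    partial = attn_out_shard @ w_shard              # partial result from this rank's head shard
    dist.all_reduce(partial, op=dist.ReduceOp.SUM)  # sum partial results across SP ranks
    return partial
```

The all-reduce in the last step is the communication cost the question above refers to.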
For each SP worker, we have either (1) QKV of entire sequences:
    q tensor: [padded_total_num_tokens, q_head_num // SP_SIZE, head_dim]
    k tensor: [padded_total_num_tokens, k_head_num, head_dim]
    v tensor: [padded_total_num_tokens, v_head_num, head_dim]
Or (2) Q of entire sequences and KV of the current SP shard:
    q tensor: [padded_total_num_tokens, q_head_num // SP_SIZE, head_dim]
    k tensor: [padded_sp_shard_num_tokens, k_head_num, head_dim]
    v tensor: [padded_sp_shard_num_tokens, v_head_num, head_dim]

Case (1) saves cross-SP-worker communication, while case (2) saves the computation of K and V for the entire sequences but needs extra computation in SP attention.
"""
(2) seems to be able to split the workload and overlap even with a single query. But just curious: does anyone have opinions on TreeAttention (all-reduce the LSE instead of sending KV), which seems optimized for decoding?
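For reference, a rough sketch of the LSE-based merge behind the TreeAttention idea mentioned above: each worker attends over its local KV shard, and partial outputs are then combined using their log-sum-exp values, so only outputs and LSEs (not KV) need to cross workers. The function name is hypothetical and this is not code from this PR:

```python
import torch

def merge_partial_attn(outs: list[torch.Tensor], lses: list[torch.Tensor]) -> torch.Tensor:
    """outs[i]: [num_q_tokens, num_heads, head_dim] partial attention output of KV shard i
    lses[i]: [num_q_tokens, num_heads] log-sum-exp of shard i's attention scores"""
    lse_stack = torch.stack(lses, dim=0)               # [num_shards, tokens, heads]
    lse_global = torch.logsumexp(lse_stack, dim=0)     # global normalizer per query/head
    weights = torch.exp(lse_stack - lse_global)        # per-shard softmax correction
    out_stack = torch.stack(outs, dim=0)               # [num_shards, tokens, heads, dim]
    return (weights.unsqueeze(-1) * out_stack).sum(dim=0)
```

In a distributed setting, the LSEs and weighted partial outputs would be combined with an all-reduce (or all-gather) instead of the local stacking shown here.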
# TODO: in fact we can use all-to-all to gather the output and state here
# to collect only q head shards that are needed by the current SP worker.
# All-to-all will save communication and `merge_state` computation.
Later all-reduce in ColumnSeqParallelLinear?
Could this feature be integrated in the early months of 2025? And by the way, why not use ring attention, which offers better performance? @merrymercy @Ying1123
This PR already implements ring attention.
However, the code heavily relies on tensor-parallelism, and its layout [s, h//sp, d] seems to be inequivalent to flash-linear-attention in xDit, [s//sp, h, d], in both computation and memory access.
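To illustrate the layout difference referred to here, a rough all-to-all sketch that converts a head-sharded [s, h // sp, d] shard into a sequence-sharded [s // sp, h, d] shard, the way Ulysses-style sequence parallelism does. Assumptions: s divisible by sp_size, one chunk per peer rank; this is not code from this PR or from xDit:

```python
import torch
import torch.distributed as dist

def heads_sharded_to_seq_sharded(x: torch.Tensor, sp_size: int) -> torch.Tensor:
    """x: [s, h // sp, d] on each rank; returns [s // sp, h, d] on each rank."""
    s, h_shard, d = x.shape
    # Split the sequence dimension into sp_size chunks, one to send to each peer rank.
    send = [t.contiguous() for t in x.reshape(sp_size, s // sp_size, h_shard, d).unbind(0)]
    recv = [torch.empty_like(send[0]) for _ in range(sp_size)]
    dist.all_to_all(recv, send)
    # Each received chunk covers this rank's sequence shard but a different head shard,
    # so concatenating along the head dimension yields all heads for the local tokens.
    return torch.cat(recv, dim=1)  # [s // sp, h, d]
```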
Is this still active? Looking forward to this change. It will be super helpful if we really want to handle long context.
The reason for coupling with TP is probably that if we use pure ring attention, we can only replicate Q on all ranks, which causes redundant computation, while if we shard Q there is no redundancy.
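A hedged sketch of the pure ring-attention structure described here: each rank keeps its Q (replicated or sharded) and rotates its KV shard around the ring, merging the per-block partial outputs by their log-sum-exp. `partial_attn` is a hypothetical kernel returning a partial output and its LSE, and real implementations overlap the communication with compute:

```python
import torch
import torch.distributed as dist

def ring_exchange(t: torch.Tensor, rank: int, world_size: int) -> torch.Tensor:
    # Send the local KV block to the next rank and receive the previous rank's block.
    recv = torch.empty_like(t)
    ops = [dist.P2POp(dist.isend, t.contiguous(), (rank + 1) % world_size),
           dist.P2POp(dist.irecv, recv, (rank - 1) % world_size)]
    for req in dist.batch_isend_irecv(ops):
        req.wait()
    return recv

def merge_by_lse(outs, lses):
    # Combine per-block partial outputs using their log-sum-exp normalizers.
    lse = torch.stack(lses)                              # [blocks, tokens, heads]
    w = torch.exp(lse - torch.logsumexp(lse, dim=0))     # per-block softmax correction
    return (w.unsqueeze(-1) * torch.stack(outs)).sum(0)

def ring_attention(q, k, v, rank: int, world_size: int):
    outs, lses = [], []
    k_cur, v_cur = k, v
    for step in range(world_size):
        out, lse = partial_attn(q, k_cur, v_cur)  # hypothetical kernel: partial output + LSE
        outs.append(out)
        lses.append(lse)
        if step + 1 < world_size:
            k_cur = ring_exchange(k_cur, rank, world_size)
            v_cur = ring_exchange(v_cur, rank, world_size)
    return merge_by_lse(outs, lses)
```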
I'm interested in helping with finalizing this if possible :)
@ZYHowell @ivanium Moved from #1041