[Feature] two-chunk overlap for DeepSeekV3/R1

### Checklist

- [x] 1. If the issue you raised is not a feature but a question, please raise a discussion at https://github.com/sgl-project/sglang/discussions/new/choose Otherwise, it will be closed.
- [x] 2. Please use English, otherwise it will be closed.

### Motivation

In practice, request sequence lengths are not uniform on the prefill nodes (4 nodes, tp1, dp32). Consider the following two cases:
* Case 1:
each dp has extend_seq_len = [100, 2000], extend_prefix_len = [0, 0].
If we split it into [100] and [2000], the computation and communication times of the two micro-batches differ significantly, which hurts overlap efficiency.

* Case 2:
  * dp0: batch_size = 1, extend_seq_len = [4000], extend_prefix_len = [0].
  * dp1-dp31: batch_size = 2, extend_seq_len = [2000, 2000], extend_prefix_len = [0, 0].
In this scenario, dp0 holds a single request and cannot be split at a request boundary, so either overlap cannot be enabled, or dp0 must construct an idle batch to enable it.

In both cases, the full potential of overlap cannot be achieved.


To address these issues, we propose two-chunk overlap, a variant of two-batch overlap in which a single request may be split across the two micro-batches.
* Case 1:
we split it into:
  * extend_seq_len0 = [100, 950], extend_prefix_len0 = [0, 0]
  * extend_seq_len1 = [1050], extend_prefix_len1 = [950]

* Case 2 (dp0):
we split it into:
  * extend_seq_len0=[2000], extend_prefix_len0=[0]
  * extend_seq_len1=[2000], extend_prefix_len1=[2000]
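
The splits above can be reproduced with a small sketch. It assumes the split point is simply half of the batch's total extend tokens; `split_two_chunks` is a hypothetical helper name for illustration, not an existing sglang function, and the real implementation may pick the split point differently.

```python
from typing import List, Tuple

# (extend_seq_len, extend_prefix_len) for one micro-batch
Chunk = Tuple[List[int], List[int]]


def split_two_chunks(
    extend_seq_len: List[int], extend_prefix_len: List[int]
) -> Tuple[Chunk, Chunk]:
    """Split one batch into two micro-batches with balanced token counts.

    Assumption: the first micro-batch gets total // 2 tokens. A request
    that straddles the split point is cut in two, and its second part
    treats the first part as additional prefix.
    """
    budget = sum(extend_seq_len) // 2  # tokens left for micro-batch 0
    seq0, pre0, seq1, pre1 = [], [], [], []
    for seq_len, prefix_len in zip(extend_seq_len, extend_prefix_len):
        take = min(seq_len, budget)  # part of this request in micro-batch 0
        budget -= take
        if take > 0:
            seq0.append(take)
            pre0.append(prefix_len)
        rest = seq_len - take  # part of this request in micro-batch 1
        if rest > 0:
            seq1.append(rest)
            pre1.append(prefix_len + take)
    return (seq0, pre0), (seq1, pre1)


# Case 1: [100, 2000] is split inside the second request.
print(split_two_chunks([100, 2000], [0, 0]))
# Case 2, dp0: the single 4000-token request is cut in half.
print(split_two_chunks([4000], [0]))
```

For dp1-dp31 in Case 2, the same function falls back to an ordinary request-level split, since the midpoint coincides with a request boundary.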

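In two-chunk overlap, there is a latent dependency between the two micro-batches in the MLA computation: when a request is split, the second chunk's attention needs the KV cache entries produced by the first chunk (its extend_prefix_len covers the tokens of the first chunk).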

![Image](https://github.com/user-attachments/assets/576fd6e0-3e5c-4940-b650-41c7e75f7365)

We implemented two-chunk overlap on an early version of SGLang, which differs significantly from the current version.

We also support idle batches in two-chunk overlap, which covers the case where some dp ranks are idle. For example:
  * dp0: idle batch
  * dp1-dp31: extend_seq_len=[2000]


### Related resources

_No response_
