
[DP Attention] Optimize dp_padding_mode selection for dp_size=1 in extend mode #20406

Merged
ShangmingCai merged 4 commits into sgl-project:main from wangfakang:opt_DpPaddingMode
Mar 16, 2026
Conversation

@wangfakang
Contributor

@wangfakang wangfakang commented Mar 12, 2026

CC @yizhang2077 @ShangmingCai @nvcastet @ch-wan @merrymercy @Fridge003 PTAL, thx.

Motivation

When dp_size=1, the MAX_LEN and SUM_LEN modes have identical communication overhead, since max_len equals sum_len. Previously, extend mode (get_dp_padding_mode) unconditionally used SUM_LEN, which prevented symmetric memory from being used (via disabled=True).

Now, with dp_size=1, we prefer MAX_LEN mode to enable the symmetric memory optimizations needed for NSA CP and other features.

```python
def get_dp_padding_mode(
    cls, is_extend_in_batch, global_num_tokens: List[int]
) -> DpPaddingMode:
    # Previous behavior: extend batches always used SUM_LEN,
    # even when dp_size == 1 (non-extend path elided).
    if is_extend_in_batch:
        return DpPaddingMode.SUM_LEN

def get_global_dp_buffer(cls) -> torch.Tensor:
    # Symmetric memory is disabled whenever max padding is not in use.
    with use_symmetric_memory(get_tp_group(), disabled=not cls._dp_max_padding):
        buffer = torch.empty(
            (cls._global_dp_buffer_len, cls._hidden_size),
            dtype=cls._dtype,
            device=cls._device,
        )
    return buffer

def get_local_dp_buffer(cls) -> torch.Tensor:
    with use_symmetric_memory(get_tp_group(), disabled=not cls._dp_max_padding):
        buffer = torch.empty(
            (cls._local_dp_buffer_len, cls._hidden_size),
            dtype=cls._dtype,
            device=cls._device,
        )
    return buffer
```
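The buffer-size arithmetic behind the motivation can be checked with a toy calculation (illustrative only; the helper and the token counts are made up for this sketch, not taken from sglang):

```python
def buffer_tokens(mode: str, global_num_tokens: list) -> int:
    """Total tokens a DP gather buffer must hold under each padding mode."""
    dp_size = len(global_num_tokens)
    if mode == "MAX_LEN":
        # Every rank is padded up to the global max length.
        return dp_size * max(global_num_tokens)
    # SUM_LEN: actual lengths are concatenated without padding.
    return sum(global_num_tokens)

# dp_size == 1: the two modes cost exactly the same, so MAX_LEN is strictly
# preferable because it also keeps symmetric memory enabled.
assert buffer_tokens("MAX_LEN", [128]) == buffer_tokens("SUM_LEN", [128])

# dp_size > 1 with skewed lengths: SUM_LEN is cheaper, hence kept for extend.
assert buffer_tokens("SUM_LEN", [16, 512]) < buffer_tokens("MAX_LEN", [16, 512])
```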

Modifications

Update the logic of get_dp_padding_mode:

  • Only use SUM_LEN for extend mode when dp_size > 1.
  • Prefer MAX_LEN when communication cost is equal (>= instead of >).
  • This allows symmetric memory optimization for NSA CP and other use cases.
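The updated selection described above might look roughly like the following sketch. The function signature, the explicit dp_size parameter, and the cost comparison are assumptions reconstructed from the PR description, not the exact sglang implementation:

```python
from enum import Enum
from typing import List

class DpPaddingMode(Enum):
    MAX_LEN = "max_len"   # pad every rank to the global max token count
    SUM_LEN = "sum_len"   # concatenate actual token counts without padding

def get_dp_padding_mode(
    dp_size: int, is_extend_in_batch: bool, global_num_tokens: List[int]
) -> DpPaddingMode:
    # Only force SUM_LEN for extend batches when dp_size > 1; with dp_size == 1
    # the costs are identical and MAX_LEN unlocks symmetric memory.
    if is_extend_in_batch and dp_size > 1:
        return DpPaddingMode.SUM_LEN
    max_len = max(global_num_tokens)
    sum_len = sum(global_num_tokens)
    # Prefer MAX_LEN when communication cost is equal (>= instead of >).
    if sum_len >= dp_size * max_len:
        return DpPaddingMode.MAX_LEN
    return DpPaddingMode.SUM_LEN
```

With dp_size=1 every call now lands in the MAX_LEN branch, since sum_len always equals dp_size * max_len for a single rank.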

Accuracy Tests

Benchmarking and Profiling

Checklist

Review Process

  1. Ping Merge Oncalls to start the PR flow. See the PR Merge Process.
  2. Get approvals from CODEOWNERS and other reviewers.
  3. Trigger CI tests with comments or contact authorized users to do so.
    • /tag-run-ci-label, /rerun-failed-ci, /tag-and-rerun-ci
  4. After green CI and required approvals, ask Merge Oncalls to merge.

…tend mode

Signed-off-by: wangfakang <fakangwang@gmail.com>
@gemini-code-assist
Contributor

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

@ShangmingCai
Collaborator

/tag-and-rerun-ci

@wangfakang
Contributor Author

/rerun-failed-ci

@wangfakang
Contributor Author

/rerun_failed_ci

@ShangmingCai ShangmingCai merged commit 3d58cd1 into sgl-project:main Mar 16, 2026
256 of 277 checks passed
Wangzheee pushed a commit to Wangzheee/sglang that referenced this pull request Mar 21, 2026
…tend mode (sgl-project#20406)

Signed-off-by: wangfakang <fakangwang@gmail.com>
0-693 pushed a commit to 0-693/sglang that referenced this pull request Mar 25, 2026
…tend mode (sgl-project#20406)

Signed-off-by: wangfakang <fakangwang@gmail.com>
JustinTong0323 pushed a commit to JustinTong0323/sglang that referenced this pull request Apr 7, 2026
…tend mode (sgl-project#20406)

Signed-off-by: wangfakang <fakangwang@gmail.com>
yhyang201 pushed a commit to yhyang201/sglang that referenced this pull request Apr 22, 2026
…tend mode (sgl-project#20406)

Signed-off-by: wangfakang <fakangwang@gmail.com>
