Add support for more batch sizes in cpu_graph_runner#13881

Merged
Kangyan-Zhou merged 38 commits into sgl-project:main from CaoE:compile_padding2
Mar 19, 2026

Conversation

@CaoE
Contributor

@CaoE CaoE commented Nov 25, 2025

Motivation

Add support for more batch sizes in cpu_graph_runner to reduce Python overhead and improve performance.

Modifications

  • Add replay_prepare, which pads the input so that it can run on the compiled graph.
  • Change the capture_bs strategy so that more batch sizes are supported.
  • Reuse --cuda-graph-bs to let users customize the graph batch sizes on CPU.
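
For reference, the padding idea behind the first two items can be sketched roughly as follows. This is a minimal illustration in plain Python, not the code in this PR: `pick_capture_bs`, `pad_batch`, `run_padded`, and the `pad_value` default are hypothetical names and choices, and `graph_fn` stands in for a function compiled for a fixed batch size.

```python
from bisect import bisect_left
from typing import Callable, List, Optional, Sequence


def pick_capture_bs(batch_size: int, capture_bs: Sequence[int]) -> Optional[int]:
    """Return the smallest captured batch size >= batch_size, or None if none fits."""
    sizes = sorted(capture_bs)
    idx = bisect_left(sizes, batch_size)
    return sizes[idx] if idx < len(sizes) else None


def pad_batch(token_ids: List[int], target_bs: int, pad_value: int = 0) -> List[int]:
    """Pad a per-request input list with pad_value until it has target_bs entries."""
    return token_ids + [pad_value] * (target_bs - len(token_ids))


def run_padded(
    token_ids: List[int],
    capture_bs: Sequence[int],
    graph_fn: Callable[[List[int]], List[int]],
) -> List[int]:
    """Pad the input, run the fixed-shape function, and drop the padded outputs."""
    target = pick_capture_bs(len(token_ids), capture_bs)
    if target is None:
        # No captured size is large enough: run on the unpadded input
        # (assumed here to be a valid fallback path).
        return graph_fn(token_ids)
    padded = pad_batch(token_ids, target)
    return graph_fn(padded)[: len(token_ids)]
```

With captured sizes `[1, 2, 4, 8]`, a batch of 3 requests would be padded to 4 entries, run once at that size, and the extra output dropped, so only a few graph shapes need to be compiled.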

Checklist

@github-actions github-actions Bot added the documentation and deepseek labels Nov 25, 2025
@CaoE CaoE force-pushed the compile_padding2 branch 2 times, most recently from c1692a4 to 64ad61e November 25, 2025 03:35
@CaoE
Contributor Author

CaoE commented Nov 25, 2025

Hi @Alcanderian @FlamingoPg @zhyncs, sglang currently uses parameters with a "cuda" prefix for graph capture, such as --cuda-graph-bs. Using a parameter like --cuda-graph-bs on CPU is confusing, and similar issues may arise if other device types, such as XPU, also begin to support graph mode. Would sglang consider parameters that apply to multiple device types, e.g., device-graph-bs? Or, if necessary, device-specific parameters for other devices, such as cpu-graph-bs? In this draft PR, I added cpu-graph-bs as a reference for this question. I would appreciate any suggestions. cc @mingfeima

@CaoE CaoE changed the title from "Add support for more batch sizes in torch.compile on the CPU" to "Add support for more batch sizes in torch.compile on cpu_graph_runner" Dec 4, 2025
@CaoE CaoE changed the title from "Add support for more batch sizes in torch.compile on cpu_graph_runner" to "Add support for more batch sizes in cpu_graph_runner" Dec 4, 2025
@CaoE CaoE marked this pull request as ready for review December 26, 2025 07:12
@CaoE
Contributor Author

CaoE commented Dec 26, 2025

/tag-run-ci-label

@CaoE
Contributor Author

CaoE commented Dec 26, 2025

@Alcanderian @zhyncs Could you please help review this PR? Thanks.

@CaoE
Contributor Author

CaoE commented Jan 4, 2026

@FlamingoPg @Alcanderian Could you please review this PR? Thank you.

@CaoE
Contributor Author

CaoE commented Jan 7, 2026

/rerun-failed-ci

1 similar comment
@CaoE
Contributor Author

CaoE commented Jan 7, 2026

/rerun-failed-ci

@mingfeima
Collaborator

/rerun-failed-ci

jianan-gu added a commit to jianan-gu/sglang that referenced this pull request Feb 27, 2026
@yeahdongcn yeahdongcn mentioned this pull request Mar 7, 2026
5 tasks
@ZailiWang
Contributor

/rerun-failed-ci

@CaoE
Contributor Author

CaoE commented Mar 19, 2026

/rerun-failed-ci

@Kangyan-Zhou Kangyan-Zhou merged commit 274581f into sgl-project:main Mar 19, 2026
109 of 146 checks passed
Wangzheee pushed a commit to Wangzheee/sglang that referenced this pull request Mar 21, 2026
0-693 pushed a commit to 0-693/sglang that referenced this pull request Mar 25, 2026
dutsc pushed a commit to dutsc/sglang that referenced this pull request Mar 30, 2026
JustinTong0323 pushed a commit to JustinTong0323/sglang that referenced this pull request Apr 7, 2026
yhyang201 pushed a commit to yhyang201/sglang that referenced this pull request Apr 22, 2026
@ZailiWang ZailiWang mentioned this pull request Apr 23, 2026
5 tasks

Labels

cpu (cpu backend performance optimization), deepseek, documentation (Improvements or additions to documentation), intel, run-ci, sgl-kernel

4 participants