Add support for more batch sizes in cpu_graph_runner #13881
Kangyan-Zhou merged 38 commits into sgl-project:main
Hi @Alcanderian @FlamingoPg @zhyncs. Currently, sglang uses parameters with a "cuda" prefix for graph capture, such as [...]
/tag-run-ci-label
@Alcanderian @zhyncs Could you please help review this PR? Thanks.
@FlamingoPg @Alcanderian Could you please review this PR? Thank you.
/rerun-failed-ci
Motivation
Add support for more batch sizes in cpu_graph_runner to reduce Python overhead and achieve higher performance.
Modifications
- `replay_prepare`: pad the input in order to utilize the compiled graph.
- `capture_bs`: add more support for `capture_bs`.
- `--cuda-graph-bs`: allow users to customize the graph batch size on CPU.
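
The padding idea behind `replay_prepare` can be illustrated with a minimal sketch. This is not the code from this PR: the helper names `pick_capture_bs` and `pad_batch`, the list-based batch, and the example batch sizes are all assumptions made for illustration. The sketch only shows how an incoming batch might be rounded up to the nearest captured batch size so that an already compiled graph can be reused.

```python
import bisect


def pick_capture_bs(batch_size, capture_bs):
    """Return the smallest captured batch size >= batch_size, or None.

    Hypothetical helper: None means no captured graph is large enough,
    so the caller would fall back to the non-graph path.
    """
    sizes = sorted(capture_bs)
    idx = bisect.bisect_left(sizes, batch_size)
    return sizes[idx] if idx < len(sizes) else None


def pad_batch(rows, padded_bs, pad_row):
    """Append copies of pad_row until the batch has padded_bs rows."""
    return rows + [list(pad_row) for _ in range(padded_bs - len(rows))]


# Assumed example values, e.g. what a user might pass via --cuda-graph-bs.
capture_bs = [1, 2, 4, 8, 16]
rows = [[1, 2], [3, 4], [5, 6]]  # a real batch of 3 requests

target_bs = pick_capture_bs(len(rows), capture_bs)
padded = pad_batch(rows, target_bs, pad_row=[0, 0])

# Only the first len(rows) outputs are meaningful after replay.
real_count = len(rows)
```

After replaying the graph on the padded input, the caller would keep only the first `real_count` outputs and discard the rows that correspond to padding.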