
perf: pad max-num-requests in decode cuda graph for higher coverage #20978

Merged
hnyls2002 merged 3 commits into sgl-project:main from happierpig:ylzhao/patch-decode-cuda-graph
Mar 23, 2026

Conversation

@happierpig
Contributor

Motivation

For long-context scenarios with small concurrency, the multiplier-based filtering of CUDA graph capture sizes can leave max-num-requests itself uncaptured, leading to a substantial perf drop.
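The issue can be illustrated with a minimal sketch (not SGLang's actual code; `capture_batch_sizes`, the `[1, 2, 4]` seed sizes, and the `step` parameter are illustrative assumptions): multiplier-based filtering keeps only batch sizes that are multiples of a step, so when max-num-requests is not itself a multiple, the largest batch falls through to eager mode. Padding max-num-requests back into the capture list restores coverage.

```python
# Hypothetical sketch of multiplier-based capture-size selection.
# Not SGLang's implementation; function and parameter names are invented.

def capture_batch_sizes(max_num_reqs: int, step: int = 8) -> list[int]:
    # Multiplier-based filtering: capture small sizes 1, 2, 4,
    # then only multiples of `step` up to max_num_reqs.
    sizes = [1, 2, 4] + list(range(step, max_num_reqs + 1, step))
    # Padding fix: ensure max_num_reqs itself is always captured,
    # so a full batch hits a CUDA graph instead of falling back to eager.
    if sizes[-1] != max_num_reqs:
        sizes.append(max_num_reqs)
    return sizes

# With max_num_reqs = 163, plain filtering stops at 160;
# the pad appends 163 so the largest batch is covered.
print(capture_batch_sizes(163)[-2:])
```

Without the pad, a decode batch of 163 requests would be rounded down past every captured size and run uncaptured, which is exactly the perf cliff this PR targets.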

Modifications

Accuracy Tests

Benchmarking and Profiling

Checklist

Review Process

  1. Ping Merge Oncalls to start the PR flow. See the PR Merge Process.
  2. Get approvals from CODEOWNERS and other reviewers.
  3. Trigger CI tests with comments or contact authorized users to do so.
    • /tag-run-ci-label, /rerun-failed-ci, /tag-and-rerun-ci
  4. After green CI and required approvals, ask Merge Oncalls to merge.


Collaborator

@hnyls2002 hnyls2002 left a comment


LGTM

@hnyls2002
Collaborator

/tag-and-rerun-ci

@hnyls2002 hnyls2002 merged commit 3439988 into sgl-project:main Mar 23, 2026
151 of 168 checks passed
