
perf: pad max-num-requests in decode cuda graph for higher coverage #20978

Merged
hnyls2002 merged 3 commits into sgl-project:main from happierpig:ylzhao/patch-decode-cuda-graph
Mar 23, 2026

Conversation

@happierpig
Contributor

Motivation

For long-context scenarios with small concurrency, the multiplier-based filtering of CUDA graph capture sizes can leave max-num-requests itself uncaptured, leading to a substantial perf drop.
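The issue can be illustrated with a minimal sketch (not SGLang's actual code; `capture_batch_sizes`, the `[1, 2, 4]` seed sizes, and the `step` parameter are illustrative assumptions): multiplier-based filtering keeps only batch sizes that are multiples of a step, so when max-num-requests is not itself a multiple, the largest batch falls through to eager mode. Padding max-num-requests back into the capture list restores coverage.

```python
# Hypothetical sketch of multiplier-based capture-size selection.
# Not SGLang's implementation; function and parameter names are invented.

def capture_batch_sizes(max_num_reqs: int, step: int = 8) -> list[int]:
    # Multiplier-based filtering: capture small sizes 1, 2, 4,
    # then only multiples of `step` up to max_num_reqs.
    sizes = [1, 2, 4] + list(range(step, max_num_reqs + 1, step))
    # Padding fix: ensure max_num_reqs itself is always captured,
    # so a full batch hits a CUDA graph instead of falling back to eager.
    if sizes[-1] != max_num_reqs:
        sizes.append(max_num_reqs)
    return sizes

# With max_num_reqs = 163, plain filtering stops at 160;
# the pad appends 163 so the largest batch is covered.
print(capture_batch_sizes(163)[-2:])
```

Without the pad, a decode batch of 163 requests would be rounded down past every captured size and run uncaptured, which is exactly the perf cliff this PR targets.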

Modifications

Accuracy Tests

Benchmarking and Profiling

Checklist

Review Process

  1. Ping Merge Oncalls to start the PR flow. See the PR Merge Process.
  2. Get approvals from CODEOWNERS and other reviewers.
  3. Trigger CI tests with comments or contact authorized users to do so.
    • /tag-run-ci-label, /rerun-failed-ci, /tag-and-rerun-ci
  4. After green CI and required approvals, ask Merge Oncalls to merge.


Collaborator

@hnyls2002 hnyls2002 left a comment


LGTM

@hnyls2002
Collaborator

/tag-and-rerun-ci

@hnyls2002 hnyls2002 merged commit 3439988 into sgl-project:main Mar 23, 2026
151 of 168 checks passed
