Enable piecewise-cuda-graph when logprob_start_len = -1 by Qiaolin-Yu · Pull Request #19453 · sgl-project/sglang

Qiaolin-Yu · 2026-02-26T23:02:35Z

Motivation

Modifications

Currently, setting logprob_start_len = len(input_ids) - 1 is useless: the logprob of the first decode token is already included in output_log_probs and is not controlled by this attribute. With logprob_start_len = len(input_ids) - 1, the setting only adds a redundant computation and blocks piecewise CUDA graph (PCG). As a workaround, we adjust the default value.
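To make the redundancy concrete, here is a toy model in Python. It is not SGLang's implementation: the helper name `prompt_positions_needing_logprobs` is hypothetical, and it assumes that every prompt position from `logprob_start_len` onward triggers an extra logprob computation during prefill.

```python
def prompt_positions_needing_logprobs(prompt_len: int, logprob_start_len: int) -> list[int]:
    """Hypothetical helper (not part of SGLang).

    Returns the prompt positions that would need an extra logprob
    computation under the toy assumption described above.
    """
    # Clamp the start index into the valid range [0, prompt_len].
    start = min(max(logprob_start_len, 0), prompt_len)
    return list(range(start, prompt_len))


prompt_len = 4

# Start index len(input_ids) - 1: one prompt position still needs work.
last_token_only = prompt_positions_needing_logprobs(prompt_len, prompt_len - 1)

# A start index equal to the prompt length (illustrative only): nothing extra to compute.
nothing_extra = prompt_positions_needing_logprobs(prompt_len, prompt_len)
```

Under this assumption, a start index of `len(input_ids) - 1` still leaves one prompt position to process, which matches the "useless computation" described above. The first decode token's logprob is reported separately in either case.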

Tests

python3 -m sglang.launch_server --model-path meta-llama/Llama-3.2-1B-Instruct --enable-piecewise-cuda-graph

# -1 (last token)
curl -X POST http://localhost:30000/generate \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Hello, world!",
    "sampling_params": {
      "max_new_tokens": 16,
      "temperature": 0
    },
    "return_logprob": true,
    "logprob_start_len": -1,
    "top_logprobs_num": 3
  }'

See the prefill log.
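The same request can be sketched in Python with only the standard library. The endpoint, port, and payload fields are copied from the curl command above. The function names `build_generate_request` and `send_request` are illustrative, and the shape of the server's response is not assumed here.

```python
import json
import urllib.request


def build_generate_request(base_url: str = "http://localhost:30000") -> urllib.request.Request:
    """Build the POST request from the curl example (names are illustrative)."""
    payload = {
        "text": "Hello, world!",
        "sampling_params": {"max_new_tokens": 16, "temperature": 0},
        "return_logprob": True,
        "logprob_start_len": -1,  # -1 (last token), as in the curl example
        "top_logprobs_num": 3,
    }
    return urllib.request.Request(
        f"{base_url}/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_request(request: urllib.request.Request):
    """Send the request and return the decoded JSON body."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling `send_request(build_generate_request())` requires the server launched above to be running on port 30000.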

Benchmarking and Profiling

Checklist

Format your code according to the Format code with pre-commit.
Add unit tests according to the Run and add unit tests.
Update documentation according to Write documentations.
Provide accuracy and speed benchmark results according to Test the accuracy and Benchmark the speed.
Follow the SGLang code style guidance.

Review Process

Ping Merge Oncalls to start the PR flow. See the PR Merge Process.
Get approvals from CODEOWNERS and other reviewers.
Trigger CI tests with comments or contact authorized users to do so.
- /tag-run-ci-label, /rerun-failed-ci, /tag-and-rerun-ci
After green CI and required approvals, ask Merge Oncalls to merge.


Qiaolin-Yu · 2026-02-26T23:04:36Z

/tag-and-rerun-ci

Qiaolin-Yu · 2026-03-09T05:03:22Z

            other_args=[
                "--attention-backend",
-                "flashinfer",
+                "triton",


FlashInfer uses the non-ragged path with PCG and the ragged path without PCG. The two outputs are therefore not bit-wise identical, which makes the test fail.
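As a rough illustration of why a bit-wise comparison is fragile, the Python sketch below shows two mathematically equal sums that differ in floating point once the accumulation order changes. It assumes that the two attention paths differ in accumulation order, which is one plausible cause and is not confirmed by this PR. The helpers `bitwise_equal` and `close_enough` are hypothetical and are not taken from the SGLang test suite.

```python
import math


def bitwise_equal(a, b):
    """True only if both sequences contain exactly the same float values."""
    return len(a) == len(b) and all(x == y for x, y in zip(a, b))


def close_enough(a, b, abs_tol=1e-6):
    """True if both sequences match element-wise within an absolute tolerance."""
    return len(a) == len(b) and all(
        math.isclose(x, y, rel_tol=0.0, abs_tol=abs_tol) for x, y in zip(a, b)
    )


# The same three terms, accumulated in two different orders.
left_to_right = (0.1 + 0.2) + 0.3
right_to_left = 0.1 + (0.2 + 0.3)

exact_match = bitwise_equal([left_to_right], [right_to_left])
tolerant_match = close_enough([left_to_right], [right_to_left])
```

Here `exact_match` is `False` while `tolerant_match` is `True`. A test that demands bit-wise equality therefore needs both runs to go through the same kernel path, which is what switching the backend to triton achieves in this test.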


upd

49dba6f

Qiaolin-Yu requested review from ByronHsu, ShangmingCai, Ying1123, hebiao064, hnyls2002, merrymercy and xiezhq-hermann as code owners February 26, 2026 23:02

Qiaolin-Yu assigned ispobock and ch-wan Feb 26, 2026

Qiaolin-Yu requested review from ch-wan and ispobock February 26, 2026 23:03

Qiaolin-Yu assigned hnyls2002 Feb 26, 2026

Merge branch 'main' into fix_pcg_logprob

b6cea3a

github-actions Bot added the run-ci label Feb 26, 2026

Qiaolin-Yu added 4 commits February 26, 2026 19:49

Merge branch 'main' into fix_pcg_logprob

4a71c78

upd

43dfd04

upd

5dc477c

revert

46d65f7

ch-wan approved these changes Mar 2, 2026

View reviewed changes

Qiaolin-Yu added 3 commits March 6, 2026 20:02

Merge branch 'main' into fix_pcg_logprob

565570b

Merge remote-tracking branch 'upstream/main' into fix_pcg_logprob

957883c

upd

5941ceb

Qiaolin-Yu commented Mar 9, 2026

View reviewed changes

Merge branch 'main' into fix_pcg_logprob

ecfcfb1

Qiaolin-Yu added the high priority label Mar 9, 2026

Fridge003 approved these changes Mar 10, 2026

View reviewed changes

Fridge003 merged commit a3d88a2 into sgl-project:main Mar 10, 2026
160 of 169 checks passed

liubiyongge pushed a commit to liubiyongge/sglang that referenced this pull request Mar 13, 2026

Enable piecewise-cuda-graph when logprob_start_len = -1 (sgl-project#…

1565a95

…19453)

Qiaolin-Yu mentioned this pull request Mar 20, 2026

[Roadmap] logprob refactor and improvement #21048

Open

9 tasks

Wangzheee pushed a commit to Wangzheee/sglang that referenced this pull request Mar 21, 2026

Enable piecewise-cuda-graph when logprob_start_len = -1 (sgl-project#…

584115a

…19453)

JustinTong0323 pushed a commit to JustinTong0323/sglang that referenced this pull request Apr 7, 2026

Enable piecewise-cuda-graph when logprob_start_len = -1 (sgl-project#…

3e61a6b

…19453)

yhyang201 pushed a commit to yhyang201/sglang that referenced this pull request Apr 22, 2026

Enable piecewise-cuda-graph when logprob_start_len = -1 (sgl-project#…

a2f2ea4

…19453)

Fridge003 merged 10 commits into sgl-project:main from Qiaolin-Yu:fix_pcg_logprob


5 participants
