[CI] Improve PP consistency check success rate #20838
Conversation
Signed-off-by: Shangming Cai <csmthu@gmail.com>
Summary of Changes (Gemini Code Assist): This pull request focuses on improving the stability and success rate of the pipeline parallelism (PP) consistency checks in the CI pipeline. By increasing the dataset size for the accuracy tests, the changes aim to yield more consistent results. Concurrently, the tolerance for acceptable accuracy drops in the consistency check has been adjusted, which should reduce intermittent CI failures and improve the overall robustness of the testing process.
/rerun-stage stage-c-test-4-gpu-h100
✅ Triggered
Code Review
This pull request aims to improve the consistency of pipeline parallelism (PP) tests by increasing the number of questions used in the gsm8k benchmark and fixing an incorrect percentage in an assertion message. The changes are correct and achieve the stated goal. My review includes suggestions to replace the newly introduced magic numbers with constants to improve code maintainability and prevent potential inconsistencies in the future. Specifically, I've pointed out the repeated use of num_questions=512 and the hardcoded accuracy threshold and its corresponding percentage in error messages.
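The reviewer's suggestion about magic numbers could look like the following sketch. The constant names, the 0.02 threshold value, and the function signature are assumptions for illustration, not taken from the actual diff:

```python
# Hypothetical constants for the PP consistency test; names and values
# are illustrative and may differ from the real test file.

# Number of gsm8k questions used for the accuracy benchmark.
NUM_QUESTIONS = 512

# Maximum tolerated accuracy drop between the PP and baseline runs.
ACCURACY_DROP_THRESHOLD = 0.02


def check_pp_consistency(baseline_acc: float, pp_acc: float) -> None:
    """Assert that the PP accuracy stays within the tolerated drop."""
    diff = baseline_acc - pp_acc
    # Deriving the percentage in the message from the constant keeps the
    # error text consistent with the actual check, which was the
    # reviewer's point about the incorrect percentage in the assertion.
    assert diff <= ACCURACY_DROP_THRESHOLD, (
        f"PP accuracy dropped by {diff:.2%}, exceeding the "
        f"{ACCURACY_DROP_THRESHOLD:.0%} threshold"
    )
```

With this shape, changing the tolerance in one place updates both the check and its error message.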
/rerun-ut test/registered/distributed/test_pp_single_node.py
✅ Triggered

/rerun-ut test/registered/distributed/test_pp_single_node.py
✅ Triggered
https://github.com/sgl-project/sglang/actions/runs/23285613694/job/67708248542
Accuracy diff seems a lot better now.

Compared to the previous successful run, the estimated elapsed time increased from 580s to 640s, about 10% more per run, but in return, the success rate should be 100% now.
Motivation
The current PP consistency check is flaky, since the benchmark accuracy of the model is not stable. If the diff is greater than 2%, we need to rerun the whole suite, which wastes our CI resources.
Modifications
num_questions changed from 200 to 512. More requests help us get a more stable accuracy, so the PP accuracy diff will be smaller, which will improve the success rate.
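Why more questions help can be sketched with the standard error of a measured accuracy: treating each answer as an independent Bernoulli trial with success probability p, the accuracy estimate over n questions has standard error sqrt(p(1-p)/n), so going from 200 to 512 questions shrinks run-to-run noise by roughly sqrt(512/200) ≈ 1.6x. A quick illustration (the 0.8 accuracy value is an assumed ballpark, not a measured gsm8k score):

```python
import math


def accuracy_standard_error(p: float, n: int) -> float:
    """Standard error of an accuracy estimate over n questions,
    modeling each answer as an independent Bernoulli trial."""
    return math.sqrt(p * (1 - p) / n)


# Assumed ballpark benchmark accuracy.
p = 0.8
se_200 = accuracy_standard_error(p, 200)  # ~0.028
se_512 = accuracy_standard_error(p, 512)  # ~0.018
```

Since the PP-vs-baseline diff compares two such noisy estimates, tightening each one makes a spurious >2% diff much less likely.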
Accuracy Tests
Benchmarking and Profiling
Checklist
Review Process
/tag-run-ci-label, /rerun-failed-ci, /tag-and-rerun-ci