[logprobs] Enable local deterministic logrprobs testing with strict threshold#10994
[logprobs] Enable local deterministic logrprobs testing with strict threshold#10994merrymercy merged 24 commits intosgl-project:mainfrom
Conversation
|
Also I feel that it's possible for deterministic feature to output different tokens from original output. Since deterministic might change the kernel behavior. |
945acf9 to
846b3fe
Compare
35056bf to
38e850e
Compare
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request refactors the logprobs test to be a local, developer-run script for generating and comparing against a deterministic baseline, using FlashInfer. The changes are well-structured, introducing gen and test modes and making the tests much stricter to align with deterministic execution. My feedback focuses on improving robustness, maintainability, and clarifying a potential reduction in hardware support for this test.
|
/gemini review |
|
Warning You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again! |
c2993dd to
0db4bc4
Compare
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request refactors test_logprobs.py into a local, developer-run script with gen and test modes. This is a sensible change, enabling deterministic logprob comparisons against a locally generated baseline, which is more reliable than a CI-based approach due to hardware variations. The changes include stricter tolerance thresholds and the enforcement of deterministic inference settings. My review focuses on enhancing the robustness and maintainability of this new test script. I've provided suggestions for more specific exception handling, ensuring consistent configurations between baseline generation and testing, and adding validation to prevent the creation of an empty baseline file.
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
@aftersnow can you help review this pr and use this sciprt to validate the correctness of #6318 ? thanks! |
1264893 to
1b8b8e8
Compare
zhaochenyang20
left a comment
There was a problem hiding this comment.
Great to go. Two left minor comments.
Motivation
Numerical logprobs are highly sensitive to GPU type, kernel backend, PyTorch version, and even kernel launch order. This makes CI-based validation unreliable and noisy.
This PR introduces a local, developer-run logprobs test that ensures:
Updated Test Logic
Baseline Generation
sglang_baseline_local.pklComparison Phase
1e‑5tolerancereturn_logprobflags behave correctlyKey Changes
🔄 Refactored
test_logprobs.pyinto a standalone test runner with two modes:gen: Generate a local baseline from 1000 ShareGPT samplestest: Compare current outputs against the local baselineEnabled deterministic execution via
enable_deterministic_inference=TrueEnforced strict logprobs thresholds:
max Δ ≤ 1e-5CLI usage:
Next Steps
We recommend all logprobs-related PRs include test results using this local testing suite to ensure stability across future kernel changes.