[logprobs] Enable local deterministic logrprobs testing with strict threshold#10994

Merged
merrymercy merged 24 commits into sgl-project:main from PrinsYin:lprobs_deterministic on Oct 19, 2025

@PrinsYin
Contributor

@PrinsYin PrinsYin commented Sep 27, 2025

Motivation

Numerical logprobs are highly sensitive to GPU type, kernel backend, PyTorch version, and even kernel launch order. This makes CI-based validation unreliable and noisy.

This PR introduces a local, developer-run logprobs test that ensures:

  • bitwise consistency of logprobs under deterministic kernels
  • reproducible diffs before & after code changes
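
As a small illustration of why launch order matters (a generic floating-point sketch, not code from this PR): floating-point addition is not associative, so reducing the same values in a different order can change the last bits of the result, while repeating the same order reproduces it exactly. The variable names below are illustrative only.

```python
# Floating-point addition is not associative: the grouping of a reduction
# can change the low-order bits of the result.
values = [0.1, 0.2, 0.3]

left_to_right = (values[0] + values[1]) + values[2]
right_to_left = values[0] + (values[1] + values[2])

# Different reduction orders give different results for these inputs.
differs = left_to_right != right_to_left

# Repeating the same order is bitwise reproducible, which is the property
# deterministic kernels aim to guarantee.
repeat_same_order = (values[0] + values[1]) + values[2]
bitwise_equal = left_to_right == repeat_same_order

print(differs, bitwise_equal)
```

The gap here is tiny, but in a long reduction feeding a log-softmax such differences can accumulate past a strict threshold.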

Updated Test Logic

  1. Baseline Generation

    • Fetch 1000 samples from the ShareGPT dataset
    • Randomly choose prompt start positions
    • Run one‑step generation with deterministic kernels
    • Save per‑sample metadata into sglang_baseline_local.pkl
  2. Comparison Phase

    • Load the local baseline and re‑run inference on the same prompts
    • For each sample:
      • Compare per‑token logprobs for overlapping top‑k tokens
      • Assert numerical deltas within 1e‑5 tolerance
      • Verify that return_logprob flags behave correctly
    • The test fails fast with detailed per‑sample diffs if thresholds are exceeded
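
The comparison phase described above can be sketched roughly as follows. This is a minimal illustration, not the code in test_logprobs.py: the function names and the data layout (one dict of token id to logprob per position) are assumptions made for the example; only the 1e-5 tolerance and the "overlapping top-k tokens" rule come from the description.

```python
TOLERANCE = 1e-5  # threshold stated in the PR description


def compare_top_logprobs(baseline, current, tol=TOLERANCE):
    """Compare logprobs for tokens present in both top-k dicts.

    baseline / current: dict mapping token id -> logprob for one position
    (hypothetical layout). Returns (max_delta, offenders), where offenders
    maps token id -> delta for every shared token whose delta exceeds tol.
    """
    shared = baseline.keys() & current.keys()
    max_delta = 0.0
    offenders = {}
    for token_id in sorted(shared):
        delta = abs(baseline[token_id] - current[token_id])
        max_delta = max(max_delta, delta)
        if delta > tol:
            offenders[token_id] = delta
    return max_delta, offenders


def assert_sample_matches(sample_id, baseline_positions, current_positions,
                          tol=TOLERANCE):
    """Fail fast with a per-sample diff if any position exceeds tol."""
    if len(baseline_positions) != len(current_positions):
        raise AssertionError(
            f"sample {sample_id}: position count changed "
            f"({len(baseline_positions)} vs {len(current_positions)})"
        )
    for pos, (base, cur) in enumerate(zip(baseline_positions, current_positions)):
        max_delta, offenders = compare_top_logprobs(base, cur, tol)
        if offenders:
            raise AssertionError(
                f"sample {sample_id}, position {pos}: max delta "
                f"{max_delta:.3e} exceeds {tol:.0e}; offending tokens {offenders}"
            )


# Identical outputs pass silently.
example = [{11: -0.25, 42: -1.5}, {7: -0.1}]
assert_sample_matches(0, example, example)
```

Tokens that appear in only one of the two top-k sets are ignored here, matching the "overlapping" wording; a stricter variant could also require the token sets themselves to be equal.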

Key Changes

  • 🔄 Refactored test_logprobs.py into a standalone test runner with two modes:

    • gen: Generate a local baseline from 1000 ShareGPT samples
    • test: Compare current outputs against the local baseline
  • Enabled deterministic execution via enable_deterministic_inference=True

  • Enforced strict logprobs thresholds:

    • max Δ ≤ 1e-5
  • CLI usage:

# Step 1: generate baseline
python test/srt/test_logprobs.py gen

# Step 2: test after code changes
python test/srt/test_logprobs.py test
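
For orientation, a two-mode runner of this shape might look like the sketch below. It is not the actual test_logprobs.py: `fake_logprob` is a hypothetical stand-in for the real engine call, the record fields are invented for illustration, and only the `gen`/`test` modes, the baseline filename, and the 1e-5 threshold are taken from the description. The empty-baseline check mirrors a suggestion raised in review.

```python
import argparse
import pickle
import random
from pathlib import Path

BASELINE_PATH = Path("sglang_baseline_local.pkl")  # filename from the PR description


def fake_logprob(prompt, start):
    """Hypothetical stand-in for a deterministic one-step engine call."""
    return -float(len(prompt) - start)


def build_records(prompts, seed=0):
    """Pick a seeded random start position per (non-empty) prompt."""
    rng = random.Random(seed)
    records = []
    for i, prompt in enumerate(prompts):
        start = rng.randrange(len(prompt))
        records.append({"id": i, "prompt": prompt, "start": start,
                        "logprob": fake_logprob(prompt, start)})
    return records


def run_gen(prompts, path=BASELINE_PATH):
    """Write the baseline; refuse to write an empty one."""
    records = build_records(prompts)
    if not records:
        raise ValueError("refusing to write an empty baseline")
    with open(path, "wb") as f:
        pickle.dump(records, f)
    return len(records)


def run_test(path=BASELINE_PATH, tol=1e-5):
    """Re-run on the stored prompts; return ids whose delta exceeds tol."""
    with open(path, "rb") as f:
        baseline = pickle.load(f)
    failures = []
    for rec in baseline:
        current = fake_logprob(rec["prompt"], rec["start"])
        if abs(current - rec["logprob"]) > tol:
            failures.append(rec["id"])
    return failures


def main(argv=None):
    parser = argparse.ArgumentParser(description="local logprobs baseline check")
    parser.add_argument("mode", choices=["gen", "test"])
    args = parser.parse_args(argv)
    prompts = ["hello world", "deterministic kernels"]  # placeholder data
    if args.mode == "gen":
        print(f"wrote {run_gen(prompts)} samples to {BASELINE_PATH}")
        return 0
    failures = run_test()
    print("failing sample ids:", failures)
    return 1 if failures else 0
```

A real script would call `main()` under the usual `if __name__ == "__main__":` guard. Storing the chosen start positions in the baseline keeps the `test` run on exactly the same inputs as the `gen` run.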

Next Steps

We recommend all logprobs-related PRs include test results using this local testing suite to ensure stability across future kernel changes.

@zhaochenyang20 zhaochenyang20 marked this pull request as ready for review September 28, 2025 16:52
@zhaochenyang20 zhaochenyang20 changed the title from "set threshold for deterministic" to "[determined RL] setting extreme threshold for log probs with determined kernel" Sep 28, 2025
@Fridge003
Collaborator

Fridge003 commented Sep 28, 2025

@PrinsYin Would switching to another attention backend (such as triton or flashinfer) help CI pass?
In #10930 we also spotted some internal nondeterministic behavior in the fa3 kernels.
Alternatively, could lowering the temperature (perhaps to 0) help?

@Fridge003
Collaborator

Also, I think the deterministic feature may output different tokens from the original output, since deterministic mode might change kernel behavior.

@PrinsYin PrinsYin force-pushed the lprobs_deterministic branch from 35056bf to 38e850e Compare October 13, 2025 18:59
@zhaochenyang20
Collaborator

/gemini review

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request refactors the logprobs test to be a local, developer-run script for generating and comparing against a deterministic baseline, using FlashInfer. The changes are well-structured, introducing gen and test modes and making the tests much stricter to align with deterministic execution. My feedback focuses on improving robustness, maintainability, and clarifying a potential reduction in hardware support for this test.

@PrinsYin
Contributor Author

/gemini review

@gemini-code-assist
Contributor

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

@PrinsYin PrinsYin force-pushed the lprobs_deterministic branch from c2993dd to 0db4bc4 Compare October 14, 2025 16:35
@PrinsYin
Contributor Author

/gemini review

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request refactors test_logprobs.py into a local, developer-run script with gen and test modes. This is a sensible change, enabling deterministic logprob comparisons against a locally generated baseline, which is more reliable than a CI-based approach due to hardware variations. The changes include stricter tolerance thresholds and the enforcement of deterministic inference settings. My review focuses on enhancing the robustness and maintainability of this new test script. I've provided suggestions for more specific exception handling, ensuring consistent configurations between baseline generation and testing, and adding validation to prevent the creation of an empty baseline file.

PrinsYin and others added 3 commits October 14, 2025 09:38
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
@PrinsYin
Contributor Author

@aftersnow Can you help review this PR and use this script to validate the correctness of #6318? Thanks!

@PrinsYin PrinsYin force-pushed the lprobs_deterministic branch from 1264893 to 1b8b8e8 Compare October 15, 2025 17:12
@PrinsYin PrinsYin changed the title from "[determined RL] setting extreme threshold for log probs with determined kernel" to "[logprobs] Enable local deterministic logrprobs testing with strict threshold" Oct 16, 2025
Collaborator

@zhaochenyang20 zhaochenyang20 left a comment

Good to go. Two minor comments left.

@merrymercy merrymercy merged commit 53fb229 into sgl-project:main Oct 19, 2025
66 of 69 checks passed

5 participants