[POC] Support deterministic inference #10417
Fridge003 wants to merge 16 commits into sgl-project:main
Conversation
encoder_lens=encoder_lens,
spec_info=spec_info,
fixed_split_size=-1,
disable_split_kv=True,
Should we also consider the number of heads?
Related comment: Dao-AILab/flash-attention#609 (comment)
Yes, this is more of an optional heuristic.
The FA3 backend is batch-invariant as well after this PR.
Single Mode: Total samples: 50, Unique samples: 1
Mixed Mode
Prefix Mode
And after I disabled enable_batch_invariant_mode, single mode fails:
Single Mode: Total samples: 50, Unique samples: 5
The Triton backend still needs some work.
Co-authors: @hebiao064, @Qiaolin-Yu
Motivation
#10278
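As background for why batch-invariant kernels are needed at all: floating-point addition is non-associative, so any attention or reduction kernel whose split/accumulation order depends on batch size can produce bitwise-different outputs for the same prompt. A minimal self-contained illustration (not from the PR):

```python
# Floating-point addition is non-associative, so changing the order in
# which a kernel accumulates partial sums (e.g. different split-KV
# schedules for different batch sizes) can change the result bitwise.
left = (1e16 + 1.0) + 1.0   # each 1.0 is absorbed: stays 1e16
right = 1e16 + (1.0 + 1.0)  # 2.0 is representable at this magnitude
print(left == right)        # prints False
```

This is why the PR pins kernel behavior (e.g. `fixed_split_size`, `disable_split_kv`) rather than letting the scheduler pick batch-size-dependent splits.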
Modifications
Reproduction
Environment: H200, CUDA 12.6, sglang 0.5.2, torch 2.8.0, Python 3.12.11
Launch qwen3-8b:
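A launch command along these lines (a sketch: `--model-path` and `--attention-backend` are standard `sglang.launch_server` options, but the determinism flag name for this POC is an assumption, not confirmed by the PR):

```shell
# Hypothetical launch sketch; the --enable-deterministic-inference flag
# name is assumed for this POC branch.
python3 -m sglang.launch_server \
  --model-path Qwen/Qwen3-8B \
  --attention-backend flashinfer \
  --enable-deterministic-inference
```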
Test determinism with single prompt and different batch sizes:
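The single-prompt test can presumably be invoked like the mixed-mode command later in this section (the `single` mode name mirrors the output labels above and is an assumption):

```shell
# Assumed to follow the same --test-mode convention as the mixed-mode test.
python3 -m sglang.test.test_deterministic --test-mode single
```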
Test determinism with mixture of short prompts and long prompts in each batch:
# Requires running multiple times
python3 -m sglang.test.test_deterministic --test-mode mixed

Prompt 1: total samples: 644, Unique samples: 1
Prompt 2: total samples: 423, Unique samples: 1
Long prompt: total samples: 208, Unique samples: 1

Test determinism with multiple prompts with different lengths of common prefix:
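The prefix test presumably follows the same command pattern (the `prefix` mode name is an assumption based on the other modes):

```shell
# Assumed to follow the same --test-mode convention as the other modes.
python3 -m sglang.test.test_deterministic --test-mode prefix
```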
Accuracy Tests
GSM8K
GPQA
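The accuracy runs can be reproduced against the launched server with sglang's bundled GSM8K harness (the module exists in sglang; the flag values here are illustrative, since the PR does not show the exact command):

```shell
# Illustrative flag values; run against an already-launched server.
python3 -m sglang.test.few_shot_gsm8k --num-questions 200 --parallel 64
```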
Temperature > 0
For launching, we need to specify the PyTorch sampling backend temporarily.
For testing, we set the temperature to a value greater than 0.
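A sketch of both steps: `--sampling-backend pytorch` is a real `sglang.launch_server` flag, and the `/generate` endpoint accepts `sampling_params`; the model, port, and temperature value are illustrative:

```shell
# Launch with the PyTorch sampling backend (determinism flag assumed as above).
python3 -m sglang.launch_server --model-path Qwen/Qwen3-8B \
  --sampling-backend pytorch --enable-deterministic-inference

# Then sample with temperature > 0 via the /generate endpoint.
curl http://localhost:30000/generate \
  -H "Content-Type: application/json" \
  -d '{"text": "The capital of France is", "sampling_params": {"temperature": 0.7, "max_new_tokens": 32}}'
```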
Benchmarking and Profiling
Checklist