[2/2] Deepseek deterministic: support deepseek v3 deterministic inference on 8 x H200 by zminglei · Pull Request #12095 · sgl-project/sglang

zminglei · 2025-10-24T23:13:53Z

Motivation

Part of the roadmap.
The previous PR added deterministic inference for DeepSeek-architecture models on a single Hopper GPU.
This PR extends that support to the full DeepSeek V3 model on 8 x H200.

This change also fixes this issue.

Modifications

Use the default fixed fused MoE kernel config instead of choosing one based on batch size.
Use kernels that are verified to be deterministic instead of the dpskv3_deepgemm optimized ones.
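
To illustrate why a batch-size-dependent kernel config can break determinism, here is a minimal sketch in plain Python. It is not SGLang code: `blocked_sum` and `pick_block` are hypothetical helpers, and the sketch assumes that the chosen tile size changes the order in which partial sums are accumulated. Because floating-point addition is not associative, a different accumulation order can give a different result for the same input.

```python
# Illustrative only: not SGLang code. Shows why a batch-size-dependent
# tile size can change a floating-point result.


def blocked_sum(values, block):
    """Sum `values` block by block, then sum the per-block partials."""
    partials = []
    for start in range(0, len(values), block):
        acc = 0.0
        for v in values[start:start + block]:
            acc += v
        partials.append(acc)
    total = 0.0
    for p in partials:
        total += p
    return total


def pick_block(batch_size, deterministic):
    """Hypothetical config picker: tuned per batch size unless deterministic."""
    if deterministic:
        return 4  # one fixed default, independent of batch size
    return 2 if batch_size <= 8 else 4


values = [1e17, 1.0, -1e17, 1.0]

# Tuned mode: the same data is reduced in a different order per batch size.
small = blocked_sum(values, pick_block(1, deterministic=False))
large = blocked_sum(values, pick_block(64, deterministic=False))

# Deterministic mode: the block size, and so the result, never changes.
fixed_small = blocked_sum(values, pick_block(1, deterministic=True))
fixed_large = blocked_sum(values, pick_block(64, deterministic=True))

print(small, large, fixed_small, fixed_large)
```

In the tuned mode the two batch sizes disagree on the sum of identical data, while the fixed config gives the same value for both. The trade-off is that a fixed config gives up per-batch-size tuning.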

Accuracy Tests

Launch the DeepSeek V3 model on 8 x H200:

python3 -m sglang.launch_server --model-path /shared/public/elr-models/deepseek-ai/DeepSeek-V3/ --enable-deterministic-inference --tp 8 --trust-remote-code --port 30001

Disable deterministic:

Accuracy: 0.960
Invalid: 0.000
Latency: 24.925 s
Output throughput: 795.660 token/s

python3 -m sglang.test.test_deterministic --test-mode prefix --n-trials 50 --n-start 1 --port 30001
Prompt 0 with prefix length 1: total samples: 346, Unique samples: 93
Prompt 1 with prefix length 511: total samples: 301, Unique samples: 106
Prompt 2 with prefix length 2048: total samples: 315, Unique samples: 78
Prompt 3 with prefix length 4097: total samples: 313, Unique samples: 129

Enable deterministic (without this change):

python3 -m sglang.test.test_deterministic --test-mode prefix --n-trials 50 --n-start 1 --port 30001
Prompt 0 with prefix length 1: total samples: 329, Unique samples: 19
Prompt 1 with prefix length 511: total samples: 310, Unique samples: 20
Prompt 2 with prefix length 2048: total samples: 301, Unique samples: 14
Prompt 3 with prefix length 4097: total samples: 335, Unique samples: 22

Enable deterministic (with this change):

Accuracy: 0.970
Invalid: 0.000
Latency: 51.714 s
Output throughput: 385.778 token/s
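
For a rough sense of the cost of deterministic mode, the latency and throughput figures reported above (deterministic disabled vs. enabled with this change) can be compared directly. This is a sketch based on the single run shown for each mode, so treat the ratios as indicative only; the dictionary keys are names chosen for illustration.

```python
# Numbers copied from the two accuracy runs reported in this PR description.
baseline = {"latency_s": 24.925, "throughput_tok_s": 795.660}
deterministic = {"latency_s": 51.714, "throughput_tok_s": 385.778}

# How many times longer the deterministic run took.
latency_ratio = deterministic["latency_s"] / baseline["latency_s"]

# How many times lower the deterministic output throughput was.
throughput_ratio = (
    baseline["throughput_tok_s"] / deterministic["throughput_tok_s"]
)

print(f"latency: {latency_ratio:.2f}x slower")
print(f"throughput: {throughput_ratio:.2f}x lower")
```

Both ratios come out at roughly 2x for this run.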

python3 -m sglang.test.test_deterministic --test-mode prefix --n-trials 50 --n-start 1 --port 30001
Prompt 0 with prefix length 1: total samples: 327, Unique samples: 1
Prompt 1 with prefix length 511: total samples: 312, Unique samples: 1
Prompt 2 with prefix length 2048: total samples: 303, Unique samples: 1
Prompt 3 with prefix length 4097: total samples: 333, Unique samples: 1
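
The "Unique samples" column is the number of distinct outputs seen for a prompt across all trials, so a value of 1 means every sample was identical. The sketch below shows one way such a summary could be computed. It is not the code of `sglang.test.test_deterministic`: `summarize` is a hypothetical helper and the sample data is made up.

```python
from collections import defaultdict


def summarize(samples):
    """Build one report line per prefix length.

    `samples` is an iterable of (prefix_len, output_text) pairs.
    """
    by_prefix = defaultdict(list)
    for prefix_len, text in samples:
        by_prefix[prefix_len].append(text)

    lines = []
    for i, (prefix_len, texts) in enumerate(sorted(by_prefix.items())):
        lines.append(
            f"Prompt {i} with prefix length {prefix_len}: "
            f"total samples: {len(texts)}, Unique samples: {len(set(texts))}"
        )
    return lines


# Toy data: prefix length 1 is deterministic, prefix length 511 is not.
samples = [(1, "A"), (1, "A"), (1, "A"), (511, "B"), (511, "C")]
report = summarize(samples)
for line in report:
    print(line)
```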

Log verifying that the radix cache is working:

[2025-10-27 21:31:07 TP0] Prefill batch, #new-seq: 1, #new-token: 368, #cached-token: 0, token usage: 0.00, #running-req: 0, #queue-req: 0,
[2025-10-27 21:31:07 TP0] Prefill batch, #new-seq: 25, #new-token: 8192, #cached-token: 0, token usage: 0.00, #running-req: 1, #queue-req: 22,
[2025-10-27 21:31:08 TP0] Prefill batch, #new-seq: 23, #new-token: 3942, #cached-token: 5362, token usage: 0.01, #running-req: 25, #queue-req: 0,
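
The third line is the one that shows cache reuse: it has a non-zero `#cached-token` count. As a small sketch, the fraction of prefill tokens served from the cache can be extracted from such lines with a regular expression. `cached_fraction` is a hypothetical helper, and the pattern assumes the field layout shown in the log above.

```python
import re

# Matches the two token counters as they appear in the log lines above.
PATTERN = re.compile(r"#new-token: (\d+), #cached-token: (\d+)")


def cached_fraction(line):
    """Return cached / (new + cached) for a prefill log line, or None."""
    match = PATTERN.search(line)
    if match is None:
        return None
    new, cached = int(match.group(1)), int(match.group(2))
    total = new + cached
    return cached / total if total else 0.0


log = [
    "Prefill batch, #new-seq: 1, #new-token: 368, #cached-token: 0,",
    "Prefill batch, #new-seq: 25, #new-token: 8192, #cached-token: 0,",
    "Prefill batch, #new-seq: 23, #new-token: 3942, #cached-token: 5362,",
]

fractions = [cached_fraction(line) for line in log]
for line, frac in zip(log, fractions):
    print(f"{frac:.2f}  {line}")
```

For the third batch, 5362 of 9304 prefill tokens came from the cache, a little over half.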

Benchmarking and Profiling

Checklist

Format your code according to the Format code with pre-commit.
Add unit tests according to the Run and add unit tests.
Update documentation according to Write documentations.
Provide accuracy and speed benchmark results according to Test the accuracy and Benchmark the speed.

Fridge003 · 2025-10-29T02:08:49Z

@zminglei Why are you removing the topk changes?

zminglei · 2025-10-29T02:15:11Z

@zminglei Why are you removing the topk changes?

I ran tests and verified that it is unnecessary for deterministic results. I kept only what is necessary in this PR to keep it clean.

Fridge003 · 2025-10-29T18:48:37Z

https://github.com/sgl-project/sglang/actions/runs/18901971075/job/54006766322?pr=12095

zminglei added 3 commits October 24, 2025 23:09

support deepseek v3 deterministic inference on 8 x H200

d24596a

fix

d5ff831

lint

9b15ac6

zminglei changed the title ~~support deepseek v3 deterministic inference on 8 x H200~~ [Deterministic] Improve deepseek v3 deterministic inference on 8 x H200 Oct 27, 2025

zminglei added 5 commits October 27, 2025 16:59

Merge branch 'main' into dpsk-full-deterministic

8a0b43e

Merge branch 'main' into dpsk-full-deterministic

f800e58

fix fused_moe

472aba5

Merge branch 'dpsk-full-deterministic' of https://github.com/zminglei/sglang into dpsk-full-deterministic

b534ac5

resolve conflict

7488674

zminglei changed the title ~~[Deterministic] Improve deepseek v3 deterministic inference on 8 x H200~~ [2/2] Deepseek deterministic: support deepseek v3 deterministic inference on 8 x H200 Oct 28, 2025

fix lint

6254054

zminglei marked this pull request as ready for review October 28, 2025 04:43

zminglei requested review from BBuf, Edwardf0t1, HaiShaw, Ying1123, ch-wan, ispobock, kushanam, merrymercy and zhyncs as code owners October 28, 2025 04:43

zminglei mentioned this pull request Oct 28, 2025

[Bug] Qwen3-235B-A22B-Thinking is not deterministic with --enable-deterministic-inference #12232

Closed

5 tasks

update

a41b523

hebiao064 reviewed Oct 28, 2025

Comment thread python/sglang/srt/layers/moe/topk.py Outdated

hebiao064 reviewed Oct 28, 2025

Comment thread python/sglang/srt/models/deepseek_v2.py Outdated

zminglei added 2 commits October 27, 2025 23:21

address comments

432d194

lint

34cc012

hebiao064 approved these changes Oct 28, 2025

Fridge003 added the run-ci label Oct 28, 2025

Fridge003 approved these changes Oct 28, 2025

zminglei added 3 commits October 28, 2025 16:08

Merge remote-tracking branch 'upstream/main' into dpsk-full-deterministic

f3dfa11

fix a CI test

8b5a710

remove topk change

3a9cb2c

Fridge003 added 3 commits October 28, 2025 20:42

Merge branch 'main' into dpsk-full-deterministic

99341b6

Merge branch 'main' into dpsk-full-deterministic

79c7d9d

Merge branch 'main' into dpsk-full-deterministic

6ccbf03

Fridge003 merged commit e39628f into sgl-project:main Oct 29, 2025
131 of 143 checks passed

Fridge003 mentioned this pull request Oct 30, 2025

[Feature] Support deterministic inference with Batch Invariant Ops #10278

Closed

28 tasks

Fridge003 mentioned this pull request Jan 20, 2026

Is the fused_moe_triton kernel batch-invariant? #17424

Closed

Fridge003 merged 18 commits into sgl-project:main from zminglei:dpsk-full-deterministic
