[2/2] Deepseek deterministic: support deepseek v3 deterministic inference on 8 x H200 #12095

Merged
Fridge003 merged 18 commits into sgl-project:main from zminglei:dpsk-full-deterministic
Oct 29, 2025

@zminglei commented on Oct 24, 2025

Motivation

Part of the roadmap.
The previous PR supported deterministic inference for DeepSeek-architecture models on a single Hopper GPU.
This PR extends that support to deterministic inference of the full DeepSeek V3 model on 8 x H200.

This change also fixed this issue

Modifications

  1. Use the default fixed fused MoE kernel config instead of choosing one based on batch size.
  2. Use kernels that are verified to be deterministic instead of the dpskv3_deepgemm optimized ones.
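
To illustrate why a batch-size-dependent kernel config can break determinism, here is a minimal sketch in plain Python. It is not SGLang code: `blocked_sum`, `pick_block_size`, the block sizes, and the batch-size threshold are all hypothetical. The only point is that floating-point addition is not associative, so changing how a reduction is tiled can change the result, while a fixed tiling keeps the result reproducible.

```python
# Hypothetical illustration only; none of these names exist in SGLang.

def blocked_sum(values, block_size):
    """Sum `values` block by block, then add the per-block partials in order.

    This mimics a tiled reduction: the block size decides which values
    are added together first.
    """
    partials = []
    for start in range(0, len(values), block_size):
        acc = 0.0
        for v in values[start:start + block_size]:
            acc += v
        partials.append(acc)
    total = 0.0
    for p in partials:
        total += p
    return total


def pick_block_size(batch_size, deterministic):
    """Hypothetical stand-in for a kernel config lookup.

    In deterministic mode the block size is fixed. Otherwise it depends
    on the batch size (the threshold 8 is an arbitrary assumption).
    """
    if deterministic:
        return 2
    return 2 if batch_size <= 8 else 4


# The same input values, reduced under different batch sizes.
VALUES = [1e16, 1.0, -1e16, 1.0]


def run(batch_size, deterministic):
    return blocked_sum(VALUES, pick_block_size(batch_size, deterministic))


if __name__ == "__main__":
    print(run(4, deterministic=False), run(64, deterministic=False))
    print(run(4, deterministic=True), run(64, deterministic=True))
```

With the batch-size-dependent choice, the two runs disagree even though the inputs are identical. With the fixed choice they agree. Neither result equals the exact sum of 2.0, which shows that determinism here means reproducibility, not higher accuracy.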

Accuracy Tests

Launch deepseek v3 model on 8 x H200

python3 -m sglang.launch_server --model-path /shared/public/elr-models/deepseek-ai/DeepSeek-V3/ --enable-deterministic-inference --tp 8 --trust-remote-code --port 30001

With deterministic inference disabled:

Accuracy: 0.960
Invalid: 0.000
Latency: 24.925 s
Output throughput: 795.660 token/s

python3 -m sglang.test.test_deterministic --test-mode prefix --n-trials 50 --n-start 1 --port 30001
Prompt 0 with prefix length 1: total samples: 346, Unique samples: 93
Prompt 1 with prefix length 511: total samples: 301, Unique samples: 106
Prompt 2 with prefix length 2048: total samples: 315, Unique samples: 78
Prompt 3 with prefix length 4097: total samples: 313, Unique samples: 129

With deterministic inference enabled (without this change):

python3 -m sglang.test.test_deterministic --test-mode prefix --n-trials 50 --n-start 1 --port 30001
Prompt 0 with prefix length 1: total samples: 329, Unique samples: 19
Prompt 1 with prefix length 511: total samples: 310, Unique samples: 20
Prompt 2 with prefix length 2048: total samples: 301, Unique samples: 14
Prompt 3 with prefix length 4097: total samples: 335, Unique samples: 22

With deterministic inference enabled (with this change):

Accuracy: 0.970
Invalid: 0.000
Latency: 51.714 s
Output throughput: 385.778 token/s

python3 -m sglang.test.test_deterministic --test-mode prefix --n-trials 50 --n-start 1 --port 30001
Prompt 0 with prefix length 1: total samples: 327, Unique samples: 1
Prompt 1 with prefix length 511: total samples: 312, Unique samples: 1
Prompt 2 with prefix length 2048: total samples: 303, Unique samples: 1
Prompt 3 with prefix length 4097: total samples: 333, Unique samples: 1
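
For readers unfamiliar with the "total samples / Unique samples" numbers above, the following sketch shows one way such a summary could be computed from collected completions. It is an assumption for illustration, not the implementation of `sglang.test.test_deterministic`, and `summarize_samples` and `is_deterministic` are hypothetical names.

```python
from collections import Counter


def summarize_samples(outputs):
    """Return (total, unique) for a list of generated completions.

    `outputs` holds the text generated for the same prompt across
    repeated trials.
    """
    counts = Counter(outputs)
    return len(outputs), len(counts)


def is_deterministic(outputs):
    """A prompt passes when every sample is identical (unique == 1)."""
    total, unique = summarize_samples(outputs)
    return total > 0 and unique == 1


if __name__ == "__main__":
    samples = ["Paris", "Paris", "Paris."]
    total, unique = summarize_samples(samples)
    print(f"total samples: {total}, Unique samples: {unique}")
```

Under this reading, the results with this change (1 unique sample per prompt) mean that every trial produced the same output for each prefix length.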

Log verifying that the radix cache is working:

[2025-10-27 21:31:07 TP0] Prefill batch, #new-seq: 1, #new-token: 368, #cached-token: 0, token usage: 0.00, #running-req: 0, #queue-req: 0,
[2025-10-27 21:31:07 TP0] Prefill batch, #new-seq: 25, #new-token: 8192, #cached-token: 0, token usage: 0.00, #running-req: 1, #queue-req: 22,
[2025-10-27 21:31:08 TP0] Prefill batch, #new-seq: 23, #new-token: 3942, #cached-token: 5362, token usage: 0.01, #running-req: 25, #queue-req: 0,
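
As a rough way to read these lines, the sketch below extracts `#new-token` and `#cached-token` from a prefill log line and reports the fraction of prefill tokens served from the cache. The regular expression is written against the log format shown above only, and `cached_fraction` is a hypothetical helper, not part of SGLang.

```python
import re

# Matches the two counters as they appear in the prefill log lines above.
PREFILL_RE = re.compile(r"#new-token: (\d+), #cached-token: (\d+)")


def cached_fraction(log_line):
    """Return cached / (new + cached) for a prefill log line.

    Returns None when the line does not contain both counters.
    """
    match = PREFILL_RE.search(log_line)
    if match is None:
        return None
    new_tokens = int(match.group(1))
    cached_tokens = int(match.group(2))
    total = new_tokens + cached_tokens
    return cached_tokens / total if total else 0.0


if __name__ == "__main__":
    line = (
        "[2025-10-27 21:31:08 TP0] Prefill batch, #new-seq: 23, "
        "#new-token: 3942, #cached-token: 5362, token usage: 0.01, "
        "#running-req: 25, #queue-req: 0,"
    )
    print(f"cached fraction: {cached_fraction(line):.3f}")
```

A nonzero cached fraction in the third batch indicates that later requests reuse prefixes computed by earlier ones, which is what the log is meant to show.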

@zminglei changed the title from "support deepseek v3 deterministic inference on 8 x H200" to "[Deterministic] Improve deepseek v3 deterministic inference on 8 x H200" on Oct 27, 2025
@zminglei changed the title from "[Deterministic] Improve deepseek v3 deterministic inference on 8 x H200" to "[2/2] Deepseek deterministic: support deepseek v3 deterministic inference on 8 x H200" on Oct 28, 2025
Comment thread python/sglang/srt/layers/moe/topk.py Outdated
Comment thread python/sglang/srt/models/deepseek_v2.py Outdated
@Fridge003 (Collaborator) commented

@zminglei Why are you removing the topk changes?

@zminglei (Collaborator, Author) commented

> @zminglei Why are you removing the topk changes?

I ran tests and verified that the topk changes are unnecessary for deterministic results. I kept only the necessary changes in this PR to keep it clean.

@Fridge003 merged commit e39628f into sgl-project:main on Oct 29, 2025
131 of 143 checks passed