Support DeepSeek v3.2 by zianglih · Pull Request #963 · radixark/miles

zianglih · 2026-04-09T20:15:36Z

- Add V3.2 model configs and a launch script
- Reuse the existing GLM-5 code path
- Add support for the V3.2 non-interleaved indexer format
  - Patch indexer_rope_interleave into args
- Official V3.2 does not use a chat template, so add a fallback
  - Patch model type into tokenizer
- Add a --freeze-indexer arg
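
To make the interleaved versus non-interleaved distinction concrete, here is a minimal sketch in plain Python. It assumes the two common RoPE pairing conventions: the interleaved layout rotates adjacent pairs (x0, x1), (x2, x3), and so on, while the non-interleaved (half-split) layout rotates element i together with element i + d/2. All function names and the `base` default are hypothetical and are not taken from this PR, which works on tensors and on the `indexer_rope_interleave` argument.

```python
import math


def interleaved_to_half_split(x):
    """Reorder [a0, b0, a1, b1, ...] into [a0, a1, ..., b0, b1, ...]."""
    if len(x) % 2 != 0:
        raise ValueError("RoPE dimension must be even")
    return list(x[0::2]) + list(x[1::2])


def half_split_to_interleaved(x):
    """Inverse of interleaved_to_half_split."""
    if len(x) % 2 != 0:
        raise ValueError("RoPE dimension must be even")
    half = len(x) // 2
    out = []
    for a, b in zip(x[:half], x[half:]):
        out.extend((a, b))
    return out


def _rotate(a, b, theta):
    """Rotate the 2-D point (a, b) by angle theta."""
    c, s = math.cos(theta), math.sin(theta)
    return a * c - b * s, a * s + b * c


def rope_interleaved(x, pos, base=10000.0):
    """Apply RoPE where pair i is (x[2i], x[2i+1])."""
    d = len(x)
    out = [0.0] * d
    for i in range(d // 2):
        theta = pos * base ** (-2.0 * i / d)
        out[2 * i], out[2 * i + 1] = _rotate(x[2 * i], x[2 * i + 1], theta)
    return out


def rope_half_split(x, pos, base=10000.0):
    """Apply RoPE where pair i is (x[i], x[i + d/2])."""
    d = len(x)
    half = d // 2
    out = [0.0] * d
    for i in range(half):
        theta = pos * base ** (-2.0 * i / d)
        out[i], out[i + half] = _rotate(x[i], x[i + half], theta)
    return out
```

Under these assumptions the two layouts differ only by a fixed permutation of the feature dimension: rotating a permuted vector with `rope_half_split` gives the same result as rotating with `rope_interleaved` and permuting afterwards. This is why a checkpoint converter has to know which layout a model expects.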

gemini-code-assist

Code Review

This pull request introduces support for DeepSeek-V3.2 models, including necessary configuration patching, tokenizer handling, and adjustments to RoPE interleave logic in the Megatron-to-HF conversion and model plugins. Review feedback highlights the need to replace hardcoded model dimensions (e.g., 128, 64) with dynamic configuration values and to use field(default_factory=...) for dataclass default values to ensure proper execution.
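
The two review points can be illustrated with a small sketch. The class and field names below are hypothetical and do not come from the PR; 128 and 64 are the example values named in the review, and which dimensions they stand for is an assumption made here for illustration only, as is the ordering of the slices in `split_head`.

```python
from dataclasses import dataclass, field


@dataclass
class IndexerConfig:
    # Hypothetical fields: sizes live in the config instead of being
    # hardcoded as 128 or 64 at each call site.
    head_dim: int = 128
    rope_dim: int = 64
    # A mutable default needs default_factory so that every instance
    # gets its own fresh list (a bare `= []` is rejected by dataclasses).
    patched_keys: list = field(default_factory=list)


def split_head(vec, cfg):
    """Split one head vector into two slices using sizes from the config."""
    if len(vec) != cfg.head_dim:
        raise ValueError(f"expected {cfg.head_dim} values, got {len(vec)}")
    return vec[: cfg.rope_dim], vec[cfg.rope_dim :]
```

Reading the sizes from a config object means the same code path keeps working for a model variant with different dimensions, which is the concern the review raises about hardcoded values.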

zianglih · 2026-04-14T18:36:58Z

-        "You are using too many GPUs for this conversion."
+    assert args.pipeline_model_parallel_size <= args.num_layers, (
+        f"Pipeline model parallel size {args.pipeline_model_parallel_size} must be less than or equal to "
+        f"number of layers {args.num_layers}."


This relaxes the assertion to unblock conversion.
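
As a rough, self-contained illustration of the relaxed check, the sketch below wraps the new assertion in a function. Only the two attribute names and the message text come from the diff; the function name and the `SimpleNamespace` stand-in for the parsed arguments are hypothetical. The previous condition is not visible in the hunk, so it is not reproduced here.

```python
from types import SimpleNamespace


def check_pipeline_size(args):
    """Fail only when there are more pipeline stages than layers."""
    assert args.pipeline_model_parallel_size <= args.num_layers, (
        f"Pipeline model parallel size {args.pipeline_model_parallel_size} must be less than or equal to "
        f"number of layers {args.num_layers}."
    )


# Hypothetical argument values, for illustration only.
example_args = SimpleNamespace(pipeline_model_parallel_size=4, num_layers=8)
check_pipeline_size(example_args)  # passes: 4 stages fit into 8 layers
```

Note that `assert` statements are skipped when Python runs with `-O`, so a check that must always run is usually written as an explicit `if ...: raise ValueError(...)` instead.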

Zhichenzzz · 2026-04-30T02:02:06Z

Thanks @zianglih! Could you share which Docker image and SGLang commit you used for testing?

zianglih marked this pull request as ready for review April 9, 2026 20:17

zianglih requested review from fzyzcjy, guapisolo, maocheng23, yueming-yuan and yushengsu-thu as code owners April 9, 2026 20:17

gemini-code-assist Bot reviewed Apr 9, 2026

ziang-and pushed a commit to zianglih/miles that referenced this pull request Apr 9, 2026

Squash changes from radixark#963

ffc9078

ziang-and pushed a commit to zianglih/miles that referenced this pull request Apr 10, 2026

Squash changes from radixark#963

1774ab7

zianglih mentioned this pull request Apr 10, 2026

[Roadmap] Blackwell MXFP8 and NVFP4 RL training #615 (Open, 30 tasks)

ziang-and pushed a commit to zianglih/miles that referenced this pull request Apr 11, 2026

Squash changes from radixark#963

097c1c9

zianglih commented Apr 14, 2026

ziang-and pushed a commit to zianglih/miles that referenced this pull request Apr 19, 2026

Squash changes from radixark#963

0109d2f

zianglih added 17 commits April 21, 2026 13:15

- 091d726 Test v32
- 90f6ae1 Fix script
- 85e4046 Fix tokenizer
- fcc86ae Fix script
- 956be91 Fix rope indexer
- d2ec8b2 Add full model
- ece0e0c Clean up
- f8e0391 Fix env var
- 0c47e78 Simplify tokenizer fallback
- 1bed156 Replace huggingface-cli with hf
- 053e49a Clean up
- 9396adf Update script
- aa9c89f RoPE fix for v0.5.10 bump
- 16fc923 Set TP2 CP1
- a37917c Add --freeze-indexer
- 65af68f Minor fix --freeze-indexer arg
- 3a76914 Add --fp8-indexer

ziang-and force-pushed the test-v32 branch from f6a9bef to 3a76914 April 21, 2026 20:17

Zhichenzzz added the run-ci-megatron label Apr 27, 2026

Refactor shim patching

9125808

ziang-and requested a review from Zhichenzzz as a code owner April 29, 2026 19:00

zianglih wants to merge 18 commits into radixark:main from zianglih:test-v32
