Deepseek v4 Support by LiJunscs · Pull Request #1195 · flagos-ai/FlagScale

LiJunscs · 2026-05-07T14:28:29Z

PR Types

[New features]

PR Description

Add the DeepSeek V4 model to FlagScale and Megatron-FL.
Supported:

CSA and HCA
Hash Router
mHC
Engram (optional)

Unsupported:

Sqrtsoftplus router score function. ✅
mHC recompute. ✅
overlap_grad_reduce and overlap_param_gather with ZeRO-1. ✅
Any infra optimizations.
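
The sqrt-softplus router score function listed above can be illustrated with a small standalone sketch. It assumes the name means `sqrt(softplus(logit))` applied per expert, followed by top-k selection with weights normalized over the selected experts. The function names `softplus`, `sqrt_softplus_scores`, and `route_top_k` are hypothetical and are not taken from this PR or from Megatron-FL.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable softplus: log(1 + exp(x))."""
    # For positive x, rewrite as x + log1p(exp(-x)) so exp() cannot overflow.
    if x > 0:
        return x + math.log1p(math.exp(-x))
    return math.log1p(math.exp(x))


def sqrt_softplus_scores(logits: list[float]) -> list[float]:
    """Assumed score function: sqrt(softplus(logit)) for each expert."""
    return [math.sqrt(softplus(z)) for z in logits]


def route_top_k(logits: list[float], k: int) -> list[tuple[int, float]]:
    """Select the k highest-scoring experts; normalize their weights to sum to 1."""
    scores = sqrt_softplus_scores(logits)
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    total = sum(scores[i] for i in top)
    return [(i, scores[i] / total) for i in top]


# Toy router logits for four experts (made-up values).
print(route_top_k([0.0, 2.0, -1.0, 1.0], k=2))
```

Because softplus is strictly positive, the scores stay positive for any logit, so the normalization step never divides by zero.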

NOTE: This is only a draft PR. Please review it and give suggestions, for example on:

File structure.
- All modules have been moved to Megatron-FL. Only model_builder is left in FlagScale.
- Should the Engram-related CI be deleted or not?

Next plan:

Distributed training. ✅
Muon optimizer with ZeRO-1 adaptation. 😢
Low precision is out of scope for this PR because of limited resources.
Possibly context parallelism for sparse attention.
Further suggestions are welcome.

1. fix incompatibility between engram and mhc; 2. validate training pipeline of deepseek v4. 3. add fake gold value of test deepseek_v4.

LiJunscs · 2026-05-20T12:18:02Z

869b205 /ok to review.

LiJunscs · 2026-05-20T12:19:41Z

The Muon optimizer with ZeRO-1 adaptation will be pushed after the next version of FlagOS is released.

Copilot

Pull request overview

Adds DeepSeek V4 training support into FlagScale/Megatron integration by introducing a dedicated DeepSeek V4 training entrypoint and model wiring (hybrid attention + hyper-connections + optional engram), along with a new functional test case and updated CLI/config plumbing.

Changes:

Introduce DeepSeek V4 model builder/model/block/layer implementations and a new train_deepseek_v4.py entrypoint.
Extend argument parsing/config translation for DeepSeek V4 hybrid attention and related settings.
Add/adjust functional test configs + gold values for a new DeepSeek V4 test case and update CUDA platform test selection.

Reviewed changes

Copilot reviewed 20 out of 20 changed files in this pull request and generated 17 comments.

| File | Description |
| --- | --- |
| tests/test_utils/config/platforms/cuda.yaml | Adds a DeepSeek test case to CUDA platform functional test selection. |
| tests/functional_tests/train/deepseek/gold_values/tp1_pp2_ep2_v4.json | New gold values baseline for the DeepSeek V4 functional training case. |
| tests/functional_tests/train/deepseek/conf/train/tp1_pp2_ep2_v4.yaml | New DeepSeek V4 training configuration used by functional tests. |
| tests/functional_tests/train/deepseek/conf/train/data.yaml | Updates DeepSeek functional test data/tokenizer paths. |
| tests/functional_tests/train/deepseek/conf/tp1_pp2_ep2_v4.yaml | New top-level Hydra experiment config pointing to the DeepSeek V4 entrypoint. |
| flagscale/train/megatron/training/arguments.py | Adds DeepSeek V4 hybrid attention arg parsing + config mapping changes. |
| flagscale/train/megatron/training/arguments_fs.py | Adds DeepSeek V4-related arg validation and a new optimizer flag; removes local engram arg registration. |
| flagscale/train/megatron/train_deepseek_v4.py | New DeepSeek V4 training script/entrypoint. |
| flagscale/models/megatron/engram/short_conv.py | Removes local Engram implementation code (moved upstream). |
| flagscale/models/megatron/engram/ngram_hash.py | Removes local Engram hashing/tokenizer implementation (moved upstream). |
| flagscale/models/megatron/engram/multi_head_embedding.py | Removes local Engram embedding implementation (moved upstream). |
| flagscale/models/megatron/engram/engram.py | Removes local Engram module (moved upstream). |
| flagscale/models/megatron/engram/engram_transformer_layer.py | Switches to using upstream `megatron.core.transformer.engram.EngramModule`. |
| flagscale/models/megatron/engram/engram_model.py | Switches hash mapping import to upstream `megatron.core.transformer.engram`. |
| flagscale/models/megatron/engram/engram_config.py | Removes local EngramConfig dataclass (moved upstream). |
| flagscale/models/megatron/deepseek_v4/deepseek_transformer_layer.py | New DeepSeek-specific transformer layer wrapper with engram hooks. |
| flagscale/models/megatron/deepseek_v4/deepseek_transformer_block.py | New DeepSeek-specific transformer block with hyper-connection + (planned) MHC recompute wiring. |
| flagscale/models/megatron/deepseek_v4/deepseek_model.py | New DeepSeek GPTModel subclass with lazy/async engram hash computation. |
| flagscale/models/megatron/deepseek_v4/deepseek_builder.py | New DeepSeek model builder/spec wiring for hybrid attention + optional engram. |
| examples/deepseek_v3/conf/train/next.yaml | Adds an example training config (DeepSeek v3 example directory). |

      train:
        aquila: ["tp2_pp2", "tp4_pp2"]
-        deepseek: ["tp2_pp2_ep2", "tp2_pp2_ep2_engram"]
+        deepseek: ["tp2_pp2_ep2", "tp2_pp2_ep2_engram", "tp1_pp2_ep3_v4"]
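
The selection list in this hunk names `tp1_pp2_ep3_v4`, while the file summary lists `tp1_pp2_ep2_v4.yaml`; the page does not say whether the difference is intentional. A minimal sketch of how such a mismatch could be detected follows. The helper `missing_configs` is hypothetical, and the config listing is an assumption: only `tp1_pp2_ep2_v4.yaml` is confirmed by the summary, and the other two file names are placeholders.

```python
def missing_configs(case_names, config_files):
    """Return the case names that have no '<name>.yaml' entry in config_files."""
    suffix = ".yaml"
    available = {f[: -len(suffix)] for f in config_files if f.endswith(suffix)}
    return [name for name in case_names if name not in available]


# Case names as they appear in the added line of the cuda.yaml hunk.
cases = ["tp2_pp2_ep2", "tp2_pp2_ep2_engram", "tp1_pp2_ep3_v4"]

# Assumed directory listing; only tp1_pp2_ep2_v4.yaml is named in the file summary.
configs = ["tp2_pp2_ep2.yaml", "tp2_pp2_ep2_engram.yaml", "tp1_pp2_ep2_v4.yaml"]

print(missing_configs(cases, configs))
```

A check of this kind could run before the functional tests so that a misspelled case name fails early rather than at launch time.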


+    },
+    "fake": true


 data:
-  data_path: /home/gitlab-runner/data/pile_wikipedia_demo/pile_wikipedia_demo
+  data_path: /workspace/data/enron_emails_demo_text_document_qwen
  split: 1


  tokenizer:
    tokenizer_type: QwenTokenizerFS
-    tokenizer_path: /home/gitlab-runner/tokenizers/qwentokenizer
+    tokenizer_path: /workspace/tokenizers/qwentokenizer


+  shell_cmds: null
+  envs:
+    HYDRA_FULL_ERROR: 1
+    CUDA_VISIBLE_DEVICES: "4,5,6,7"


+           _broadcast(batch['tokens'])
+           _broadcast(batch['attention_mask'])
+           _broadcast(batch['position_ids'])
+            ######### FlagScale Begin ########
+           if mpu.get_dualpipev_pipeline_model_parallel_world_size() is not None:


+if TYPE_CHECKING:
+    from megatron.core.tensor_parallel.random import CheckpointManager
+else:
+    CheckpointManager = None
+


+                            next_layer = self.layers[l_no + 1]
+                            if getattr(next_layer, "is_engram_layer", False):
+                                next_layer.pre_compute_embedding(engram_hash_input_ids)
+                        #### FlagScale End ####
+                        hidden_states, context = layer(


+        # Precompute the engram_hash_iput_ids, it will be used to create a TransformerChunkSchedulePlan.
+        engram_hash_input_ids = LazyHashInputIds(
+            hash_mapping=self.engram_hash,
+            input_ids=input_ids,
+            hash_stream=self._hash_stream,
+        )
+        if extra_block_kwargs is None:
+            extra_block_kwargs = {
+                "engram_hash_input_ids": engram_hash_input_ids,
+            }
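
The hunk above wraps the engram hash computation in `LazyHashInputIds`, whose implementation is not shown on this page. The deferred-evaluation idea can be sketched in isolation as follows. `LazyValue` and `toy_hash` are hypothetical names, and the sketch ignores the `hash_stream` argument, which suggests the real class also runs the work asynchronously on a separate stream.

```python
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class LazyValue(Generic[T]):
    """Hypothetical lazy wrapper: runs `compute` once, on first access."""

    def __init__(self, compute: Callable[[], T]) -> None:
        self._compute = compute
        self._value: Optional[T] = None
        self._ready = False

    def get(self) -> T:
        # Run the deferred computation only the first time it is needed,
        # then reuse the cached result on every later call.
        if not self._ready:
            self._value = self._compute()
            self._ready = True
        return self._value


calls = []  # records how many times the hash function actually ran


def toy_hash(ids):
    """Stand-in for an n-gram hash mapping; not the Engram hash."""
    calls.append(1)
    return [(31 * i + 7) % 1000 for i in ids]


lazy_ids = LazyValue(lambda: toy_hash([3, 5, 8]))
# Nothing has been computed yet; the first .get() triggers toy_hash.
```

Passing such a wrapper through the block's keyword arguments lets layers that never use the hashed ids skip the computation entirely.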


+    """Build decoder block spec and attach STM/HC placeholders to each local layer."""
+
+    """GPT block spec."""
+    layer_norm_impl = TENorm


LiJunscs changed the title ~~Deepseek v4~~ Deepseek v4 Support May 7, 2026

LiJunscs self-assigned this May 7, 2026

LiJunscs marked this pull request as ready for review May 12, 2026 11:21

LiJunscs requested review from aoyulong, heavyrain-lzy and zhaoyinglia as code owners May 12, 2026 11:21

LiJunscs added 4 commits May 20, 2026 20:06

init commit

98d6b4a

[train]: complete deepseek_v4 training pipeline(non-distributed)

a35c8fa

[Train]: DeepSeek-V4 distributed traning

261c60b

[train]:

869b205

1. fix incompatibility between engram and mhc; 2. validate training pipeline of deepseek v4. 3. add fake gold value of test deepseek_v4.

LiJunscs force-pushed the deepseek_v4 branch from b8da0d6 to 869b205 Compare May 20, 2026 12:08

LiJunscs requested review from Copilot and lxd-cumt May 20, 2026 12:19

Copilot started reviewing on behalf of LiJunscs May 20, 2026 12:20

Copilot AI reviewed May 20, 2026

[trai]: chore, update some ci related settings based on copilot's suggestions.

085a400

aoyulong previously approved these changes May 24, 2026

zhaoyinglia and others added 2 commits May 24, 2026 21:07

Merge branch 'main' into deepseek_v4

653b768

[train]: format and remove useless deepseek_v4 example yaml.

4149e0d

LiJunscs dismissed aoyulong’s stale review via 4149e0d May 24, 2026 14:05

LiJunscs added 2 commits May 24, 2026 23:21

[train], fix, remove wrong deepseek_v4 test gold value.

3b8349f

[train]: remove error config name of engram. engram_hc_mult is replaced by the num_residual_streams.

e983c11

zhaoyinglia approved these changes May 24, 2026

zhaoyinglia merged commit 5c04b69 into flagos-ai:main May 24, 2026
5 of 7 checks passed

zhaoyinglia merged 9 commits into flagos-ai:main from LiJunscs:deepseek_v4

4 participants
