DeepSeek V4 Upstream Dependency Tracker
This issue tracks upstream dependency items that can block or change the
DeepSeek V4 RL support path in NeMo RL.
Current Goal
Enable end-to-end RL training for DeepSeek V4 Flash on the Slurm cluster,
including:
- policy-side model loading/training
- vLLM generation worker support
- compatible container/venv stack
- short GRPO validation
- logprob correctness validation
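One way to make the logprob correctness check concrete is to compare per-token log-probabilities for the same sampled tokens, once from the policy side and once from the vLLM generation worker. The sketch below is a minimal illustration, not the NeMo RL implementation. The helper name `logprob_mismatch`, the input layout (two equal-length lists of floats), and the tolerance value are all assumptions.

```python
import math


def logprob_mismatch(policy_logprobs, generation_logprobs, atol=1e-2):
    """Compare per-token logprobs from two sources for the same token sequence.

    Hypothetical helper: inputs are equal-length lists of floats, and `atol`
    is an illustrative tolerance, not a validated threshold.
    """
    if len(policy_logprobs) != len(generation_logprobs):
        raise ValueError("logprob sequences must cover the same tokens")
    if not policy_logprobs:
        raise ValueError("no tokens to compare")

    deltas = [p - g for p, g in zip(policy_logprobs, generation_logprobs)]
    abs_diffs = [abs(d) for d in deltas]
    mean_delta = sum(deltas) / len(deltas)
    return {
        "max_abs_diff": max(abs_diffs),
        "mean_abs_diff": sum(abs_diffs) / len(abs_diffs),
        # exp(mean(policy - generation)) equals 1.0 when both sides agree exactly.
        "prob_ratio": math.exp(mean_delta),
        "within_tolerance": max(abs_diffs) <= atol,
    }
```

A ratio that drifts away from 1.0 across many samples would point at a systematic difference between the training and generation stacks rather than isolated numerical noise.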
Active Upstream Items
| Dependency | Upstream item | Status on 2026-04-24 | Impact | Local action |
| --- | --- | --- | --- | --- |
| Automodel | NVIDIA-NeMo/Automodel#2034 | Open | Blocks native policy-side support for `DeepseekV4ForCausalLM`. | Track registry support, state dict adapter, and first usable commit. |
| vLLM | vllm-project/vllm#40760 | Open PR, not merged | Adds core DeepSeek V4 serving implementation and custom kernels. | Track merge status and any changes to required serving args, kernels, or checkpoint assumptions. |
| vLLM | vllm-project/vllm#40778 | Open | General DSV4 support discussion. Includes hardware failures and Base deployment reports. | Watch for maintainer guidance on supported GPU architectures and recommended commands. |
| vLLM | vllm-project/vllm#40790 | Open | Confirms `DeepSeek-V4-Flash-Base` fails during vLLM MoE weight loading with `_load_w13` tensor shape mismatch. | Use `DeepSeek-V4-Flash` for current vLLM smoke tests; treat Base as blocked until vLLM supports the FP8 expert layout or a converter exists. |
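The status column can be refreshed with a small script against the public GitHub REST API. The sketch below is one possible approach: it uses the `issues` endpoint, which also serves pull requests, and assumes that a PR payload carries a `pull_request` object with a `merged_at` field. The helper names are hypothetical, and unauthenticated requests are rate-limited.

```python
import json
import urllib.request

# Items copied from the Active Upstream Items table as (owner/repo, number).
TRACKED_ITEMS = [
    ("NVIDIA-NeMo/Automodel", 2034),
    ("vllm-project/vllm", 40760),
    ("vllm-project/vllm", 40778),
    ("vllm-project/vllm", 40790),
]


def summarize_item(payload):
    """Reduce a GitHub issue/PR JSON payload to the fields this tracker records."""
    is_pr = "pull_request" in payload
    # Assumption: merged PRs expose a non-null `merged_at` inside `pull_request`.
    merged = bool(is_pr and payload["pull_request"].get("merged_at"))
    return {
        "kind": "pr" if is_pr else "issue",
        "state": "merged" if merged else payload["state"],
        "title": payload.get("title", ""),
    }


def fetch_item(repo, number, timeout=10):
    """Fetch one issue or PR from the GitHub REST API (performs a network call)."""
    url = f"https://api.github.com/repos/{repo}/issues/{number}"
    request = urllib.request.Request(
        url, headers={"Accept": "application/vnd.github+json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def status_report(items, fetch=fetch_item):
    """Return one status line per tracked item; `fetch` is injectable for testing."""
    lines = []
    for repo, number in items:
        summary = summarize_item(fetch(repo, number))
        lines.append(
            f"{repo}#{number}: {summary['kind']} {summary['state']} - {summary['title']}"
        )
    return lines
```

Calling `status_report(TRACKED_ITEMS)` prints nothing by itself; it returns the lines so they can be pasted into the table or diffed against the previous run.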
Watch List
| Dependency | What to watch | Why it matters |
| --- | --- | --- |
| Transformers | Native `deepseek_v4` config/model/tokenizer support | Policy-side loading may depend on exact architecture classes and config parsing. |
| DeepGEMM | SM architecture support for DSV4 kernels | vLLM discussions report unsupported architecture errors outside some supported GPU paths. |
| DeepEP / expert parallel | DSV4 EP runtime behavior | Generation worker stability and throughput may depend on EP/all-to-all behavior. |
| FlashInfer / attention kernels | Sparse or DSV4-specific MLA/indexer kernels | DSV4 serving uses specialized attention/indexer behavior. |
Current Local Decisions
- Use `DeepSeek-V4-Flash` for vLLM generation validation.
- Do not use `DeepSeek-V4-Flash-Base` for vLLM generation until the Base FP8 checkpoint layout is supported or converted.
- Treat Automodel support as the primary training-side blocker.
- Treat vLLM hardware errors separately from Base checkpoint format errors.
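To keep hardware errors and Base checkpoint format errors in separate buckets when triaging failed runs, a simple log classifier can help. The sketch below is illustrative only: `_load_w13` comes from the Base failure tracked here, while the hardware patterns are guesses at how an unsupported-architecture error might be worded and should be replaced with strings from real logs.

```python
import re

# `_load_w13` is the loader named in the tracked Base failure.
CHECKPOINT_PATTERNS = [r"_load_w13"]
# Hypothetical wording for unsupported-architecture errors; adjust to real logs.
HARDWARE_PATTERNS = [r"unsupported (gpu )?arch", r"sm_?\d+.*not supported"]


def classify_failure(log_text):
    """Return 'checkpoint_format', 'hardware', or 'unknown' for a failure log."""
    text = log_text.lower()
    if any(re.search(pattern, text) for pattern in CHECKPOINT_PATTERNS):
        return "checkpoint_format"
    if any(re.search(pattern, text) for pattern in HARDWARE_PATTERNS):
        return "hardware"
    return "unknown"
```

Checking the checkpoint patterns first is a deliberate choice in this sketch: a Base run on unsupported hardware would then still be filed under the checkpoint blocker, which is the one with a known workaround (use `DeepSeek-V4-Flash`).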
Known Local Context
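Current vLLM Base failure observed locally:
- `DeepSeek-V4-Flash-Base` fails during vLLM MoE weight loading.
- `_load_w13` tensor shape mismatch.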
Update Protocol
Update this issue when:
- vLLM PR #40760 lands or changes DSV4 runtime assumptions.
- vLLM issues #40778 or #40790 receive actionable maintainer guidance.
- We hit a new upstream dependency blocker in Transformers, DeepGEMM, DeepEP, FlashInfer, or related packages.
- A validated container/image or model path changes the support path.