Track DeepSeek V4 upstream dependency blockers #8

# DeepSeek V4 Upstream Dependency Tracker

This issue tracks upstream dependency items that can block or change the
DeepSeek V4 (DSV4) RL support path in NeMo RL.

## Current Goal

Enable end-to-end RL training for DeepSeek V4 Flash on the Slurm cluster,
including:

- policy-side model loading/training
- vLLM generation worker support
- compatible container/venv stack
- short GRPO validation
- logprob correctness validation

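For the logprob correctness goal, one common check is to compare the per-token logprobs reported by the vLLM generation worker with the logprobs the policy recomputes for the same tokens. Below is a minimal sketch. It assumes both sides are already available as aligned lists of floats. The function name, the metric names, and the default tolerance are hypothetical choices, not NeMo RL's built-in metric.

```python
import math
from typing import Sequence


def compare_logprobs(
    generation_logprobs: Sequence[float],
    policy_logprobs: Sequence[float],
    tolerance: float = 0.05,  # hypothetical threshold; tune per model and precision
) -> dict:
    """Compare per-token logprobs from vLLM generation and the policy forward pass.

    Both inputs must cover the same generated tokens in the same order.
    """
    if len(generation_logprobs) != len(policy_logprobs):
        raise ValueError("logprob sequences must be aligned token-for-token")
    if not generation_logprobs:
        raise ValueError("need at least one token to compare")

    diffs = [abs(g - p) for g, p in zip(generation_logprobs, policy_logprobs)]
    mean_abs_diff = sum(diffs) / len(diffs)
    # exp(|delta logprob|) is the larger-over-smaller ratio of the two token
    # probabilities, so it is always >= 1 and equals 1 only on exact agreement.
    mean_prob_ratio = sum(math.exp(d) for d in diffs) / len(diffs)
    return {
        "mean_abs_diff": mean_abs_diff,
        "max_abs_diff": max(diffs),
        "mean_prob_ratio": mean_prob_ratio,
        "passed": mean_abs_diff <= tolerance,
    }
```

A mean-based threshold tolerates isolated noisy tokens. `max_abs_diff` is returned separately so that a single badly mismatched token is still visible.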
## Active Upstream Items

| Dependency | Upstream item | Status on 2026-04-24 | Impact | Local action |
| --- | --- | --- | --- | --- |
| Automodel | https://github.com/NVIDIA-NeMo/Automodel/issues/2034 | Open | Blocks native policy-side support for `DeepseekV4ForCausalLM`. | Track registry support, state dict adapter, and first usable commit. |
| vLLM | https://github.com/vllm-project/vllm/pull/40760 | Open PR, not merged | Adds core DeepSeek V4 serving implementation and custom kernels. | Track merge status and any changes to required serving args, kernels, or checkpoint assumptions. |
| vLLM | https://github.com/vllm-project/vllm/issues/40778 | Open | General DSV4 support discussion, including reports of hardware failures and Base checkpoint deployment problems. | Watch for maintainer guidance on supported GPU architectures and recommended serving commands. |
| vLLM | https://github.com/vllm-project/vllm/issues/40790 | Open | Confirms `DeepSeek-V4-Flash-Base` fails during vLLM MoE weight loading with `_load_w13` tensor shape mismatch. | Use `DeepSeek-V4-Flash` for current vLLM smoke tests; treat Base as blocked until vLLM supports the FP8 expert layout or a converter exists. |

## Watch List

| Dependency | What to watch | Why it matters |
| --- | --- | --- |
| Transformers | Native `deepseek_v4` config/model/tokenizer support | Policy-side loading may depend on exact architecture classes and config parsing. |
| DeepGEMM | SM architecture support for DSV4 kernels | vLLM discussions report unsupported-architecture errors on GPUs outside the currently supported SM targets. |
| DeepEP / expert parallel | DSV4 EP runtime behavior | Generation worker stability and throughput may depend on EP/all-to-all behavior. |
| FlashInfer / attention kernels | Sparse or DSV4-specific MLA/indexer kernels | DSV4 serving uses specialized attention/indexer behavior. |

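To check the Transformers row in a given container or venv, one option is to look up the `deepseek_v4` model type in the auto-config registry. Below is a minimal sketch. It assumes the model type string is exactly `deepseek_v4`, as listed above. It also relies on `CONFIG_MAPPING_NAMES` from `transformers.models.auto.configuration_auto`, which is an internal mapping that may move between releases. The helper name is hypothetical.

```python
from typing import Mapping, Optional


def has_native_model_type(
    model_type: str = "deepseek_v4",
    registry: Optional[Mapping[str, str]] = None,
) -> Optional[bool]:
    """Return True/False if the model type is registered, or None if Transformers is unavailable."""
    if registry is None:
        try:
            # Internal mapping of model_type -> config class name; not a stable public API.
            from transformers.models.auto.configuration_auto import CONFIG_MAPPING_NAMES
        except ImportError:
            return None
        registry = CONFIG_MAPPING_NAMES
    return model_type in registry
```

The `registry` argument exists so the check can be exercised without Transformers installed. In a real environment, call it with no arguments.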
## Current Local Decisions

- Use `DeepSeek-V4-Flash` for vLLM generation validation.
- Do not use `DeepSeek-V4-Flash-Base` for vLLM generation until the Base FP8 checkpoint layout is supported or converted.
- Treat Automodel support as the primary training-side blocker.
- Treat vLLM hardware errors separately from Base checkpoint format errors.

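The Base/Flash decision above can be enforced with a small guard before a vLLM generation worker starts. Below is a minimal sketch. The mapping and function names are hypothetical. The check compares only the last path component, so it accepts either an `org/name` model ID or a local checkpoint directory.

```python
# Hypothetical mapping: checkpoint name -> upstream item that blocks it for vLLM generation.
VLLM_BLOCKED_CHECKPOINTS = {
    "DeepSeek-V4-Flash-Base": "https://github.com/vllm-project/vllm/issues/40790",
}


def check_vllm_generation_model(model_name_or_path: str) -> str:
    """Raise if the checkpoint is currently blocked for vLLM generation; otherwise return its short name."""
    # Compare only the last path component so "org/name" IDs and local paths both work.
    short_name = model_name_or_path.rstrip("/").rsplit("/", 1)[-1]
    blocker = VLLM_BLOCKED_CHECKPOINTS.get(short_name)
    if blocker is not None:
        raise ValueError(
            f"{short_name} is blocked for vLLM generation (see {blocker}); "
            "use DeepSeek-V4-Flash instead."
        )
    return short_name
```

Once vLLM supports the Base FP8 expert layout, or a converter exists, the entry can be removed from the mapping without touching the callers.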
## Known Local Context

vLLM failure currently observed locally with the Base checkpoint:

- `DeepSeek-V4-Flash-Base` fails during vLLM MoE weight loading.
- Error class: `_load_w13` tensor shape mismatch.
- This matches upstream vLLM issue https://github.com/vllm-project/vllm/issues/40790.

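To keep checkpoint-format errors separate from hardware errors during log triage, a small classifier can bucket a vLLM failure message. Below is a minimal sketch. The matched patterns are assumptions based on the descriptions in this issue: `_load_w13` together with a shape mismatch for the Base checkpoint, and "unsupported architecture" wording for hardware errors. They are not the exact vLLM or DeepGEMM messages, so adjust them to the real log text.

```python
import re

# Assumed patterns; verify against actual log lines before relying on them.
_BASE_CHECKPOINT_PATTERN = re.compile(
    r"_load_w13.*shape|shape.*_load_w13", re.IGNORECASE | re.DOTALL
)
_HARDWARE_PATTERN = re.compile(r"unsupported\s+(gpu\s+)?arch", re.IGNORECASE)


def classify_vllm_failure(log_text: str) -> str:
    """Bucket a vLLM failure log as a checkpoint-format error, a hardware error, or unknown."""
    if _BASE_CHECKPOINT_PATTERN.search(log_text):
        return "base_checkpoint_format"  # tracked in vllm-project/vllm#40790
    if _HARDWARE_PATTERN.search(log_text):
        return "hardware_architecture"  # tracked in vllm-project/vllm#40778
    return "unknown"
```

The checkpoint pattern is checked first because it is the more specific of the two.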
## Update Protocol

Update this issue when:

- vLLM PR #40760 lands or changes DSV4 runtime assumptions.
- vLLM issues #40778 or #40790 receive actionable maintainer guidance.
- We hit a new upstream dependency blocker in Transformers, DeepGEMM, DeepEP, FlashInfer, or related packages.
- A validated container/image or model path changes the support path.

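Checking the tracked upstream items for state changes can be partly automated through the public GitHub REST API. Below is a minimal sketch. It uses the issues endpoint, which also serves pull requests and exposes a `merged_at` field under `pull_request`. Requests are unauthenticated and therefore rate-limited. The item list mirrors the Active Upstream Items table, and the helper names are hypothetical.

```python
import json
import urllib.request

# Upstream items from the Active Upstream Items table: (owner/repo, number).
TRACKED_ITEMS = [
    ("NVIDIA-NeMo/Automodel", 2034),
    ("vllm-project/vllm", 40760),
    ("vllm-project/vllm", 40778),
    ("vllm-project/vllm", 40790),
]


def summarize_item(payload: dict) -> str:
    """Turn a GitHub REST issue payload into a short status string."""
    state = payload.get("state", "unknown")
    pull = payload.get("pull_request")
    if pull is not None:
        # PRs returned by the issues endpoint carry merge info under "pull_request".
        if pull.get("merged_at"):
            return "merged"
        return f"{state} PR, not merged"
    return state


def fetch_item(repo: str, number: int, timeout: float = 10.0) -> dict:
    """Fetch an issue or PR from the GitHub REST API (unauthenticated, rate-limited)."""
    url = f"https://api.github.com/repos/{repo}/issues/{number}"
    request = urllib.request.Request(
        url, headers={"Accept": "application/vnd.github+json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

For example, running `for repo, n in TRACKED_ITEMS: print(repo, n, summarize_item(fetch_item(repo, n)))` prints one status line per item, which can be compared with the status column. The script only reports issue and PR state. Maintainer guidance still has to be read manually.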