[NeMo RL] v0.7.0 Release Roadmap #2591

# NeMo RL v0.7.0 Roadmap

**ETA:** June 30, 2026

This is a community-facing snapshot of the work targeted for the NeMo RL v0.7.0 release.

---

## 1. Training - AutoModel backend


| Feature | Status | Link |
| -- | -- | -- |
| MiniMax-M2.7 support | WIP | NVIDIA-NeMo/RL#2251 |
| DeepSeek V4 Flash support | WIP | NVIDIA-NeMo/RL#2331 |
| Gemma 4 AutoModel support | WIP | NVIDIA-NeMo/RL#2212, PR NVIDIA-NeMo/RL#2224 |
| Nemotron Nano v3 Omni AutoModel support | WIP | NVIDIA-NeMo/RL#2361, PR [NVIDIA-NeMo/RL#2362](<https://github.com/NVIDIA-NeMo/RL/issues/2362>) |
| Mistral 3.5 AutoModel support | WIP | [https://github.com/NVIDIA-NeMo/RL/issues/2542](<https://github.com/NVIDIA-NeMo/RL/issues/2542>) |

---

## 2. Training - Megatron backend


| Feature | Status | Link |
| -- | -- | -- |
| GLM 5.1 GRPO support | WIP | NVIDIA-NeMo/RL#2377, PR NVIDIA-NeMo/RL#2489 |
| Kimi K2.6 support | WIP | issue NVIDIA-NeMo/RL#2412 |
| ModelOpt low-precision QAT / quantized checkpoint training | WIP | NVIDIA-NeMo/RL#1099 |
| E2E FP8 / long-context FP8 benchmark | WIP |  |

---

## 3. Inference - vLLM backend


| Feature | Status | Link |
| -- | -- | -- |
| W4A16 with QAT | WIP |  |
| vLLM 0.19.2 update to match TRT-LLM performance | WIP |  |
| Router Replay Rollouts (R3) | WIP |  |

---

## 4. Inference - Megatron backend


| Feature | Status | Link |
| -- | -- | -- |
| Numerics and speed features for dense models | WIP |  |

---

## 5. Algorithm & Dataset


| Feature | Status | Link |
| -- | -- | -- |
| PPO with MCore | WIP | NVIDIA-NeMo/RL#2048, PR NVIDIA-NeMo/RL#2530 |
| PPO with DTensor | WIP | NVIDIA-NeMo/RL#2046, NVIDIA-NeMo/RL#2047, NVIDIA-NeMo/RL#2048, draft PR [NVIDIA-NeMo/RL#2027](<https://github.com/NVIDIA-NeMo/RL/issues/2027>), PR NVIDIA-NeMo/RL#2530 |
| Multi-teacher off-policy distillation | WIP | NVIDIA-NeMo/RL#1700 |
| Cross-tokenizer distillation | WIP | issue NVIDIA-NeMo/RL#1827, PR NVIDIA-NeMo/RL#2508 |
| Hybrid reward with RLVR and reward model | WIP |  |

---

## 6. General Infra improvement


| Feature | Status | Link |
| -- | -- | -- |
| Control and data plane separation | WIP | NVIDIA-NeMo/RL#2414 |
| Generation trajectory checkpointing | WIP | NVIDIA-NeMo/RL#2415 |
| RDMA refit and delta refit | WIP |  |
| NCCL reshard into NeMo RL | WIP | NVIDIA-NeMo/RL#2413 |

---

## 7. Performance improvements


| Feature | Status | Link |
| -- | -- | -- |
| SWE async RL benchmark with Qwen3.5 | WIP |  |
| RL Training Perf features | WIP |  |
