
[RL] Turn compile for generator back on #2710

Merged

Lucaskabela merged 1 commit into pytorch:main from Lucaskabela:lucaskabela/turn_generator_compile_back on Mar 26, 2026
Conversation

@Lucaskabela (Contributor)
Summary

We turned compile for the generator off in #2638 due to a conflict with DTensor and symbolic propagation.

We fixed this in pytorch/pytorch#178210, so we re-enable this config (once it lands in nightly).

Test

```bash
python torchtitan/experiments/rl/simple_grpo_sum_digits.py --module rl --config rl_grpo_qwen3_0_6b --hf_assets_path=torchtitan/experiments/rl/example_checkpoint/Qwen3-0.6B
```

@meta-cla meta-cla Bot added the CLA Signed label on Mar 25, 2026
@Lucaskabela Lucaskabela force-pushed the lucaskabela/turn_generator_compile_back branch from 2dd46ce to 88b46a9 on March 25, 2026 16:37
@tianyu-l (Contributor) left a comment:

feel free to merge after it works

@Lucaskabela Lucaskabela marked this pull request as ready for review March 26, 2026 16:43
@Lucaskabela (Contributor, Author)

Confirmed this is working locally with the numerics test :) For folks who hit issues, please ensure your PyTorch version includes pytorch/pytorch#178210.

@Lucaskabela Lucaskabela merged commit d4dfa9e into pytorch:main Mar 26, 2026
21 of 24 checks passed

Lucaskabela commented Mar 26, 2026

Local run of the numerics:

```bash
torchrun --nproc-per-node=2 torchtitan/experiments/rl/tests/test_attn_numerics.py
```

```
Loaded HF weights from torchtitan/experiments/rl/example_checkpoint/Qwen3-0.6B (311 params)
Trainer attention module: ScaledDotProductAttentionWrapper
Trainer computed 30 token log-probs
  vLLM   log-probs[:5]: [-4.129570484161377, -1.795021891593933, -0.71578049659729, -0.2110116183757782, -0.9374725222587585]
  Trainer log-probs[:5]: [-4.129570484161377, -1.795021891593933, -0.71578049659729, -0.2110116183757782, -0.9374725222587585]
============================================================
LOGPROB COMPARISON RESULTS
============================================================
  Bitwise identical : True
  Tokens checked    : 30
  Tokens different  : 0
  Max delta         : 0.000000e+00
  Avg delta         : 0.000000e+00
  Diff mean         : 0.000000e+00
  Diff max          : 0.000000e+00
============================================================
PASS: vLLM and trainer log-probs are bitwise identical.
/home/lucaskabela/.conda/envs/pytorch_build/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 1 leaked shared_memory objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
```
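The summary table above boils down to an elementwise comparison of the two log-prob lists. A minimal sketch of that check (`compare_logprobs` is a hypothetical helper, not the actual code in `test_attn_numerics.py`; exact float equality stands in for a true bitwise check):

```python
def compare_logprobs(vllm_lp, trainer_lp):
    """Compare two equal-length log-prob lists and report summary stats."""
    assert len(vllm_lp) == len(trainer_lp)
    deltas = [abs(a - b) for a, b in zip(vllm_lp, trainer_lp)]
    different = sum(d != 0.0 for d in deltas)
    return {
        "bitwise_identical": different == 0,
        "tokens_checked": len(deltas),
        "tokens_different": different,
        "max_delta": max(deltas),
        "avg_delta": sum(deltas) / len(deltas),
    }

# Identical inputs (first three log-probs from the run above) -> all deltas zero.
lp = [-4.129570484161377, -1.795021891593933, -0.71578049659729]
stats = compare_logprobs(lp, list(lp))
print(stats)  # stats["bitwise_identical"] is True, all deltas 0.0
```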

```diff
-backend="none",
-cudagraph_mode="none",
+backend="eager",
+cudagraph_mode="piecewise",
```

nit: use the class type instead of str?
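A sketch of what that nit could look like: a `str`-valued enum keeps the config serializable while making typos a hard error (`CudagraphMode` and its members are hypothetical names for illustration, not vLLM's actual API):

```python
from enum import Enum

class CudagraphMode(str, Enum):
    # str mixin: members compare equal to their string values,
    # so existing string-based call sites keep working.
    NONE = "none"
    PIECEWISE = "piecewise"

mode = CudagraphMode.PIECEWISE
assert mode == "piecewise"                          # backward compatible with raw strings
assert CudagraphMode("none") is CudagraphMode.NONE  # lookup by value validates input
print(mode.value)  # piecewise
```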

@acisseJZhong (Contributor)

emmm, curious what setting you are running; on main, I am not seeing LOGPROB COMPARISON RESULTS matching.

@Lucaskabela (Contributor, Author)

> emmm curious what setting you are running, on main, I am not seeing LOGPROB COMPARISON RESULTS

Nothing fancy, I just rebuilt all the repos (vLLM, torch, and torchtitan) from main this morning and ran the numerics test; didn't change anything else

wwwjn added a commit that referenced this pull request Mar 26, 2026
Lucaskabela added a commit that referenced this pull request Mar 26, 2026
Reverts #2710

Need to wait another day for the nightly to pick up
pytorch/pytorch#178210

Co-authored-by: Jiani Wang <jianiwangw@gmail.com>
pytorch-bot Bot pushed a commit that referenced this pull request Mar 27, 2026
pytorch-bot Bot pushed a commit that referenced this pull request Mar 27, 2026
Lucaskabela added a commit that referenced this pull request Mar 27, 2026
## Summary
This PR reapplies #2710.

## Test plan
PREREQ: ensure pytorch/pytorch#178210 is in your
torch version
```bash
python torchtitan/experiments/rl/simple_grpo_sum_digits.py --module rl --config rl_grpo_qwen3_0_6b --hf_assets_path=torchtitan/experiments/rl/example_checkpoint/Qwen3-0.6B
```
With both compiles on, we expect a ~4x speedup over eager (timed end-to-end:
~400s down to ~100s for 10 steps).

```bash
torchrun --nproc_per_node=2 \
      torchtitan/experiments/rl/tests/test_bitwise_identity.py
```

The numerics test results in:
```
Trainer computed 30 token log-probs
  vLLM   log-probs[:5]: [-4.129570484161377, -1.795021891593933, -0.71578049659729, -0.2110116183757782, -0.9374725222587585]
  Trainer log-probs[:5]: [-4.129570484161377, -1.795021891593933, -0.71578049659729, -0.2110116183757782, -0.9374725222587585]
============================================================
LOGPROB COMPARISON RESULTS
============================================================
  Bitwise identical : True
  Tokens checked    : 30
  Tokens different  : 0
  Max delta         : 0.000000e+00
  Avg delta         : 0.000000e+00
  Diff mean         : 0.000000e+00
  Diff max          : 0.000000e+00
============================================================
PASS: vLLM and trainer log-probs are bitwise identical.
/home/lucaskabela/.conda/envs/pytorch_build/lib/python3.10
```
chelsea0x3b pushed a commit to chelsea0x3b/torchtitan that referenced this pull request Mar 30, 2026
chelsea0x3b pushed a commit to chelsea0x3b/torchtitan that referenced this pull request Mar 30, 2026
chelsea0x3b pushed a commit to chelsea0x3b/torchtitan that referenced this pull request Mar 30, 2026
acisseJZhong pushed a commit that referenced this pull request Mar 31, 2026
acisseJZhong pushed a commit that referenced this pull request Mar 31, 2026
acisseJZhong pushed a commit that referenced this pull request Mar 31, 2026
TXacs pushed a commit to McmillanTAC/torchtitan that referenced this pull request Apr 13, 2026
TXacs pushed a commit to McmillanTAC/torchtitan that referenced this pull request Apr 13, 2026
TXacs pushed a commit to McmillanTAC/torchtitan that referenced this pull request Apr 13, 2026
ACharacterInASimulation pushed a commit to ACharacterInASimulation/torchtitan that referenced this pull request Apr 21, 2026
ACharacterInASimulation pushed a commit to ACharacterInASimulation/torchtitan that referenced this pull request Apr 21, 2026
ACharacterInASimulation pushed a commit to ACharacterInASimulation/torchtitan that referenced this pull request Apr 21, 2026
@Lucaskabela Lucaskabela deleted the lucaskabela/turn_generator_compile_back branch May 6, 2026 19:29

Labels

ciflow/8gpu, CLA Signed


3 participants