[RL] Turn compile for generator back on #2710
Merged
Lucaskabela merged 1 commit into pytorch:main on Mar 26, 2026
Conversation
Force-pushed from 2dd46ce to 88b46a9
tianyu-l
approved these changes
Mar 25, 2026
Contributor
tianyu-l
left a comment
feel free to merge after it works
Contributor
Author
Confirmed this is working locally with the numerics test :) If you run into issues, please ensure your PyTorch version includes pytorch/pytorch#178210
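The prerequisite above can be checked programmatically before enabling the compile path. A minimal pure-stdlib sketch follows; `parse_version`, `has_generator_compile_fix`, and the `"2.8.0"` cutoff are illustrative assumptions, not the actual gate used in torchtitan:

```python
# Hedged sketch: check that the installed torch is new enough to contain the
# upstream fix (pytorch/pytorch#178210). The minimum version string below is
# an illustrative placeholder, not the real cutoff.

def parse_version(v: str) -> tuple:
    """Parse '2.8.0.dev20260326+cu121' into a comparable tuple of ints."""
    core = v.split("+")[0]  # drop local build suffix like '+cu121'
    parts = []
    for piece in core.split("."):
        digits = "".join(ch for ch in piece if ch.isdigit())
        parts.append(int(digits) if digits else 0)
    return tuple(parts)

def has_generator_compile_fix(installed: str, minimum: str = "2.8.0") -> bool:
    """True when the installed version is at least the (assumed) minimum."""
    return parse_version(installed) >= parse_version(minimum)
```

In practice you would compare `torch.__version__` against the first nightly that carried the fix.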
Contributor
Author
Local run of the numerics test: torchrun --nproc-per-node=2 torchtitan/experiments/rl/tests/test_attn_numerics.py
```diff
-    backend="none",
-    cudagraph_mode="none",
+    backend="eager",
+    cudagraph_mode="piecewise",
```
Contributor
nit: use the class type instead of str?
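The nit above could look like the following sketch. `CudagraphMode` and `make_compile_spec` are hypothetical names illustrating the idea, not the actual vLLM/torchtitan classes:

```python
from enum import Enum

class CudagraphMode(str, Enum):
    # Hypothetical enum standing in for the string-valued config field;
    # a typo like "peicewise" now fails at construction time instead of
    # deep inside the compile path.
    NONE = "none"
    PIECEWISE = "piecewise"
    FULL = "full"

def make_compile_spec(backend: str, cudagraph_mode: CudagraphMode) -> dict:
    """Normalize the mode through the enum so invalid strings raise ValueError."""
    return {
        "backend": backend,
        "cudagraph_mode": CudagraphMode(cudagraph_mode).value,
    }

spec = make_compile_spec("eager", CudagraphMode.PIECEWISE)
```

Because the enum mixes in `str`, existing call sites that pass `"piecewise"` keep working, while unknown strings raise immediately.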
Contributor
Emmm, curious what setting you are running; on main, I am not seeing LOGPROB COMPARISON RESULTS
Contributor
Author
Nothing fancy, I just rebuilt all the repos (vLLM, torch, and torchtitan) from main this morning and ran the numerics test; didn't change anything else |
Lucaskabela
added a commit
that referenced
this pull request
Mar 26, 2026
Reverts #2710. Need to wait another day for nightly to pick up pytorch/pytorch#178210.

Co-authored-by: Jiani Wang <jianiwangw@gmail.com>
Lucaskabela
added a commit
that referenced
this pull request
Mar 27, 2026
## Summary

This PR reapplies #2710.

## Test plan

PREREQ: ensure pytorch/pytorch#178210 is in your torch version.

```bash
python torchtitan/experiments/rl/simple_grpo_sum_digits.py --module rl --config rl_grpo_qwen3_0_6b --hf_assets_path=torchtitan/experiments/rl/example_checkpoint/Qwen3-0.6B
```

With both compiles on, we expect a 4x speedup over eager (timed from ~400s e2e to ~100s for 10 steps).

```bash
torchrun --nproc_per_node=2 \
  torchtitan/experiments/rl/tests/test_bitwise_identity.py
```

The numerics test results in:

```
Trainer computed 30 token log-probs
vLLM log-probs[:5]:    [-4.129570484161377, -1.795021891593933, -0.71578049659729, -0.2110116183757782, -0.9374725222587585]
Trainer log-probs[:5]: [-4.129570484161377, -1.795021891593933, -0.71578049659729, -0.2110116183757782, -0.9374725222587585]
============================================================
LOGPROB COMPARISON RESULTS
============================================================
Bitwise identical : True
Tokens checked    : 30
Tokens different  : 0
Max delta         : 0.000000e+00
Avg delta         : 0.000000e+00
Diff mean         : 0.000000e+00
Diff max          : 0.000000e+00
============================================================
PASS: vLLM and trainer log-probs are bitwise identical.
```
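The fields in the LOGPROB COMPARISON RESULTS summary above can be computed with a small helper. This is a sketch of the comparison logic only; `compare_logprobs` is an illustrative name, not the actual test harness in `test_bitwise_identity.py`:

```python
def compare_logprobs(vllm_lp: list, trainer_lp: list) -> dict:
    """Token-by-token comparison of two log-prob sequences, mirroring the
    fields reported in the LOGPROB COMPARISON RESULTS summary."""
    assert len(vllm_lp) == len(trainer_lp), "sequences must align token-for-token"
    deltas = [abs(a - b) for a, b in zip(vllm_lp, trainer_lp)]
    return {
        "bitwise_identical": all(a == b for a, b in zip(vllm_lp, trainer_lp)),
        "tokens_checked": len(deltas),
        "tokens_different": sum(1 for d in deltas if d != 0.0),
        "max_delta": max(deltas, default=0.0),
        "avg_delta": sum(deltas) / len(deltas) if deltas else 0.0,
    }

# With identical inputs (as in the passing run above), every delta is zero.
report = compare_logprobs([-4.1295, -1.7950], [-4.1295, -1.7950])
```

Bitwise identity is a strict equality check on the floats; any tolerance-based comparison would be a weaker test.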
Summary
We turned the compile for generator off in #2638 due to a conflict with DTensor and symbolic propagation.
We fixed this in pytorch/pytorch#178210, so we re-enable this config (once it lands in nightly).
Test