
[RL] Turn compile for generator back on #2710

Merged

Lucaskabela merged 1 commit into pytorch:main from Lucaskabela:lucaskabela/turn_generator_compile_back on Mar 26, 2026
Conversation

@Lucaskabela (Contributor)
Summary

We turned compile for the generator off in #2638 due to a conflict with DTensor and symbolic propagation.

We fixed this in pytorch/pytorch#178210, so we re-enable this config (once it lands in nightly).

Test

```bash
python torchtitan/experiments/rl/simple_grpo_sum_digits.py --module rl --config rl_grpo_qwen3_0_6b --hf_assets_path=torchtitan/experiments/rl/example_checkpoint/Qwen3-0.6B
```

@meta-cla meta-cla Bot added the CLA Signed label on Mar 25, 2026
@Lucaskabela Lucaskabela force-pushed the lucaskabela/turn_generator_compile_back branch from 2dd46ce to 88b46a9 on March 25, 2026 16:37
@tianyu-l (Contributor) left a comment:

feel free to merge after it works

@Lucaskabela Lucaskabela marked this pull request as ready for review March 26, 2026 16:43
@Lucaskabela (Contributor, Author)

Confirmed this is working locally with the numerics test :) For folks who hit issues, please ensure your PyTorch version includes pytorch/pytorch#178210.

@Lucaskabela Lucaskabela merged commit d4dfa9e into pytorch:main Mar 26, 2026
21 of 24 checks passed

Lucaskabela commented Mar 26, 2026

Local run of the numerics:

```bash
torchrun --nproc-per-node=2 torchtitan/experiments/rl/tests/test_attn_numerics.py
```

```
Loaded HF weights from torchtitan/experiments/rl/example_checkpoint/Qwen3-0.6B (311 params)
Trainer attention module: ScaledDotProductAttentionWrapper
Trainer computed 30 token log-probs
  vLLM   log-probs[:5]: [-4.129570484161377, -1.795021891593933, -0.71578049659729, -0.2110116183757782, -0.9374725222587585]
  Trainer log-probs[:5]: [-4.129570484161377, -1.795021891593933, -0.71578049659729, -0.2110116183757782, -0.9374725222587585]
============================================================
LOGPROB COMPARISON RESULTS
============================================================
  Bitwise identical : True
  Tokens checked    : 30
  Tokens different  : 0
  Max delta         : 0.000000e+00
  Avg delta         : 0.000000e+00
  Diff mean         : 0.000000e+00
  Diff max          : 0.000000e+00
============================================================
PASS: vLLM and trainer log-probs are bitwise identical.
/home/lucaskabela/.conda/envs/pytorch_build/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 1 leaked shared_memory objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
```
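The summary table above boils down to an elementwise comparison of the two log-prob lists. A minimal sketch of that check (`compare_logprobs` is a hypothetical helper, not the actual code in `test_attn_numerics.py`; exact float equality stands in for a true bitwise check):

```python
def compare_logprobs(vllm_lp, trainer_lp):
    """Compare two equal-length log-prob lists and report summary stats."""
    assert len(vllm_lp) == len(trainer_lp)
    deltas = [abs(a - b) for a, b in zip(vllm_lp, trainer_lp)]
    different = sum(d != 0.0 for d in deltas)
    return {
        "bitwise_identical": different == 0,
        "tokens_checked": len(deltas),
        "tokens_different": different,
        "max_delta": max(deltas),
        "avg_delta": sum(deltas) / len(deltas),
    }

# Identical inputs (first three log-probs from the run above) -> all deltas zero.
lp = [-4.129570484161377, -1.795021891593933, -0.71578049659729]
stats = compare_logprobs(lp, list(lp))
print(stats)  # stats["bitwise_identical"] is True, all deltas 0.0
```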

```diff
-backend="none",
-cudagraph_mode="none",
+backend="eager",
+cudagraph_mode="piecewise",
```

nit: use the class type instead of str?
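A sketch of what that nit could look like: a `str`-valued enum keeps the config serializable while making typos a hard error (`CudagraphMode` and its members are hypothetical names for illustration, not vLLM's actual API):

```python
from enum import Enum

class CudagraphMode(str, Enum):
    # str mixin: members compare equal to their string values,
    # so existing string-based call sites keep working.
    NONE = "none"
    PIECEWISE = "piecewise"

mode = CudagraphMode.PIECEWISE
assert mode == "piecewise"                          # backward compatible with raw strings
assert CudagraphMode("none") is CudagraphMode.NONE  # lookup by value validates input
print(mode.value)  # piecewise
```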

@acisseJZhong (Contributor)

emmm, curious what setting you are running; on main, I am not seeing LOGPROB COMPARISON RESULTS matching.

@Lucaskabela (Contributor, Author)

> emmm curious what setting you are running, on main, I am not seeing LOGPROB COMPARISON RESULTS

Nothing fancy, I just rebuilt all the repos (vLLM, torch, and torchtitan) from main this morning and ran the numerics test; didn't change anything else

wwwjn added a commit that referenced this pull request Mar 26, 2026
Lucaskabela added a commit that referenced this pull request Mar 26, 2026
Reverts #2710

Need to wait another day for the nightly to pick up
pytorch/pytorch#178210

Co-authored-by: Jiani Wang <jianiwangw@gmail.com>
pytorch-bot Bot pushed a commit that referenced this pull request Mar 27, 2026
pytorch-bot Bot pushed a commit that referenced this pull request Mar 27, 2026
Lucaskabela added a commit that referenced this pull request Mar 27, 2026
## Summary
This PR reapplies #2710.

## Test plan
PREREQ: ensure pytorch/pytorch#178210 is in your
torch version
```bash
python torchtitan/experiments/rl/simple_grpo_sum_digits.py --module rl --config rl_grpo_qwen3_0_6b --hf_assets_path=torchtitan/experiments/rl/example_checkpoint/Qwen3-0.6B
```
With both compiles on, we expect a ~4x speedup over eager (timed end-to-end:
~400s down to ~100s for 10 steps).

```bash
torchrun --nproc_per_node=2 \
      torchtitan/experiments/rl/tests/test_bitwise_identity.py
```

The numerics test results in:
```
Trainer computed 30 token log-probs
  vLLM   log-probs[:5]: [-4.129570484161377, -1.795021891593933, -0.71578049659729, -0.2110116183757782, -0.9374725222587585]
  Trainer log-probs[:5]: [-4.129570484161377, -1.795021891593933, -0.71578049659729, -0.2110116183757782, -0.9374725222587585]
============================================================
LOGPROB COMPARISON RESULTS
============================================================
  Bitwise identical : True
  Tokens checked    : 30
  Tokens different  : 0
  Max delta         : 0.000000e+00
  Avg delta         : 0.000000e+00
  Diff mean         : 0.000000e+00
  Diff max          : 0.000000e+00
============================================================
PASS: vLLM and trainer log-probs are bitwise identical.
/home/lucaskabela/.conda/envs/pytorch_build/lib/python3.10
```
chelsea0x3b pushed a commit to chelsea0x3b/torchtitan that referenced this pull request Mar 30, 2026
chelsea0x3b pushed a commit to chelsea0x3b/torchtitan that referenced this pull request Mar 30, 2026
chelsea0x3b pushed a commit to chelsea0x3b/torchtitan that referenced this pull request Mar 30, 2026
acisseJZhong pushed a commit that referenced this pull request Mar 31, 2026
acisseJZhong pushed a commit that referenced this pull request Mar 31, 2026
acisseJZhong pushed a commit that referenced this pull request Mar 31, 2026
TXacs pushed a commit to McmillanTAC/torchtitan that referenced this pull request Apr 13, 2026
TXacs pushed a commit to McmillanTAC/torchtitan that referenced this pull request Apr 13, 2026
TXacs pushed a commit to McmillanTAC/torchtitan that referenced this pull request Apr 13, 2026
ACharacterInASimulation pushed a commit to ACharacterInASimulation/torchtitan that referenced this pull request Apr 21, 2026
ACharacterInASimulation pushed a commit to ACharacterInASimulation/torchtitan that referenced this pull request Apr 21, 2026
ACharacterInASimulation pushed a commit to ACharacterInASimulation/torchtitan that referenced this pull request Apr 21, 2026
@Lucaskabela Lucaskabela deleted the lucaskabela/turn_generator_compile_back branch May 6, 2026 19:29

Labels

ciflow/8gpu, CLA Signed


3 participants