
[Bugfix] Fix by copying sym shapes as needed#178210

Closed
Lucaskabela wants to merge 1 commit into main from lucaskabela/fix_symbolic_shape_dtensor_to_local

Conversation


Lucaskabela (Contributor) commented Mar 23, 2026

Fixes #175690

Summary

We discovered an error in torchtitan related to symbolic sequence length: when running piecewise cudagraphs with batch size 1, seqlen 1, and a sharding degree of 2, `to_local()` incorrectly created an extra symbol that we could not simplify (producing a size like `2 * (s27 // 2)`).

We could work around this by narrowing in user code, but the more proper fix is to narrow inside `to_local()` itself, where we can correctly fix up the symbolic shape.
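
To make the padding arithmetic concrete, here is a minimal standalone sketch (no real DTensor or process group; `fake_to_local`, the world size, and the sizes are invented for illustration) of how an uneven Shard(0) round-trip yields a padded length that only a final `narrow` ties back to the logical length:

```python
import torch

def fake_to_local(global_len: int, world_size: int = 2) -> torch.Tensor:
    # Each rank pads its shard to ceil(global_len / world_size), so gathering
    # the shards back produces world_size * ceil(global_len / world_size)
    # elements -- more than global_len when the sharding is uneven.
    padded_shard = -(-global_len // world_size)  # ceil division
    gathered = torch.zeros(world_size * padded_shard)
    # Symbolically, gathered.size(0) is a fresh expression (e.g. built from
    # s27 // 2) that the shape engine cannot simplify back to the original
    # sequence-length symbol; narrowing restores the logical length.
    return gathered.narrow(0, 0, global_len)

print(fake_to_local(5).shape)  # torch.Size([5]) rather than torch.Size([6])
```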

Test plan

Added 2 unit tests

Authored with Claude

cc @ezyang @penguinwu @bobrenjc93 @aditvenk @laithsakka


pytorch-bot Bot commented Mar 23, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/178210

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

❌ 4 New Failures, 3 Unrelated Failures

As of commit f79f967 with merge base be2c9ee:

NEW FAILURES - The following jobs have failed:

UNSTABLE - The following jobs are marked as unstable, possibly due to flakiness on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

pytorch-bot added the ciflow/dtensor, ciflow/inductor, ciflow/torchtitan, and release notes: distributed (dtensor) labels Mar 23, 2026
Lucaskabela added the module: dynamic shapes and ciflow/dynamo labels and removed the ciflow/inductor, release notes: distributed (dtensor), and ciflow/torchtitan labels Mar 23, 2026

pytorch-bot Bot commented Mar 23, 2026

This PR needs a release notes: label

If your changes are user facing and intended to be a part of release notes, please use a label starting with release notes:.

If not, please add the topic: not user facing label.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "topic: not user facing"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

2 similar comments

pytorch-bot added the ciflow/inductor, ciflow/torchtitan, and release notes: distributed (dtensor) labels Mar 23, 2026
Lucaskabela (Contributor, Author) commented:

NOTE: this should fix the issue mentioned in pytorch/torchtitan#2638 and allow us to re-enable compile in torch RL

Lucaskabela force-pushed the lucaskabela/fix_symbolic_shape_dtensor_to_local branch 5 times, most recently from a59f785 to fd4c7ec, March 24, 2026 22:00
Lucaskabela changed the title from [DRAFT][DO NOT REVIEW] Fix by copying sym shapes as needed to [Bugfix] Fix by copying sym shapes as needed Mar 24, 2026
Lucaskabela requested review from pianpwk and wconstab March 24, 2026 22:17
Lucaskabela marked this pull request as ready for review March 24, 2026 22:17

Lucaskabela commented Mar 24, 2026

There was a question from @pianpwk on the issue and a previous iteration of this PR: "Why does this cause an error?"

Concretely, we generate an incorrect graph because we have two symbols representing sequence length (one at the global level, one at the local level), and we cannot assert that `s1 == 2 * (s2 // 2)`, even though this holds because of how `to_local` handles padding/unpadding.

In the second test case we added, we validate that `narrow` does not affect correctness and is a no-op when the sizes are already the same; it is mostly there for symbolic propagation:

```python
if local_tensor.size(self.dim) is not logical_dim_size:
    # torch._check ensures correctness
    torch._check(local_tensor.size(self.dim) >= logical_dim_size)
    local_tensor = local_tensor.narrow(self.dim, 0, logical_dim_size)
```
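
For intuition, a hypothetical standalone version of that unpad step (the `unpad` function and its inputs are made up here, not the DTensor source) behaves like this under `torch.compile`:

```python
import torch

@torch.compile(fullgraph=True, dynamic=True)
def unpad(local: torch.Tensor, logical_len: int) -> torch.Tensor:
    # torch._check records the inequality for the symbolic shape engine
    # (and as a runtime assert), so the compiler can reason about the
    # narrow below instead of failing on an unprovable size relation.
    torch._check(local.size(0) >= logical_len)
    return local.narrow(0, 0, logical_len)

out = unpad(torch.randn(6), 5)  # padded local of size 6, logical size 5
assert out.shape == (5,)
out = unpad(torch.randn(5), 5)  # sizes already equal: the narrow is a no-op
assert out.shape == (5,)
```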
Contributor


Just confirming: this operation should never actually change the real shape of `local_tensor`, right? It is strictly here to add information to the symbolic shapes?

Contributor Author

@Lucaskabela Lucaskabela Mar 24, 2026


That is correct - I added an explicit assert stating this

Lucaskabela added a commit to pytorch/torchtitan that referenced this pull request Mar 26, 2026
## Summary

We turned compile off for the generator in
#2638 due to a conflict between
DTensor and symbolic propagation.

We fix this in pytorch/pytorch#178210, so we
re-enable this config (once it lands in nightly).

## Test
```bash
python torchtitan/experiments/rl/simple_grpo_sum_digits.py --module rl --config rl_grpo_qwen3_0_6b --hf_assets_path=torchtitan/experiments/rl/example_checkpoint/Qwen3-0.6B
```
Lucaskabela added a commit to pytorch/torchtitan that referenced this pull request Mar 26, 2026
Reverts #2710

Need to wait another day for nightly to pick up
pytorch/pytorch#178210

Co-authored-by: Jiani Wang <jianiwangw@gmail.com>
pytorch-bot Bot pushed a commit to pytorch/torchtitan that referenced this pull request Mar 27, 2026
weifengpy pushed a commit to weifengpy/torchtitan that referenced this pull request Mar 27, 2026
Copilot AI pushed a commit that referenced this pull request Mar 27, 2026
Fixes #175690

## Summary

We discovered an error in torchtitan related to symbolic sequence length: when running piecewise cudagraphs with batch size 1, seqlen 1, and a sharding degree of 2, `to_local()` incorrectly created an extra symbol that we could not simplify (producing a size like `2 * (s27 // 2)`).

We could work around this by narrowing in user code, but the more proper fix is to narrow inside `to_local()` itself, where we can correctly fix up the symbolic shape.

## Test plan

Added 2 unit tests

Authored with Claude

Pull Request resolved: #178210
Approved by: https://github.com/wconstab

Co-authored-by: Xia-Weiwen <12522207+Xia-Weiwen@users.noreply.github.com>
fmassa added a commit to meta-pytorch/autoparallel that referenced this pull request Mar 27, 2026
Fix stale tensor_meta when reusing sharding placements across mesh configurations in example_dcp

When reconstructing sharding placements from the serialized map for a different mesh, we were only updating .mesh on each DTensorSpec but leaving tensor_meta unchanged. Since the batch size scales with the mesh (bs = 8 * mesh.shape[0]), the
tensor shapes differ between the two phases (256 vs 16 on dim 0), so the specs carried stale shapes from the original configuration.

This was previously harmless — PyTorch's _maybe_unpad_tensor only used logical_dim_size (derived from tensor_meta.shape) to compute the unpad amount, and with even sharding that was a no-op. PyTorch PR
pytorch/pytorch#178210 added a torch._check(orig_size >= logical_dim_size) assertion that now catches the mismatch.

Fix: pull correct tensor_meta from the new AutoParallel instance's computed strategies, which reflect the actual shapes for the current mesh and batch size.

Authored with Claude.
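
As a toy illustration of the failure mode described in that commit (the dataclasses below are stand-ins, not the real DTensorSpec/TensorMeta types; the `bs = 8 * mesh_size` rule and the 256-vs-16 sizes come from the commit message, while the second dimension is invented):

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class TensorMeta:
    shape: tuple[int, ...]

@dataclass(frozen=True)
class Spec:
    mesh_size: int
    tensor_meta: TensorMeta

# Phase 1: mesh of size 32, so dim 0 is 8 * 32 = 256.
old = Spec(mesh_size=32, tensor_meta=TensorMeta(shape=(256, 1024)))

# Buggy reuse for a size-2 mesh: only the mesh is updated, so the spec
# still claims a dim-0 size of 256 while the real tensor has 16 rows.
stale = replace(old, mesh_size=2)
assert stale.tensor_meta.shape[0] != 8 * stale.mesh_size  # the mismatch

# The fix: refresh tensor_meta from the new configuration's strategies.
fixed = replace(stale, tensor_meta=TensorMeta(shape=(8 * stale.mesh_size, 1024)))
assert fixed.tensor_meta.shape[0] == 16
```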
fmassa added a commit to meta-pytorch/autoparallel that referenced this pull request Mar 27, 2026
* Work around upstream PyTorch view op assertion in propagation rules

PyTorch PR pytorch/pytorch#177973 fixed a latent bug where a comparison shard.dim == in_dim was always False (comparing int to InputDim), making the double-shard submesh_size validation in propagate_shape_and_sharding
dead code. The fix activated that validation, which now raises an AssertionError when a sharded dimension is not divisible by the mesh size (e.g. unflatten nheads=48 on mesh dim size 32).

This is overly strict for our use case — we call propagate_shape_and_sharding with strict_view=False to enumerate all possible sharding strategies, and incompatible ones should be silently skipped rather than raising. We now catch the
AssertionError and skip the iteration, since the replicated variant is already covered by another strategy.

Authored with Claude.

* Fix compatibility with latest PyTorch nightly by replacing removed pointwise_strategy

PyTorch removed pointwise_strategy from torch.distributed.tensor._ops._pointwise_ops in pytorch/pytorch#177208 as part of a dead code cleanup. Our custom native_layer_norm and native_layer_norm_backward rules imported
this function, causing an ImportError on PyTorch nightlies after 2025-03-23.

This replaces the import with a local _pointwise_strategy that recomposes the same behavior from PyTorch's new single-dim strategy primitives (_common_pointwise_single_dim_strategy, _fill_single_dim_strategy_placeholders,
expand_to_full_mesh_op_strategy).

Authored with Claude.

* Bugfix

The issue was that _get_unique_placements from PyTorch's single-dim strategy path asserts each OpStrategy has exactly 1 strategy — but in autoparallel, input OpStrategy objects have multiple strategies (all possible placements). Replaced it
with an inline loop that collects unique placements from all strategies across all inputs.

* One more fix

The fix: _common_pointwise_single_dim_strategy produces num_outputs output entries per strategy row (e.g., 3 for native_layer_norm's out/mean/rstd). Since all pointwise outputs share the same placement, we collapse to a single output entry
before passing to expand_to_full_mesh_op_strategy with input_index=1. This way strategy.output_specs is a single DTensorSpec, matching the old pointwise_strategy behavior — the callers then construct the per-output specs (out, mean, rstd)
themselves.

* Fix stale tensor_meta when reusing sharding placements across mesh configurations in example_dcp

When reconstructing sharding placements from the serialized map for a different mesh, we were only updating .mesh on each DTensorSpec but leaving tensor_meta unchanged. Since the batch size scales with the mesh (bs = 8 * mesh.shape[0]), the
tensor shapes differ between the two phases (256 vs 16 on dim 0), so the specs carried stale shapes from the original configuration.

This was previously harmless — PyTorch's _maybe_unpad_tensor only used logical_dim_size (derived from tensor_meta.shape) to compute the unpad amount, and with even sharding that was a no-op. PyTorch PR
pytorch/pytorch#178210 added a torch._check(orig_size >= logical_dim_size) assertion that now catches the mismatch.

Fix: pull correct tensor_meta from the new AutoParallel instance's computed strategies, which reflect the actual shapes for the current mesh and batch size.

Authored with Claude.
Lucaskabela added a commit to pytorch/torchtitan that referenced this pull request Mar 27, 2026
## Summary
This PR reapplies #2710.

## Test plan
PREREQ: ensure pytorch/pytorch#178210 is in your
torch version
```bash
python torchtitan/experiments/rl/simple_grpo_sum_digits.py --module rl --config rl_grpo_qwen3_0_6b --hf_assets_path=torchtitan/experiments/rl/example_checkpoint/Qwen3-0.6B
```
With both compiles on, we expect a 4x speedup over eager (timed from
~400s e2e to ~100s for 10 steps)

```bash
torchrun --nproc_per_node=2 \
      torchtitan/experiments/rl/tests/test_bitwise_identity.py
```

For numerics, this results in:
```
Trainer computed 30 token log-probs
  vLLM   log-probs[:5]: [-4.129570484161377, -1.795021891593933, -0.71578049659729, -0.2110116183757782, -0.9374725222587585]
  Trainer log-probs[:5]: [-4.129570484161377, -1.795021891593933, -0.71578049659729, -0.2110116183757782, -0.9374725222587585]
============================================================
LOGPROB COMPARISON RESULTS
============================================================
  Bitwise identical : True
  Tokens checked    : 30
  Tokens different  : 0
  Max delta         : 0.000000e+00
  Avg delta         : 0.000000e+00
  Diff mean         : 0.000000e+00
  Diff max          : 0.000000e+00
============================================================
PASS: vLLM and trainer log-probs are bitwise identical.
/home/lucaskabela/.conda/envs/pytorch_build/lib/python3.10
```
chelsea0x3b pushed a commit to chelsea0x3b/torchtitan that referenced this pull request Mar 30, 2026
chelsea0x3b pushed a commit to chelsea0x3b/torchtitan that referenced this pull request Mar 30, 2026
chelsea0x3b pushed a commit to chelsea0x3b/torchtitan that referenced this pull request Mar 30, 2026
AaronWang04 pushed a commit to AaronWang04/pytorch that referenced this pull request Mar 31, 2026
acisseJZhong pushed a commit to pytorch/torchtitan that referenced this pull request Mar 31, 2026
acisseJZhong pushed a commit to pytorch/torchtitan that referenced this pull request Mar 31, 2026
acisseJZhong pushed a commit to pytorch/torchtitan that referenced this pull request Mar 31, 2026
pytorch-bot Bot pushed a commit that referenced this pull request Apr 2, 2026
nklshy-aws pushed a commit to nklshy-aws/pytorch that referenced this pull request Apr 7, 2026
TXacs pushed a commit to McmillanTAC/torchtitan that referenced this pull request Apr 13, 2026
TXacs pushed a commit to McmillanTAC/torchtitan that referenced this pull request Apr 13, 2026
TXacs pushed a commit to McmillanTAC/torchtitan that referenced this pull request Apr 13, 2026
ACharacterInASimulation pushed a commit to ACharacterInASimulation/torchtitan that referenced this pull request Apr 21, 2026
ACharacterInASimulation pushed a commit to ACharacterInASimulation/torchtitan that referenced this pull request Apr 21, 2026
ACharacterInASimulation pushed a commit to ACharacterInASimulation/torchtitan that referenced this pull request Apr 21, 2026
github-actions bot deleted the lucaskabela/fix_symbolic_shape_dtensor_to_local branch April 25, 2026 02:26

Labels

ciflow/dtensor, ciflow/dynamo, ciflow/inductor, ciflow/torchtitan, ciflow/trunk, Merged, module: dynamic shapes, release notes: distributed (dtensor), topic: not user facing

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Dtensor Shard->Replicate redistribution corrupts symbolic shapes under torch.compile

3 participants