Make run_dtensor_rng_op compatible with compile_on_one_rank#177447

Closed
aorenste wants to merge 9 commits intogh/aorenste/220/basefrom
gh/aorenste/220/head

Conversation

Contributor

@aorenste aorenste commented Mar 14, 2026

Stack from ghstack (oldest at bottom):

Use mesh._sym_get_coordinate() in _compute_rng_offsets so that RNG
offset values become symbolic SymInts (via _runtime_compute_coordinate_on_dim)
when compile_on_one_rank is active. Previously, mesh.get_coordinate()
returned concrete rank-specific integers that got baked into the compiled
graph, producing different graphs on different ranks.

Also refactors test_compile_on_one_rank.py to extract graph-comparison
helpers (_assert_graphs_identical_across_ranks, _compile_and_capture_graph)
and adds a test for DTensor random op graph consistency.

Authored with Claude.
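To illustrate the failure mode this fixes, here is a toy sketch (not the actual DTensor internals; the string "graphs" only stand in for traced FX graphs): baking a concrete rank-specific coordinate into a traced graph yields a different graph per rank, while keeping the coordinate symbolic yields one shared graph.

```python
# Illustrative sketch of the bug: embedding a concrete per-rank value
# into the compiled graph vs. keeping it as a symbolic input.

def trace_with_constant(rank: int) -> str:
    # mesh.get_coordinate()-style: the concrete rank is burned into the graph
    return f"offset = base + {rank} * shard_size"

def trace_with_symbol() -> str:
    # mesh._sym_get_coordinate()-style: the coordinate stays a symbol
    # (a SymInt resolved at runtime), so every rank traces the same graph
    return "offset = base + s_coord * shard_size"

graphs_concrete = {trace_with_constant(r) for r in range(4)}
graphs_symbolic = {trace_with_symbol() for r in range(4)}
print(len(graphs_concrete))  # 4 distinct graphs, one per rank
print(len(graphs_symbolic))  # 1 shared graph
```

With compile_on_one_rank, a single rank compiles on behalf of all ranks, so the concrete-constant variant is exactly what breaks: the graph compiled for rank 0 would compute rank 0's RNG offset everywhere.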

@pytorch-bot

pytorch-bot bot commented Mar 14, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/177447

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure, 1 Pending, 2 Unrelated Failures

As of commit 1c8fbdc with merge base 417a890 (image):

NEW FAILURE - The following job has failed:

BROKEN TRUNK - The following job failed but was present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added ciflow/inductor ciflow/torchtitan Run TorchTitan integration tests release notes: distributed (dtensor) release notes category labels Mar 14, 2026
aorenste added a commit that referenced this pull request Mar 14, 2026
ghstack-source-id: 71bd32b
Pull Request resolved: #177447
@pytorch-bot pytorch-bot bot added the ciflow/dtensor Run DTensor specific tests label Mar 16, 2026
aorenste added a commit that referenced this pull request Mar 16, 2026
ghstack-source-id: 7a7bbae
Pull Request resolved: #177447
aorenste added a commit that referenced this pull request Mar 16, 2026
ghstack-source-id: 0f5f81d
Pull Request resolved: #177447
aorenste added a commit that referenced this pull request Mar 17, 2026
ghstack-source-id: 328f827
Pull Request resolved: #177447
aorenste added a commit that referenced this pull request Mar 18, 2026
ghstack-source-id: 991e325
Pull Request resolved: #177447
aorenste added a commit that referenced this pull request Mar 18, 2026
ghstack-source-id: 52b8581
Pull Request resolved: #177447
aorenste added a commit that referenced this pull request Mar 18, 2026
ghstack-source-id: a6c27a8
Pull Request resolved: #177447
@aorenste aorenste marked this pull request as ready for review March 19, 2026 03:52
@aorenste aorenste requested a review from yiming0416 March 19, 2026 03:54
aorenste added a commit that referenced this pull request Mar 19, 2026
ghstack-source-id: 8d87e47
Pull Request resolved: #177447
aorenste added a commit that referenced this pull request Mar 19, 2026
ghstack-source-id: af73b79
Pull Request resolved: #177447
@aorenste
Contributor Author

test/distributed/tensor/test_random_ops.py::DistTensorRandomOpTest::test_pipeline_parallel_manual_seed is a pre-existing failure in trunk
@pytorchbot merge -i

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Mar 20, 2026
@pytorchmergebot
Collaborator

Merge started

Your change will be merged while ignoring the following 3 checks: inductor / inductor-cpu-test / test (cpu_inductor_torchbench, 1, 2, linux.2xlarge.amx, unstable), inductor / inductor-test / test (inductor_torchbench, 2, 2, linux.g5.4xlarge.nvidia.gpu), dtensor / dtensor-test / test (dtensor, 1, 1, lf.linux.g5.12xlarge.nvidia.gpu)

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

AaronWang04 pushed a commit to AaronWang04/pytorch that referenced this pull request Mar 31, 2026
…177447)

Pull Request resolved: pytorch#177447
Approved by: https://github.com/yiming0416
ghstack dependencies: pytorch#177446
nklshy-aws pushed a commit to nklshy-aws/pytorch that referenced this pull request Apr 7, 2026
…177447)

Pull Request resolved: pytorch#177447
Approved by: https://github.com/yiming0416
ghstack dependencies: pytorch#177446
Labels

ciflow/dtensor Run DTensor specific tests ciflow/inductor ciflow/torchtitan Run TorchTitan integration tests ciflow/trunk Trigger trunk jobs on your pull request Merged release notes: distributed (dtensor) release notes category
