
[DTensor] Strategy Validation fix for non-tensor args#175821

Closed
wconstab wants to merge 4 commits into gh/wconstab/549/base from gh/wconstab/549/head

Conversation

@wconstab
Contributor

@wconstab wconstab commented Feb 26, 2026

Stack from ghstack (oldest at bottom):

Problem: `query_single_dim_strategy` built `args_meta` from only the tensor arguments, dropping non-tensor positional args (like `dims` for flip/roll). When strategy functions accessed these args by position, they raised `IndexError`, which was silently swallowed by `except Exception: return None`.
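The failure mode can be reproduced with a few lines of plain Python; `flip_strategy` and `query` below are made-up stand-ins, not the real validation-harness API:

```python
# Hypothetical, simplified repro: a strategy function indexes into
# args_meta by position, but args_meta was built from tensor args only,
# and the broad `except Exception` hides the resulting IndexError.
def flip_strategy(args_meta):
    dims = args_meta[1]          # strategy expects `dims` at position 1
    return ("strategy-for", dims)

def query(args_meta):
    try:
        return flip_strategy(args_meta)
    except Exception:            # swallows the IndexError silently
        return None

# args_meta built from tensor args only: `dims` is gone, query returns None
print(query(("tensor_meta",)))          # -> None
# args_meta with non-tensor args preserved at their original positions
print(query(("tensor_meta", [0, 1])))   # -> ('strategy-for', [0, 1])
```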

Changes made:

1. `get_aten_op_for_sample` (line 723): now returns `captured_args` and `captured_kwargs` as-is (with tensors in place) instead of filtering to non-tensor args only.
2. `query_single_dim_strategy` (line 726): new signature takes `captured_args`/`captured_kwargs` instead of `tensors`/`mesh`/`kwargs`. Builds `args_meta` by mapping `torch.Tensor` -> `TensorMeta` while keeping non-tensor args at their original positions, matching `OpSchema.args_meta`.
3. `_query_dtensor_rules` (line 934): new signature takes `captured_args`/`captured_kwargs`. All three strategy paths (single-dim, op_strategy, decomp) now interleave tensor-derived objects with non-tensor args from `captured_args` at their original positions.
4. `compare_operator` caller (line 1401): updated to pass `captured_args`/`captured_kwargs`.
5. New test (`test_compare_operator_runtime_schema_ops`): data-driven test over `["flip", "roll"]` asserting `true_positives > 0`.
6. Existing test (`test_kwargs_forwarded_to_strategy`): updated to use the new `query_single_dim_strategy` signature.
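The `args_meta` construction in change 2 can be sketched without PyTorch; `FakeTensor` and `TensorMeta` below are illustrative stand-ins for `torch.Tensor` and DTensor's metadata type, not the real classes:

```python
# Hedged sketch of the fix: map tensors to metadata objects while leaving
# non-tensor args (e.g. `dims` for flip/roll) at their original positions.
from dataclasses import dataclass

@dataclass
class FakeTensor:            # stands in for torch.Tensor
    shape: tuple

@dataclass
class TensorMeta:            # stands in for DTensor's TensorMeta
    shape: tuple

def build_args_meta(captured_args):
    return tuple(
        TensorMeta(a.shape) if isinstance(a, FakeTensor) else a
        for a in captured_args
    )

# flip(input, dims): `dims` keeps position 1, so strategy functions that
# index args_meta positionally no longer hit IndexError.
args_meta = build_args_meta((FakeTensor((2, 3)), [0, 1]))
print(args_meta)   # -> (TensorMeta(shape=(2, 3)), [0, 1])
```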

@pytorch-bot

pytorch-bot Bot commented Feb 26, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/175821

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure

As of commit 3a5ac8c with merge base 7eeab8a:

NEW FAILURE - The following job has failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

wconstab added a commit that referenced this pull request Feb 26, 2026
ghstack-source-id: 527b667
Pull Request resolved: #175821
wconstab added a commit that referenced this pull request Feb 26, 2026
ghstack-source-id: 24a1347
Pull Request resolved: #175821
wconstab added a commit that referenced this pull request Feb 27, 2026

ghstack-source-id: f165cb6
Pull Request resolved: #175821
@wconstab wconstab requested review from pianpwk and zpcore February 27, 2026 20:41
@wconstab
Contributor Author

@pytorchbot merge

@pytorch-bot pytorch-bot Bot added the ciflow/trunk Trigger trunk jobs on your pull request label Feb 27, 2026
@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

@pytorchmergebot
Collaborator

Merge failed

Reason: New commits were pushed while merging. Please rerun the merge command.

Details for Dev Infra team: raised by workflow job.

@wconstab
Contributor Author

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

@pytorchmergebot
Collaborator

Merge failed

Reason: 1 jobs have failed, first few of them are: trunk / macos-py3-arm64 / test (default, 2, 3, macos-m1-stable)

Details for Dev Infra team: raised by workflow job.

@wconstab
Contributor Author

@pytorchbot merge -i

@pytorchmergebot
Collaborator

Merge started

Your change will be merged while ignoring the following 1 checks: trunk / macos-py3-arm64 / test (default, 2, 3, macos-m1-stable)

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

pytorchmergebot pushed a commit that referenced this pull request Mar 4, 2026
`python -m torch.distributed.tensor._ops.strategy_validation --report`

```
======================================================================
DTensor operator registration report
======================================================================

Directly registered:
  rule (register_prop_rule):               4
  op_strategy (register_op_strategy):    612
  single_dim_strategy:                    13
  total:                                 625

Decomposition table (not directly registered): 669
  These ops have entries in torch._decomp.decomposition_table but no
  direct DTensor strategy. They may work at runtime via
  DecompShardingStrategy if all decomposed sub-ops are supported.
  Additional ops beyond this count may also be reachable via CIA
  (CompositeImplicitAutograd) decompositions.
```

with `--report-full`:

```
rule (4):
  aten.convolution.default
  aten.convolution_backward.default
  aten.index.Tensor
  aten.index_select.default

op_strategy (612):
  aten.__ilshift__.Scalar
  aten.__ilshift__.Tensor
  aten.__irshift__.Scalar
  aten.__irshift__.Tensor
  ...(truncated for git commit msg)

single_dim_strategy (13):
  aten._fft_c2c.default
  aten._fft_c2r.default
  aten._fft_r2c.default
  aten._index_put_impl_.default
  ...(truncated for git commit msg)

decomp table (not directly registered) (669):
  aten.__iand__.Scalar
  aten.__iand__.Tensor
  aten.__ior__.Scalar
  aten.__ior__.Tensor
  ...(truncated for git commit msg)

```
Pull Request resolved: #176034
Approved by: https://github.com/pianpwk
ghstack dependencies: #175821
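The report's tallies can be sketched as a simple grouping over a registry mapping; the op names and registry contents below are invented for illustration, and only the shape of the counting mirrors the real tool's output:

```python
# Hedged sketch of how the --report counts could be derived: group directly
# registered ops by registration kind, and separately list ops that appear
# only in the decomposition table. Contents here are illustrative examples.
from collections import Counter

registry = {                       # op -> registration kind (made up)
    "aten.convolution.default": "rule",
    "aten.add.Tensor": "op_strategy",
    "aten.mul.Tensor": "op_strategy",
    "aten._fft_c2c.default": "single_dim_strategy",
}
decomp_table = {"aten.sigmoid.default", "aten.add.Tensor"}

counts = Counter(registry.values())
decomp_only = sorted(op for op in decomp_table if op not in registry)

print(counts["op_strategy"], len(registry))   # -> 2 4
print(decomp_only)                            # -> ['aten.sigmoid.default']
```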
pytorchmergebot pushed a commit to anatoliylitv/pytorch that referenced this pull request Mar 4, 2026
Pull Request resolved: pytorch#176034
Approved by: https://github.com/pianpwk
ghstack dependencies: pytorch#175821
sandy-gags pushed a commit to sandy-gags/pytorch that referenced this pull request Mar 12, 2026
ghstack-source-id: 7281f86
Pull Request resolved: pytorch/pytorch#175821
EmanueleCoradin pushed a commit to EmanueleCoradin/pytorch that referenced this pull request Mar 30, 2026
Pull Request resolved: pytorch#175821
Approved by: https://github.com/pianpwk
EmanueleCoradin pushed a commit to EmanueleCoradin/pytorch that referenced this pull request Mar 30, 2026
Pull Request resolved: pytorch#176034
Approved by: https://github.com/pianpwk
ghstack dependencies: pytorch#175821
@github-actions github-actions Bot deleted the gh/wconstab/549/head branch March 31, 2026 02:24

Labels

ciflow/inductor, ciflow/trunk, Merged, release notes: distributed (dtensor)


3 participants