
[DTensor] Strategy Validation fix for non-tensor args#175821

Closed
wconstab wants to merge 4 commits into gh/wconstab/549/base from gh/wconstab/549/head

Conversation

@wconstab
Contributor

@wconstab wconstab commented Feb 26, 2026

Stack from ghstack (oldest at bottom):

Problem: `query_single_dim_strategy` built `args_meta` from only the tensor arguments, dropping non-tensor positional args (like `dims` for flip/roll). When strategy functions accessed these args by position, they raised `IndexError`, which was silently swallowed by `except Exception: return None`.
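The failure mode can be reproduced with a few lines of plain Python; `flip_strategy` and `query` below are made-up stand-ins, not the real validation-harness API:

```python
# Hypothetical, simplified repro: a strategy function indexes into
# args_meta by position, but args_meta was built from tensor args only,
# and the broad `except Exception` hides the resulting IndexError.
def flip_strategy(args_meta):
    dims = args_meta[1]          # strategy expects `dims` at position 1
    return ("strategy-for", dims)

def query(args_meta):
    try:
        return flip_strategy(args_meta)
    except Exception:            # swallows the IndexError silently
        return None

# args_meta built from tensor args only: `dims` is gone, query returns None
print(query(("tensor_meta",)))          # -> None
# args_meta with non-tensor args preserved at their original positions
print(query(("tensor_meta", [0, 1])))   # -> ('strategy-for', [0, 1])
```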

Changes made:

1. `get_aten_op_for_sample` (line 723): now returns `captured_args` and `captured_kwargs` as-is (with tensors in place) instead of filtering to non-tensor args only.
2. `query_single_dim_strategy` (line 726): new signature takes `captured_args`/`captured_kwargs` instead of `tensors`/`mesh`/`kwargs`. Builds `args_meta` by mapping `torch.Tensor` -> `TensorMeta` while keeping non-tensor args at their original positions, matching `OpSchema.args_meta`.
3. `_query_dtensor_rules` (line 934): new signature takes `captured_args`/`captured_kwargs`. All three strategy paths (single-dim, op_strategy, decomp) now interleave tensor-derived objects with non-tensor args from `captured_args` at their original positions.
4. `compare_operator` caller (line 1401): updated to pass `captured_args`/`captured_kwargs`.
5. New test (`test_compare_operator_runtime_schema_ops`): data-driven test over `["flip", "roll"]` asserting `true_positives > 0`.
6. Existing test (`test_kwargs_forwarded_to_strategy`): updated to use the new `query_single_dim_strategy` signature.
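The `args_meta` construction in change 2 can be sketched without PyTorch; `FakeTensor` and `TensorMeta` below are illustrative stand-ins for `torch.Tensor` and DTensor's metadata type, not the real classes:

```python
# Hedged sketch of the fix: map tensors to metadata objects while leaving
# non-tensor args (e.g. `dims` for flip/roll) at their original positions.
from dataclasses import dataclass

@dataclass
class FakeTensor:            # stands in for torch.Tensor
    shape: tuple

@dataclass
class TensorMeta:            # stands in for DTensor's TensorMeta
    shape: tuple

def build_args_meta(captured_args):
    return tuple(
        TensorMeta(a.shape) if isinstance(a, FakeTensor) else a
        for a in captured_args
    )

# flip(input, dims): `dims` keeps position 1, so strategy functions that
# index args_meta positionally no longer hit IndexError.
args_meta = build_args_meta((FakeTensor((2, 3)), [0, 1]))
print(args_meta)   # -> (TensorMeta(shape=(2, 3)), [0, 1])
```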

@pytorch-bot

pytorch-bot Bot commented Feb 26, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/175821

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure

As of commit 3a5ac8c with merge base 7eeab8a:

NEW FAILURE - The following job has failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

wconstab added a commit that referenced this pull request Feb 26, 2026
ghstack-source-id: 527b667
Pull Request resolved: #175821
wconstab added a commit that referenced this pull request Feb 26, 2026
ghstack-source-id: 24a1347
Pull Request resolved: #175821
wconstab added a commit that referenced this pull request Feb 27, 2026

ghstack-source-id: f165cb6
Pull Request resolved: #175821
@wconstab wconstab requested review from pianpwk and zpcore February 27, 2026 20:41
@wconstab
Contributor Author

@pytorchbot merge

@pytorch-bot pytorch-bot Bot added the ciflow/trunk Trigger trunk jobs on your pull request label Feb 27, 2026
@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

@pytorchmergebot
Collaborator

Merge failed

Reason: New commits were pushed while merging. Please rerun the merge command.

Details for Dev Infra team: raised by workflow job.

@wconstab
Contributor Author

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

@pytorchmergebot
Collaborator

Merge failed

Reason: 1 jobs have failed, first few of them are: trunk / macos-py3-arm64 / test (default, 2, 3, macos-m1-stable)

Details for Dev Infra team: raised by workflow job.

@wconstab
Contributor Author

@pytorchbot merge -i

@pytorchmergebot
Collaborator

Merge started

Your change will be merged while ignoring the following 1 checks: trunk / macos-py3-arm64 / test (default, 2, 3, macos-m1-stable)

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

pytorchmergebot pushed a commit that referenced this pull request Mar 4, 2026
`python -m torch.distributed.tensor._ops.strategy_validation --report`

```
======================================================================
DTensor operator registration report
======================================================================

Directly registered:
  rule (register_prop_rule):               4
  op_strategy (register_op_strategy):    612
  single_dim_strategy:                    13
  total:                                 625

Decomposition table (not directly registered): 669
  These ops have entries in torch._decomp.decomposition_table but no
  direct DTensor strategy. They may work at runtime via
  DecompShardingStrategy if all decomposed sub-ops are supported.
  Additional ops beyond this count may also be reachable via CIA
  (CompositeImplicitAutograd) decompositions.
```

with `--report-full`:

```
rule (4):
  aten.convolution.default
  aten.convolution_backward.default
  aten.index.Tensor
  aten.index_select.default

op_strategy (612):
  aten.__ilshift__.Scalar
  aten.__ilshift__.Tensor
  aten.__irshift__.Scalar
  aten.__irshift__.Tensor
  ...(truncated for git commit msg)

single_dim_strategy (13):
  aten._fft_c2c.default
  aten._fft_c2r.default
  aten._fft_r2c.default
  aten._index_put_impl_.default
  ...(truncated for git commit msg)

decomp table (not directly registered) (669):
  aten.__iand__.Scalar
  aten.__iand__.Tensor
  aten.__ior__.Scalar
  aten.__ior__.Tensor
  ...(truncated for git commit msg)

```
Pull Request resolved: #176034
Approved by: https://github.com/pianpwk
ghstack dependencies: #175821
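The report's tallies can be sketched as a simple grouping over a registry mapping; the op names and registry contents below are invented for illustration, and only the shape of the counting mirrors the real tool's output:

```python
# Hedged sketch of how the --report counts could be derived: group directly
# registered ops by registration kind, and separately list ops that appear
# only in the decomposition table. Contents here are illustrative examples.
from collections import Counter

registry = {                       # op -> registration kind (made up)
    "aten.convolution.default": "rule",
    "aten.add.Tensor": "op_strategy",
    "aten.mul.Tensor": "op_strategy",
    "aten._fft_c2c.default": "single_dim_strategy",
}
decomp_table = {"aten.sigmoid.default", "aten.add.Tensor"}

counts = Counter(registry.values())
decomp_only = sorted(op for op in decomp_table if op not in registry)

print(counts["op_strategy"], len(registry))   # -> 2 4
print(decomp_only)                            # -> ['aten.sigmoid.default']
```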
pytorchmergebot pushed a commit to anatoliylitv/pytorch that referenced this pull request Mar 4, 2026
Pull Request resolved: pytorch#176034
Approved by: https://github.com/pianpwk
ghstack dependencies: pytorch#175821
sandy-gags pushed a commit to sandy-gags/pytorch that referenced this pull request Mar 12, 2026
ghstack-source-id: 7281f86
Pull Request resolved: pytorch/pytorch#175821
EmanueleCoradin pushed a commit to EmanueleCoradin/pytorch that referenced this pull request Mar 30, 2026
Pull Request resolved: pytorch#175821
Approved by: https://github.com/pianpwk
EmanueleCoradin pushed a commit to EmanueleCoradin/pytorch that referenced this pull request Mar 30, 2026
Pull Request resolved: pytorch#176034
Approved by: https://github.com/pianpwk
ghstack dependencies: pytorch#175821
@github-actions github-actions Bot deleted the gh/wconstab/549/head branch March 31, 2026 02:24

Labels

ciflow/inductor, ciflow/trunk, Merged, release notes: distributed (dtensor)


3 participants