[DTensor] Strategy Validation fix for non-tensor args #175821
wconstab wants to merge 4 commits into gh/wconstab/549/base
Problem: `query_single_dim_strategy` built `args_meta` from only tensor arguments, dropping non-tensor positional args (like `dims` for flip/roll). When strategy functions accessed these args by position, they got `IndexError`, which was silently swallowed by `except Exception: return None`.

Changes made:

1. `get_aten_op_for_sample` (line 723): Now returns `captured_args` and `captured_kwargs` as-is (with tensors in place) instead of filtering to non-tensor-only.
2. `query_single_dim_strategy` (line 726): New signature takes `captured_args`/`captured_kwargs` instead of `tensors`/`mesh`/`kwargs`. Builds `args_meta` by mapping `torch.Tensor` -> `TensorMeta` while keeping non-tensor args at their original positions, matching `OpSchema.args_meta`.
3. `_query_dtensor_rules` (line 934): New signature takes `captured_args`/`captured_kwargs`. All three strategy paths (single-dim, op_strategy, decomp) now interleave tensor-derived objects with non-tensor args from `captured_args` at their original positions.
4. `compare_operator` caller (line 1401): Updated to pass `captured_args`/`captured_kwargs`.
5. Test (`test_compare_operator_runtime_schema_ops`): Data-driven test over `["flip", "roll"]` asserting `true_positives > 0`.
6. Existing test (`test_kwargs_forwarded_to_strategy`): Updated to use the new `query_single_dim_strategy` signature.
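To make the failure mode concrete, here is a minimal pure-Python sketch of the before/after behavior described above. It does not use the real DTensor code: `FakeTensor`, `FakeTensorMeta`, `build_args_meta_old`, `build_args_meta_new`, `flip_like_strategy`, and `query_strategy` are all hypothetical stand-ins invented for illustration, and the "strategy" logic is a toy, not the actual flip/roll sharding rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FakeTensor:
    """Hypothetical stand-in for torch.Tensor (only a shape is needed here)."""
    shape: tuple


@dataclass(frozen=True)
class FakeTensorMeta:
    """Hypothetical stand-in for TensorMeta."""
    shape: tuple


def build_args_meta_old(captured_args):
    # Old behavior as described in the PR: only tensor arguments survive,
    # so non-tensor positional args such as `dims` are dropped.
    return tuple(
        FakeTensorMeta(a.shape) for a in captured_args if isinstance(a, FakeTensor)
    )


def build_args_meta_new(captured_args):
    # New behavior as described in the PR: tensors are mapped to metadata,
    # and every other argument stays at its original position.
    return tuple(
        FakeTensorMeta(a.shape) if isinstance(a, FakeTensor) else a
        for a in captured_args
    )


def flip_like_strategy(args_meta):
    # Toy strategy (an assumption, not the real rule): read `dims` by
    # position and report which tensor dims are NOT touched by the op.
    input_meta, dims = args_meta[0], args_meta[1]
    ndim = len(input_meta.shape)
    touched = {d % ndim for d in dims}
    return [d for d in range(ndim) if d not in touched]


def query_strategy(strategy_fn, args_meta):
    # Mirrors the `except Exception: return None` pattern from the PR text:
    # any error, including IndexError, is silently turned into None.
    try:
        return strategy_fn(args_meta)
    except Exception:
        return None


captured_args = (FakeTensor(shape=(4, 8, 16)), [0, -1])

old_result = query_strategy(flip_like_strategy, build_args_meta_old(captured_args))
new_result = query_strategy(flip_like_strategy, build_args_meta_new(captured_args))

print(old_result)  # None: args_meta[1] raised IndexError, which was swallowed
print(new_result)  # [1]: `dims` was found at its original position
```

With the old construction the strategy never sees `dims`, so validation quietly reports no result instead of an error; preserving positions is what lets position-based strategy functions run at all.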
Dr. CI: 1 new failure as of commit 3a5ac8c with merge base 7eeab8a. Artifacts and rendered test results: hud.pytorch.org/pr/175821
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 Hours).
Merge failed. Reason: New commits were pushed while merging. Please rerun the merge command.
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 Hours).
Merge failed. Reason: 1 jobs have failed, first few of them are: trunk / macos-py3-arm64 / test (default, 2, 3, macos-m1-stable)
@pytorchbot merge -i
Merge started. Your change will be merged while ignoring the following 1 checks: trunk / macos-py3-arm64 / test (default, 2, 3, macos-m1-stable)
python -m torch.distributed.tensor._ops.strategy_validation --report

```
======================================================================
DTensor operator registration report
======================================================================
Directly registered:
  rule (register_prop_rule): 4
  op_strategy (register_op_strategy): 612
  single_dim_strategy: 13
  total: 625
Decomposition table (not directly registered): 669
  These ops have entries in torch._decomp.decomposition_table but no
  direct DTensor strategy. They may work at runtime via
  DecompShardingStrategy if all decomposed sub-ops are supported.
  Additional ops beyond this count may also be reachable via CIA
  (CompositeImplicitAutograd) decompositions.
```

with --report-full

```
rule (4):
  aten.convolution.default
  aten.convolution_backward.default
  aten.index.Tensor
  aten.index_select.default
op_strategy (612):
  aten.__ilshift__.Scalar
  aten.__ilshift__.Tensor
  aten.__irshift__.Scalar
  aten.__irshift__.Tensor
  ...(truncated for git commit msg)
single_dim_strategy (13):
  aten._fft_c2c.default
  aten._fft_c2r.default
  aten._fft_r2c.default
  aten._index_put_impl_.default
  ...(truncated for git commit msg)
decomp table (not directly registered) (669):
  aten.__iand__.Scalar
  aten.__iand__.Tensor
  aten.__ior__.Scalar
  aten.__ior__.Tensor
  ...(truncated for git commit msg)
```

Pull Request resolved: #176034
Approved by: https://github.com/pianpwk
ghstack dependencies: #175821
Pull Request resolved: pytorch#175821
Approved by: https://github.com/pianpwk