[shard prop] single-dim rules for conv, uniform, scatter, index ops #179185
pianpwk wants to merge 6 commits into gh/pianpwk/129/base from …
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/179185

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV: there is 1 currently active SEV. If your PR is affected, please view it below.

⏳ 1 Pending, 11 Unrelated Failures
As of commit ae11809 with merge base 278936a:
FLAKY - The following jobs failed but were likely due to flakiness present on trunk: …
UNSTABLE - The following jobs are marked as unstable, possibly due to flakiness on trunk: …

This comment was automatically generated by Dr. CI and updates every 15 minutes.
…index ops " [ghstack-poisoned]
@claude review this PR. Consider edge cases that could break these strategies.
Claude finished @anshul-si's task in 7m 58s (View job)

Review of PR #179185

Overall this PR adds correct and well-structured single-dim sharding strategies. The strategies are conservative (batch-dim-only for conv, skip-active-dims for scatter ops), which is the right approach. I found one substantive issue and a few minor observations.

Issue: Conv strategy doesn't cover channel-dim sharding

Batch-dim-only is correct for data parallelism, but the existing full conv strategy covers more cases. For the single-dim strategy, sharding on spatial dims (dims ≥ 2) would require halo exchanges, so excluding those is correct. But sharding on the channel dim (dim 1) could be supported and currently is not (see the sketch below).

Verdict: Not a bug, just a coverage gap. Fine for now.

Observation: …
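For concreteness, here is a minimal standalone sketch of the batch-dim-only decision described above. It deliberately avoids PyTorch's internal registration and schema types (register_single_dim_strategy, OpSchema, etc.), so the function name and placement strings are hypothetical illustrations, not the PR's actual code.

```python
from typing import Optional

def conv_single_dim_rule(shard_dim: int) -> Optional[str]:
    """Hypothetical model of a batch-dim-only single-dim rule for conv.

    Returns an input->output placement description, or None when the dim
    cannot be sharded without communication this rule does not model.
    """
    if shard_dim == 0:
        # Batch dim: plain data parallelism; each rank convolves its slice.
        return "Shard(0) -> Shard(0)"
    if shard_dim == 1:
        # Channel dim: supportable in principle (the coverage gap noted
        # above), but it needs extra handling, so this rule declines it.
        return None
    # Spatial dims (>= 2) would require halo exchanges between ranks; skip.
    return None
```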
```python
@register_single_dim_strategy([aten.uniform.default])
def uniform_single_dim_strategy(
```
The uniform single-dim strategy will be merged in #178457.
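As a rough illustration of what such a rule does (hypothetical names, not the code from this PR or #178457): uniform_ fills elements independently, so a shard on any tensor dim can simply stay in place.

```python
def uniform_single_dim_rule(shard_dim: int) -> str:
    # Hypothetical sketch: an elementwise random fill preserves any
    # sharding, so the output placement follows the input placement.
    # A real rule must also offset per-rank RNG state so the local
    # slices on different ranks don't produce correlated values.
    return f"Shard({shard_dim}) -> Shard({shard_dim})"
```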
…index ops " [ghstack-poisoned]
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
@pytorchbot revert -m "Reverted automatically by pytorch's autorevert, to avoid this behaviour add the tag autorevert: disable" -c autorevert

This PR is attributed to have caused a regression in: …

Note: this revert was accelerated by the AI advisor: revert (confidence=0.95). Please investigate and fix the issues.
Claude finished @pytorch-auto-revert[bot]'s task in 3m 6s (View job)

Diagnosing CI Revert for PR #179185

Verdict: The revert is legitimate. The fix is straightforward.

Root Cause

The failing test is: … The PR added a single-dim strategy for select_scatter, which moved the op out of the xfail lists it used to sit in:

```
ops_dde_xfail
ops_unbacked_dtensor_dde
dtensor_fails_no_strategy  # <-- select_scatter was here, shielding it from unbacked tests
ops_unbacked_skip
```

By removing select_scatter from dtensor_fails_no_strategy, the unbacked tests now exercise it, and it fails with a data-dependent error (DDE) on unbacked dims.

Fix

Add xfail("select_scatter") to ops_unbacked_dtensor_dde:

```python
ops_unbacked_dtensor_dde = {
    ...
    xfail("select_scatter"),  # sharding propagation triggers DDE on unbacked dims
    ...
}
```

Similarly, check whether the other ops covered by this PR hit the same failure.
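To make the failure mode concrete, here is a simplified sketch of the kind of size check that trips a DDE (the function name is illustrative, not the actual sharding-propagation code):

```python
def is_evenly_shardable(dim_size, world_size: int) -> bool:
    # With plain ints this just works. With an unbacked SymInt dim_size
    # (e.g. produced by .item() under compile), evaluating this boolean
    # has no hint to guard on, so PyTorch raises a data-dependent error
    # instead of returning True/False -- hence the xfail above.
    return dim_size % world_size == 0

# Fine eagerly; the DTensor unbacked tests hit the SymInt case.
assert is_evenly_shardable(128, 4)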
@pytorchbot successfully started a revert job. Check the current status here.
@pianpwk your PR has been successfully reverted.
…ex ops (#179185)" This reverts commit 6279179. Reverted #179185 on behalf of https://github.com/pytorch-auto-revert due to Reverted automatically by pytorch's autorevert, to avoid this behaviour add the tag autorevert: disable ([comment](#179185 (comment)))
…index ops " [ghstack-poisoned]
…index ops " [ghstack-poisoned]
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Stack from ghstack (oldest at bottom):