add channel last 3d support for maxpool3d on CPU#97775
add channel last 3d support for maxpool3d on CPU#97775CaoE wants to merge 28 commits intogh/CaoE/9/basefrom
Conversation
[ghstack-poisoned]
🔗 Helpful Links🧪 See artifacts and rendered test results at hud.pytorch.org/pr/97775
Note: Links to docs will display an error until the docs builds have been completed. ❌ 2 New Failures, 1 Unrelated FailureAs of commit b62a746 with merge base c68d0a7 ( NEW FAILURES - The following jobs have failed:
UNSTABLE - The following job failed but was likely due to flakiness present on trunk and has been marked as unstable:
This comment was automatically generated by Dr. CI and updates every 15 minutes. |
cc jgong5 mingfeima XiaobingSuper sanchitintel ashokei jingxu10 [ghstack-poisoned]
cc jgong5 mingfeima XiaobingSuper sanchitintel ashokei jingxu10 [ghstack-poisoned]
cc jgong5 mingfeima XiaobingSuper sanchitintel ashokei jingxu10 [ghstack-poisoned]
cc jgong5 mingfeima XiaobingSuper sanchitintel ashokei jingxu10 [ghstack-poisoned]
cc jgong5 mingfeima XiaobingSuper sanchitintel ashokei jingxu10 [ghstack-poisoned]
cc jgong5 mingfeima XiaobingSuper sanchitintel ashokei jingxu10 [ghstack-poisoned]
cc jgong5 mingfeima XiaobingSuper sanchitintel ashokei jingxu10 [ghstack-poisoned]
cc jgong5 mingfeima XiaobingSuper sanchitintel ashokei jingxu10 [ghstack-poisoned]
cc jgong5 mingfeima XiaobingSuper sanchitintel ashokei jingxu10 [ghstack-poisoned]
### Testing Single socket (28 cores): shape | fp32 forward / ms | bf16 forward / ms | fp32 backward / ms | bf16 backward / ms -- | -- | -- | -- | -- size: (1, 56, 264, 264), kernel: 3, stride: 1, mem_format: contig | 3.959584 | 5.493402 | 0.557232 | 0.568485 size: (1, 56, 264, 264), kernel: 3, stride: 1, mem_format: CL | 0.815511 | 1.351261 | 5.710506 | 10.57506 size: (32, 32, 100, 100), kernel: 3, stride: 1, mem_format: contig | 10.63426 | 15.28637 | 2.67656 | 1.71365 size: (32, 32, 100, 100), kernel: 3, stride: 1, mem_format: CL | 2.63570 | 2.05532 | 2.55452 | 2.33923 size: (4, 19, 10, 16, 16), kernel: 3, stride: 1, mem_format: contig | 0.375469 | 0.479748 | 0.066364 | 0.065155 size: (4, 19, 10, 16, 16), kernel: 3, stride: 1, mem_format: CL3d | 0.112197 | 0.112326 | 0.111697 | 0.145364 Single core: shape | fp32 forward / ms | bf16 forward / ms | fp32 backward / ms | bf16 backward / ms -- | -- | -- | -- | -- size: (1, 56, 264, 264), kernel: 3, stride: 1, mem_format: contig | 92.16582 | 128.6513 | 6.684325 | 12.21541 size: (1, 56, 264, 264), kernel: 3, stride: 1, mem_format: CL | 10.14318 | 29.80297 | 7.350142 | 11.25323 size: (32, 32, 100, 100), kernel: 3, stride: 1, mem_format: contig | 238.55453 | 331.89967 | 19.694657 | 32.78853 size: (32, 32, 100, 100), kernel: 3, stride: 1, mem_format: CL | 30.17079 | 32.75628 | 22.44543 | 30.17796 size: (4, 19, 10, 16, 16), kernel: 3, stride: 1, mem_format: contig | 7.474389 | 9.937217 | 0.236015 | 0.434229 size: (4, 19, 10, 16, 16), kernel: 3, stride: 1, mem_format: CL3d | 2.318954 | 2.469444 | 0.262125 | 0.401361 cc jgong5 mingfeima XiaobingSuper sanchitintel ashokei jingxu10 [ghstack-poisoned]
### Testing Single socket (28 cores): shape | fp32 forward / ms | bf16 forward / ms | fp32 backward / ms | bf16 backward / ms -- | -- | -- | -- | -- size: (1, 56, 264, 264), kernel: 3, stride: 1, mem_format: contig | 3.959584 | 5.493402 | 0.557232 | 0.568485 size: (1, 56, 264, 264), kernel: 3, stride: 1, mem_format: CL | 0.815511 | 1.351261 | 5.710506 | 10.57506 size: (32, 32, 100, 100), kernel: 3, stride: 1, mem_format: contig | 10.63426 | 15.28637 | 2.67656 | 1.71365 size: (32, 32, 100, 100), kernel: 3, stride: 1, mem_format: CL | 2.63570 | 2.05532 | 2.55452 | 2.33923 size: (4, 19, 10, 16, 16), kernel: 3, stride: 1, mem_format: contig | 0.375469 | 0.479748 | 0.066364 | 0.065155 size: (4, 19, 10, 16, 16), kernel: 3, stride: 1, mem_format: CL3d | 0.112197 | 0.112326 | 0.111697 | 0.145364 Single core: shape | fp32 forward / ms | bf16 forward / ms | fp32 backward / ms | bf16 backward / ms -- | -- | -- | -- | -- size: (1, 56, 264, 264), kernel: 3, stride: 1, mem_format: contig | 92.16582 | 128.6513 | 6.684325 | 12.21541 size: (1, 56, 264, 264), kernel: 3, stride: 1, mem_format: CL | 10.14318 | 29.80297 | 7.350142 | 11.25323 size: (32, 32, 100, 100), kernel: 3, stride: 1, mem_format: contig | 238.55453 | 331.89967 | 19.694657 | 32.78853 size: (32, 32, 100, 100), kernel: 3, stride: 1, mem_format: CL | 30.17079 | 32.75628 | 22.44543 | 30.17796 size: (4, 19, 10, 16, 16), kernel: 3, stride: 1, mem_format: contig | 7.474389 | 9.937217 | 0.236015 | 0.434229 size: (4, 19, 10, 16, 16), kernel: 3, stride: 1, mem_format: CL3d | 2.318954 | 2.469444 | 0.262125 | 0.401361 cc jgong5 mingfeima XiaobingSuper sanchitintel ashokei jingxu10 [ghstack-poisoned]
### Testing Single socket (28 cores): shape | fp32 forward / ms | bf16 forward / ms | fp32 backward / ms | bf16 backward / ms -- | -- | -- | -- | -- size: (1, 56, 264, 264), kernel: 3, stride: 1, mem_format: contig | 3.959584 | 5.493402 | 0.557232 | 0.568485 size: (1, 56, 264, 264), kernel: 3, stride: 1, mem_format: CL | 0.815511 | 1.351261 | 5.710506 | 10.57506 size: (32, 32, 100, 100), kernel: 3, stride: 1, mem_format: contig | 10.63426 | 15.28637 | 2.67656 | 1.71365 size: (32, 32, 100, 100), kernel: 3, stride: 1, mem_format: CL | 2.63570 | 2.05532 | 2.55452 | 2.33923 size: (4, 19, 10, 16, 16), kernel: 3, stride: 1, mem_format: contig | 0.375469 | 0.479748 | 0.066364 | 0.065155 size: (4, 19, 10, 16, 16), kernel: 3, stride: 1, mem_format: CL3d | 0.112197 | 0.112326 | 0.111697 | 0.145364 Single core: shape | fp32 forward / ms | bf16 forward / ms | fp32 backward / ms | bf16 backward / ms -- | -- | -- | -- | -- size: (1, 56, 264, 264), kernel: 3, stride: 1, mem_format: contig | 92.16582 | 128.6513 | 6.684325 | 12.21541 size: (1, 56, 264, 264), kernel: 3, stride: 1, mem_format: CL | 10.14318 | 29.80297 | 7.350142 | 11.25323 size: (32, 32, 100, 100), kernel: 3, stride: 1, mem_format: contig | 238.55453 | 331.89967 | 19.694657 | 32.78853 size: (32, 32, 100, 100), kernel: 3, stride: 1, mem_format: CL | 30.17079 | 32.75628 | 22.44543 | 30.17796 size: (4, 19, 10, 16, 16), kernel: 3, stride: 1, mem_format: contig | 7.474389 | 9.937217 | 0.236015 | 0.434229 size: (4, 19, 10, 16, 16), kernel: 3, stride: 1, mem_format: CL3d | 2.318954 | 2.469444 | 0.262125 | 0.401361 cc jgong5 mingfeima XiaobingSuper sanchitintel ashokei jingxu10 [ghstack-poisoned]
|
@mikaylagawarecki Could you please review this PR ? Thank you. |
|
@CaoE Sure, will take a look later today! |
There was a problem hiding this comment.
Hey @CaoE as a broader comment
Following up from our previous discussion, is #106104 ready for review? Happy to review it if it is (and would prefer that we use that testing before landing the rest of these PRs :)
Separately, we must be wary that the test_memory_format tests for ModuleInfos has been skipped rather than xfailed for quite a few ops, as we add channels_last_3d support for more of them, could we unskip these tests as well please e.g. https://github.com/pytorch/pytorch/blob/main/torch/testing/_internal/common_modules.py#L2555
### Testing Single socket (28 cores): shape | fp32 forward / ms | bf16 forward / ms | fp32 backward / ms | bf16 backward / ms -- | -- | -- | -- | -- size: (1, 56, 264, 264), kernel: 3, stride: 1, mem_format: contig | 3.959584 | 5.493402 | 0.557232 | 0.568485 size: (1, 56, 264, 264), kernel: 3, stride: 1, mem_format: CL | 0.815511 | 1.351261 | 5.710506 | 10.57506 size: (32, 32, 100, 100), kernel: 3, stride: 1, mem_format: contig | 10.63426 | 15.28637 | 2.67656 | 1.71365 size: (32, 32, 100, 100), kernel: 3, stride: 1, mem_format: CL | 2.63570 | 2.05532 | 2.55452 | 2.33923 size: (4, 19, 10, 16, 16), kernel: 3, stride: 1, mem_format: contig | 0.375469 | 0.479748 | 0.066364 | 0.065155 size: (4, 19, 10, 16, 16), kernel: 3, stride: 1, mem_format: CL3d | 0.112197 | 0.112326 | 0.111697 | 0.145364 Single core: shape | fp32 forward / ms | bf16 forward / ms | fp32 backward / ms | bf16 backward / ms -- | -- | -- | -- | -- size: (1, 56, 264, 264), kernel: 3, stride: 1, mem_format: contig | 92.16582 | 128.6513 | 6.684325 | 12.21541 size: (1, 56, 264, 264), kernel: 3, stride: 1, mem_format: CL | 10.14318 | 29.80297 | 7.350142 | 11.25323 size: (32, 32, 100, 100), kernel: 3, stride: 1, mem_format: contig | 238.55453 | 331.89967 | 19.694657 | 32.78853 size: (32, 32, 100, 100), kernel: 3, stride: 1, mem_format: CL | 30.17079 | 32.75628 | 22.44543 | 30.17796 size: (4, 19, 10, 16, 16), kernel: 3, stride: 1, mem_format: contig | 7.474389 | 9.937217 | 0.236015 | 0.434229 size: (4, 19, 10, 16, 16), kernel: 3, stride: 1, mem_format: CL3d | 2.318954 | 2.469444 | 0.262125 | 0.401361 cc jgong5 mingfeima XiaobingSuper sanchitintel ashokei jingxu10 [ghstack-poisoned]
|
@mikaylagawarecki Could you please review this PR ? #106104 is ready for review. Thank you. |
### Testing Single socket (28 cores): shape | fp32 forward / ms | bf16 forward / ms | fp32 backward / ms | bf16 backward / ms -- | -- | -- | -- | -- size: (1, 56, 264, 264), kernel: 3, stride: 1, mem_format: contig | 3.959584 | 5.493402 | 0.557232 | 0.568485 size: (1, 56, 264, 264), kernel: 3, stride: 1, mem_format: CL | 0.815511 | 1.351261 | 5.710506 | 10.57506 size: (32, 32, 100, 100), kernel: 3, stride: 1, mem_format: contig | 10.63426 | 15.28637 | 2.67656 | 1.71365 size: (32, 32, 100, 100), kernel: 3, stride: 1, mem_format: CL | 2.63570 | 2.05532 | 2.55452 | 2.33923 size: (4, 19, 10, 16, 16), kernel: 3, stride: 1, mem_format: contig | 0.375469 | 0.479748 | 0.066364 | 0.065155 size: (4, 19, 10, 16, 16), kernel: 3, stride: 1, mem_format: CL3d | 0.112197 | 0.112326 | 0.111697 | 0.145364 Single core: shape | fp32 forward / ms | bf16 forward / ms | fp32 backward / ms | bf16 backward / ms -- | -- | -- | -- | -- size: (1, 56, 264, 264), kernel: 3, stride: 1, mem_format: contig | 92.16582 | 128.6513 | 6.684325 | 12.21541 size: (1, 56, 264, 264), kernel: 3, stride: 1, mem_format: CL | 10.14318 | 29.80297 | 7.350142 | 11.25323 size: (32, 32, 100, 100), kernel: 3, stride: 1, mem_format: contig | 238.55453 | 331.89967 | 19.694657 | 32.78853 size: (32, 32, 100, 100), kernel: 3, stride: 1, mem_format: CL | 30.17079 | 32.75628 | 22.44543 | 30.17796 size: (4, 19, 10, 16, 16), kernel: 3, stride: 1, mem_format: contig | 7.474389 | 9.937217 | 0.236015 | 0.434229 size: (4, 19, 10, 16, 16), kernel: 3, stride: 1, mem_format: CL3d | 2.318954 | 2.469444 | 0.262125 | 0.401361 cc jgong5 mingfeima XiaobingSuper sanchitintel ashokei jingxu10 [ghstack-poisoned]
|
@mikaylagawarecki Can we merge this PR before 2.1 branch cut if possible ? |
### Testing Single socket (28 cores): shape | fp32 forward / ms | bf16 forward / ms | fp32 backward / ms | bf16 backward / ms -- | -- | -- | -- | -- size: (1, 56, 264, 264), kernel: 3, stride: 1, mem_format: contig | 3.959584 | 5.493402 | 0.557232 | 0.568485 size: (1, 56, 264, 264), kernel: 3, stride: 1, mem_format: CL | 0.815511 | 1.351261 | 5.710506 | 10.57506 size: (32, 32, 100, 100), kernel: 3, stride: 1, mem_format: contig | 10.63426 | 15.28637 | 2.67656 | 1.71365 size: (32, 32, 100, 100), kernel: 3, stride: 1, mem_format: CL | 2.63570 | 2.05532 | 2.55452 | 2.33923 size: (4, 19, 10, 16, 16), kernel: 3, stride: 1, mem_format: contig | 0.375469 | 0.479748 | 0.066364 | 0.065155 size: (4, 19, 10, 16, 16), kernel: 3, stride: 1, mem_format: CL3d | 0.112197 | 0.112326 | 0.111697 | 0.145364 Single core: shape | fp32 forward / ms | bf16 forward / ms | fp32 backward / ms | bf16 backward / ms -- | -- | -- | -- | -- size: (1, 56, 264, 264), kernel: 3, stride: 1, mem_format: contig | 92.16582 | 128.6513 | 6.684325 | 12.21541 size: (1, 56, 264, 264), kernel: 3, stride: 1, mem_format: CL | 10.14318 | 29.80297 | 7.350142 | 11.25323 size: (32, 32, 100, 100), kernel: 3, stride: 1, mem_format: contig | 238.55453 | 331.89967 | 19.694657 | 32.78853 size: (32, 32, 100, 100), kernel: 3, stride: 1, mem_format: CL | 30.17079 | 32.75628 | 22.44543 | 30.17796 size: (4, 19, 10, 16, 16), kernel: 3, stride: 1, mem_format: contig | 7.474389 | 9.937217 | 0.236015 | 0.434229 size: (4, 19, 10, 16, 16), kernel: 3, stride: 1, mem_format: CL3d | 2.318954 | 2.469444 | 0.262125 | 0.401361 cc jgong5 mingfeima XiaobingSuper sanchitintel ashokei jingxu10 [ghstack-poisoned]
mikaylagawarecki
left a comment
There was a problem hiding this comment.
Thanks for using the ModuleInfo testing
|
@pytorchbot merge -f "macos failures are unrelated" |
Merge startedYour change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team |
### Testing Single socket (28 cores): shape | fp32 forward / ms | bf16 forward / ms | fp32 backward / ms | bf16 backward / ms -- | -- | -- | -- | -- size: (1, 56, 264, 264), kernel: 3, stride: 1, mem_format: contig | 3.959584 | 5.493402 | 0.557232 | 0.568485 size: (1, 56, 264, 264), kernel: 3, stride: 1, mem_format: CL | 0.815511 | 1.351261 | 5.710506 | 10.57506 size: (32, 32, 100, 100), kernel: 3, stride: 1, mem_format: contig | 10.63426 | 15.28637 | 2.67656 | 1.71365 size: (32, 32, 100, 100), kernel: 3, stride: 1, mem_format: CL | 2.63570 | 2.05532 | 2.55452 | 2.33923 size: (4, 19, 10, 16, 16), kernel: 3, stride: 1, mem_format: contig | 0.375469 | 0.479748 | 0.066364 | 0.065155 size: (4, 19, 10, 16, 16), kernel: 3, stride: 1, mem_format: CL3d | 0.112197 | 0.112326 | 0.111697 | 0.145364 Single core: shape | fp32 forward / ms | bf16 forward / ms | fp32 backward / ms | bf16 backward / ms -- | -- | -- | -- | -- size: (1, 56, 264, 264), kernel: 3, stride: 1, mem_format: contig | 92.16582 | 128.6513 | 6.684325 | 12.21541 size: (1, 56, 264, 264), kernel: 3, stride: 1, mem_format: CL | 10.14318 | 29.80297 | 7.350142 | 11.25323 size: (32, 32, 100, 100), kernel: 3, stride: 1, mem_format: contig | 238.55453 | 331.89967 | 19.694657 | 32.78853 size: (32, 32, 100, 100), kernel: 3, stride: 1, mem_format: CL | 30.17079 | 32.75628 | 22.44543 | 30.17796 size: (4, 19, 10, 16, 16), kernel: 3, stride: 1, mem_format: contig | 7.474389 | 9.937217 | 0.236015 | 0.434229 size: (4, 19, 10, 16, 16), kernel: 3, stride: 1, mem_format: CL3d | 2.318954 | 2.469444 | 0.262125 | 0.401361 Pull Request resolved: #97775 Approved by: https://github.com/jgong5, https://github.com/mikaylagawarecki
Stack from ghstack (oldest at bottom):
Testing
Single socket (28 cores):
Single core:
cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10