[PyTorch] Debug-gate static_assert in KernelFunction::makeFromUnboxedFunctor #51367
Closed
swolchok wants to merge 7 commits into gh/swolchok/104/base
…Functor Templight said that this assertion was taking about 5% of build time for RegisterCPU.cpp (a hopefully-representative example I picked to shorten my iteration cycle). I've debug-gated it on the grounds that 1) we at least try to build everything in debug mode and 2) optimized builds presumably take longer in general, so debug builds can better afford the added build-time cost. The win is not entirely clear; please see the test plan for details. Differential Revision: [D26153793](https://our.internmc.facebook.com/intern/diff/D26153793/) [ghstack-poisoned]
ezyang approved these changes on Feb 1, 2021
This pull request has been merged in c442776.
xsacha pushed a commit to xsacha/pytorch that referenced this pull request on Mar 31, 2021
…Functor (pytorch#51367)

Summary: Pull Request resolved: pytorch#51367

Templight said that this assertion was taking about 5% of build time for RegisterCPU.cpp (a hopefully-representative example I picked to shorten my iteration cycle).

I've debug-gated it on the grounds that 1) we at least try to build everything in debug mode and 2) optimized builds presumably take longer in general, so we can more afford to pay the build time cost in debug builds.

The win is not entirely clear; please see the test plan for details.

ghstack-source-id: 121378960

Test Plan:
1) Built RegisterCPU.cpp with -ftime-trace before and after. It doesn't seem to call out any difference in the details, but the overall time is stably down more like 10% (55s before and 49s after).
2) Did a full rebuild of aten-cpu with -ftime-trace before and after. No significant difference in build times shown (it says *after* is a regression, but it's using wall-time data and the machine is loaded during builds so there's some noise).
3) Re-profiled with Templight. Before: {F366557311} After: {F366557501}

Not sure what to conclude overall. A known problem with templight is that template instantiations form more of a dependency graph than a tree because they're cached internally, so eliminating the first caller of a template may just move the time to another caller. However, it looks like we have actually reduced is_functor traffic.

UPDATE: I don't think that the -ftime-trace measurement was reliable; it seems to skew running times. I built this diff vs its base 5 times and measured the CPU ("user") time each time. Results (in seconds):

previous diff: [51.97, 50.54, 50.49, 52.89, 51.61] mean: 51.5 std: 0.906
this diff: [50.53, 50.41, 50.57, 50.67, 50.94] mean: 50.6 std: 0.179

Reviewed By: ezyang
Differential Revision: D26153793
fbshipit-source-id: 9a66912c1b2b068f453e78be57454e4e62b7107b
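The mean and std figures in the UPDATE can be checked by hand. The sketch below is not part of the PR; the function names are made up for illustration, and it assumes the reported std is the population form (dividing by N), which is the form that reproduces 0.906 and 0.179 from the listed samples.

```cpp
#include <array>
#include <cmath>
#include <cstddef>

// User-CPU build times in seconds, copied from the UPDATE in the test plan.
constexpr std::array<double, 5> kPreviousDiff{{51.97, 50.54, 50.49, 52.89, 51.61}};
constexpr std::array<double, 5> kThisDiff{{50.53, 50.41, 50.57, 50.67, 50.94}};

// Arithmetic mean of the samples (hypothetical helper, C++14 constexpr).
template <std::size_t N>
constexpr double mean_of(const std::array<double, N>& xs) {
  double sum = 0.0;
  for (std::size_t i = 0; i < N; ++i) {
    sum += xs[i];
  }
  return sum / static_cast<double>(N);
}

// Population variance: average squared deviation, dividing by N (not N - 1).
template <std::size_t N>
constexpr double population_variance_of(const std::array<double, N>& xs) {
  const double m = mean_of(xs);
  double acc = 0.0;
  for (std::size_t i = 0; i < N; ++i) {
    const double d = xs[i] - m;
    acc += d * d;
  }
  return acc / static_cast<double>(N);
}

// Population standard deviation. std::sqrt is not guaranteed to be constexpr
// in C++14/17, so this helper is meant to be called at run time.
template <std::size_t N>
double population_std_of(const std::array<double, N>& xs) {
  return std::sqrt(population_variance_of(xs));
}
```

With these samples, the means come out at about 51.5 and 50.6 seconds, a difference of roughly 0.9 s, which is about the size of the standard deviation of the "previous diff" runs. That is consistent with the author's caution that the win is not entirely clear.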
laurentdupin pushed a commit to laurentdupin/pytorch that referenced this pull request on Apr 24, 2026
Stack from ghstack:
Templight said that this assertion was taking about 5% of build time for RegisterCPU.cpp (a hopefully-representative example I picked to shorten my iteration cycle).
I've debug-gated it on the grounds that 1) we at least try to build
everything in debug mode and 2) optimized builds presumably take
longer in general, so debug builds can better afford the added
build-time cost.
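For readers unfamiliar with the pattern, debug-gating a static_assert can be sketched roughly as follows. This is an illustration, not the actual diff: `is_functor_sketch` and `make_from_unboxed_functor_sketch` are hypothetical stand-ins for the real trait and factory, and the sketch assumes the gate is the standard `NDEBUG` macro.

```cpp
#include <type_traits>
#include <utility>

// Hypothetical stand-in for an is_functor-style trait: true when T is a class
// type with a single, non-overloaded, non-template operator().
template <class T, class = void>
struct is_functor_sketch : std::false_type {};

template <class T>
struct is_functor_sketch<T, decltype(void(&T::operator()))> : std::true_type {};

// Hypothetical factory illustrating the gate; it simply forwards the functor.
template <class KernelFunctor>
KernelFunctor make_from_unboxed_functor_sketch(KernelFunctor functor) {
#ifndef NDEBUG
  // The trait is instantiated here only when NDEBUG is not defined, so builds
  // that define NDEBUG skip this check and its compile-time cost.
  static_assert(is_functor_sketch<KernelFunctor>::value,
                "argument to make_from_unboxed_functor_sketch is not a functor");
#endif
  return std::move(functor);
}

// Example functor that satisfies the trait.
struct AddOne {
  int operator()(int x) const { return x + 1; }
};
```

The trade-off is that a non-functor argument is rejected with the friendly message only in builds without `NDEBUG`; in other builds the mistake would surface later, if at all, as a less readable template error.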
The win is not entirely clear; please see the test plan for details.
Differential Revision: D26153793