Skip to content

[PyTorch] Debug-gate static_assert in KernelFunction::makeFromUnboxedFunctor#51367

Closed
swolchok wants to merge 7 commits intogh/swolchok/104/basefrom
gh/swolchok/104/head
Closed

[PyTorch] Debug-gate static_assert in KernelFunction::makeFromUnboxedFunctor#51367
swolchok wants to merge 7 commits intogh/swolchok/104/basefrom
gh/swolchok/104/head

Conversation

@swolchok
Copy link
Copy Markdown
Contributor

@swolchok swolchok commented Jan 29, 2021

Stack from ghstack:

Templight said that this assertion was taking about 5% of build time for RegisterCPU.cpp (a hopefully-representative example I picked to shorten my iteration cycle).

I've debug-gated it on the grounds that 1) we at least try to build
everything in debug mode and 2) optimized builds presumably take
longer in general, so we can more afford to pay the build time cost in
debug builds.

The win is not entirely clear; please see the test plan for details.

Differential Revision: D26153793

…Functor

Templight said that this assertion was taking about 5% of build time for RegisterCPU.cpp (a hopefully-representative example I picked to shorten my iteration cycle).

I've debug-gated it on the grounds that 1) we at least try to build
everything in debug mode and 2) optimized builds presumably take
longer in general, so we can more afford to pay the build time cost in
debug builds.

The win is not entirely clear; please see the test plan for details.

Differential Revision: [D26153793](https://our.internmc.facebook.com/intern/diff/D26153793/)

[ghstack-poisoned]
@facebook-github-bot
Copy link
Copy Markdown
Contributor

facebook-github-bot commented Jan 29, 2021

💊 CI failures summary and remediations

As of commit 7635049 (more details on the Dr. CI page):


  • 1/1 failures possibly* introduced in this PR
    • 1/1 non-CircleCI failure(s)

Extra GitHub checks: 1 failed


This comment was automatically generated by Dr. CI (expand for details).Follow this link to opt-out of these comments for your Pull Requests.

Please report bugs/suggestions to the (internal) Dr. CI Users group.

…FromUnboxedFunctor"

Templight said that this assertion was taking about 5% of build time for RegisterCPU.cpp (a hopefully-representative example I picked to shorten my iteration cycle).

I've debug-gated it on the grounds that 1) we at least try to build
everything in debug mode and 2) optimized builds presumably take
longer in general, so we can more afford to pay the build time cost in
debug builds.

The win is not entirely clear; please see the test plan for details.

Differential Revision: [D26153793](https://our.internmc.facebook.com/intern/diff/D26153793/)

[ghstack-poisoned]
…FromUnboxedFunctor"

Templight said that this assertion was taking about 5% of build time for RegisterCPU.cpp (a hopefully-representative example I picked to shorten my iteration cycle).

I've debug-gated it on the grounds that 1) we at least try to build
everything in debug mode and 2) optimized builds presumably take
longer in general, so we can more afford to pay the build time cost in
debug builds.

The win is not entirely clear; please see the test plan for details.

Differential Revision: [D26153793](https://our.internmc.facebook.com/intern/diff/D26153793/)

[ghstack-poisoned]
Copy link
Copy Markdown

@bhosmer bhosmer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks trivially ok - I can't imagine the build errors being related, but pinging @smessmer for thoughts [edit: ah nm it's from a prior diff]

…FromUnboxedFunctor"

Templight said that this assertion was taking about 5% of build time for RegisterCPU.cpp (a hopefully-representative example I picked to shorten my iteration cycle).

I've debug-gated it on the grounds that 1) we at least try to build
everything in debug mode and 2) optimized builds presumably take
longer in general, so we can more afford to pay the build time cost in
debug builds.

The win is not entirely clear; please see the test plan for details.

Differential Revision: [D26153793](https://our.internmc.facebook.com/intern/diff/D26153793/)

[ghstack-poisoned]
…FromUnboxedFunctor"

Templight said that this assertion was taking about 5% of build time for RegisterCPU.cpp (a hopefully-representative example I picked to shorten my iteration cycle).

I've debug-gated it on the grounds that 1) we at least try to build
everything in debug mode and 2) optimized builds presumably take
longer in general, so we can more afford to pay the build time cost in
debug builds.

The win is not entirely clear; please see the test plan for details.

Differential Revision: [D26153793](https://our.internmc.facebook.com/intern/diff/D26153793/)

[ghstack-poisoned]
…FromUnboxedFunctor"

Templight said that this assertion was taking about 5% of build time for RegisterCPU.cpp (a hopefully-representative example I picked to shorten my iteration cycle).

I've debug-gated it on the grounds that 1) we at least try to build
everything in debug mode and 2) optimized builds presumably take
longer in general, so we can more afford to pay the build time cost in
debug builds.

The win is not entirely clear; please see the test plan for details.

Differential Revision: [D26153793](https://our.internmc.facebook.com/intern/diff/D26153793/)

[ghstack-poisoned]
… in KernelFunction::makeFromUnboxedFunctor"

Templight said that this assertion was taking about 5% of build time for RegisterCPU.cpp (a hopefully-representative example I picked to shorten my iteration cycle).

I've debug-gated it on the grounds that 1) we at least try to build
everything in debug mode and 2) optimized builds presumably take
longer in general, so we can more afford to pay the build time cost in
debug builds.

The win is not entirely clear; please see the test plan for details.

Differential Revision: [D26153793](https://our.internmc.facebook.com/intern/diff/D26153793/)

[ghstack-poisoned]
@facebook-github-bot
Copy link
Copy Markdown
Contributor

This pull request has been merged in c442776.

@facebook-github-bot facebook-github-bot deleted the gh/swolchok/104/head branch February 21, 2021 15:16
xsacha pushed a commit to xsacha/pytorch that referenced this pull request Mar 31, 2021
…Functor (pytorch#51367)

Summary:
Pull Request resolved: pytorch#51367

Templight said that this assertion was taking about 5% of build time for RegisterCPU.cpp (a hopefully-representative example I picked to shorten my iteration cycle).

I've debug-gated it on the grounds that 1) we at least try to build
everything in debug mode and 2) optimized builds presumably take
longer in general, so we can more afford to pay the build time cost in
debug builds.

The win is not entirely clear; please see the test plan for details.
ghstack-source-id: 121378960

Test Plan:
1) Built RegisterCPU.cpp with -ftime-trace before and after. It doesn't seem to call out any difference in the details, but the overall time is stably down more like 10% (55s before and 49s after).
2) Did a full rebuild of aten-cpu with -ftime-trace before and
after. No significant difference in build times shown (it says *after*
is a regression, but it's using wall-time data and the machine is
loaded during builds so there's some noise).
3) Re-profiled with Templight.

Before:

{F366557311}

After:

{F366557501}

Not sure what to conclude overall. A known problem with templight is that template instantiations form more of a dependency graph than a tree because they're cached internally, so eliminating the first caller of a template may just move the time to another caller. However, it looks like we have actually reduced is_functor traffic.

UPDATE: I don't think that the -ftime-trace measurement was reliable; it seems to skew running times. I built this diff vs its base 5 times and measured the CPU ("user") time each time. Results (in seconds):

previous diff: [51.97, 50.54, 50.49, 52.89, 51.61]
mean: 51.5 std: 0.906

this diff: [50.53, 50.41, 50.57, 50.67, 50.94]
mean: 50.6 std: 0.179

Reviewed By: ezyang

Differential Revision: D26153793

fbshipit-source-id: 9a66912c1b2b068f453e78be57454e4e62b7107b
laurentdupin pushed a commit to laurentdupin/pytorch that referenced this pull request Apr 24, 2026
…Functor (pytorch#51367)

Summary:
Pull Request resolved: pytorch#51367

Templight said that this assertion was taking about 5% of build time for RegisterCPU.cpp (a hopefully-representative example I picked to shorten my iteration cycle).

I've debug-gated it on the grounds that 1) we at least try to build
everything in debug mode and 2) optimized builds presumably take
longer in general, so we can more afford to pay the build time cost in
debug builds.

The win is not entirely clear; please see the test plan for details.
ghstack-source-id: 121378960

Test Plan:
1) Built RegisterCPU.cpp with -ftime-trace before and after. It doesn't seem to call out any difference in the details, but the overall time is stably down more like 10% (55s before and 49s after).
2) Did a full rebuild of aten-cpu with -ftime-trace before and
after. No significant difference in build times shown (it says *after*
is a regression, but it's using wall-time data and the machine is
loaded during builds so there's some noise).
3) Re-profiled with Templight.

Before:

{F366557311}

After:

{F366557501}

Not sure what to conclude overall. A known problem with templight is that template instantiations form more of a dependency graph than a tree because they're cached internally, so eliminating the first caller of a template may just move the time to another caller. However, it looks like we have actually reduced is_functor traffic.

UPDATE: I don't think that the -ftime-trace measurement was reliable; it seems to skew running times. I built this diff vs its base 5 times and measured the CPU ("user") time each time. Results (in seconds):

previous diff: [51.97, 50.54, 50.49, 52.89, 51.61]
mean: 51.5 std: 0.906

this diff: [50.53, 50.41, 50.57, 50.67, 50.94]
mean: 50.6 std: 0.179

Reviewed By: ezyang

Differential Revision: D26153793

fbshipit-source-id: 9a66912c1b2b068f453e78be57454e4e62b7107b
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants