Move function deletion from the stack to the heap.#11534

Closed
resistor wants to merge 1 commit into pytorch:master from resistor:deleter

Conversation

@resistor
Contributor

This eliminates the need for any heuristics regarding stack size limits.

@resistor
Contributor Author

@goldsborough

@zou3519
Contributor

zou3519 commented Sep 11, 2018

We should benchmark how much additional time this takes; I'm not sure how expensive it is to do a full traversal of the computation graph

@resistor
Contributor Author

It’s doing the same amount of traversal, just using the heap instead of the program stack.
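
For readers without the diff in front of them, the pattern under discussion can be sketched as follows. This is an illustrative stand-in only (a hypothetical `Node` type with a destruction counter, not the actual autograd `Function` code): the same depth-first traversal, but driven by an explicitly heap-allocated worklist instead of nested `shared_ptr` destructor calls, so graph depth never translates into program-stack depth.

```cpp
#include <memory>
#include <utility>
#include <vector>

int g_destroyed = 0;  // instrumentation for the sketch

// Hypothetical stand-in for an autograd function node.
struct Node {
  std::vector<std::shared_ptr<Node>> next_edges;
  ~Node() { ++g_destroyed; }
};

// Release a graph via an explicit worklist on the heap. No recursion is
// used, so no stack-size heuristic is needed regardless of graph depth.
void release_graph(std::shared_ptr<Node> root) {
  std::vector<std::shared_ptr<Node>> stack;
  stack.push_back(std::move(root));
  while (!stack.empty()) {
    std::shared_ptr<Node> fn = std::move(stack.back());
    stack.pop_back();
    if (fn && fn.use_count() == 1) {
      // We hold the last reference: move the children onto the worklist
      // so their deletion also happens iteratively.
      for (auto& edge : fn->next_edges)
        stack.push_back(std::move(edge));
    }
    // fn is destroyed here with its next_edges already moved out, so its
    // destructor does no nested work.
  }
}
```

Because the children are stolen before the parent dies, each destructor runs over an empty edge list, and a chain deep enough to overflow the program stack under naive recursive destruction is released without issue.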

@resistor resistor force-pushed the deleter branch 2 times, most recently from f1aeaa0 to ffdd7df Compare September 11, 2018 20:24
@zou3519
Contributor

zou3519 commented Sep 11, 2018

ah I see, my bad

Contributor

@zou3519 zou3519 left a comment

This looks correct to me (and is much more elegant than what we had before). Some before/after numbers would still be nice just in case, because deleting computation graphs happens all the time.

@resistor
Contributor Author

@zou3519 Can you recommend a test case?

Contributor

@facebook-github-bot facebook-github-bot left a comment

resistor has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@zou3519
Contributor

zou3519 commented Sep 11, 2018

Contributor

@apaszke apaszke left a comment

I'd rather vote for making the recursion limit much smaller (think ~500) while still preferring the stack to heap allocation. It's extremely unlikely that someone will get an overflow with that many frames.
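
The depth-capped alternative can be sketched like this (illustrative only, with a hypothetical `Node` type and the ~500-frame cap mentioned above; the real limit and types live in PyTorch's autograd internals): recurse on the program stack up to the cap, then spill the remainder onto a heap worklist.

```cpp
#include <memory>
#include <utility>
#include <vector>

int g_destroyed = 0;  // instrumentation for the sketch

struct Node {
  std::vector<std::shared_ptr<Node>> next_edges;
  ~Node() { ++g_destroyed; }
};

constexpr int kMaxFrames = 500;  // the "~500" suggested above

// Recurse up to kMaxFrames deep, then defer the rest to `spill`. The
// common shallow case stays allocation-free while worst-case program-stack
// usage stays bounded.
void release(std::shared_ptr<Node> fn, int depth,
             std::vector<std::shared_ptr<Node>>& spill) {
  if (!fn || fn.use_count() != 1) return;  // someone else still owns it
  if (depth >= kMaxFrames) {
    spill.push_back(std::move(fn));        // hand off to the heap
    return;
  }
  for (auto& edge : fn->next_edges)
    release(std::move(edge), depth + 1, spill);
  // fn is destroyed here; its children were already released or spilled.
}

void release_graph(std::shared_ptr<Node> root) {
  std::vector<std::shared_ptr<Node>> spill;
  release(std::move(root), 0, spill);
  while (!spill.empty()) {
    auto fn = std::move(spill.back());
    spill.pop_back();
    release(std::move(fn), 0, spill);  // recursion depth resets to 0
  }
}
```

Each drain of the spill list restarts recursion at depth zero, so an arbitrarily deep chain is released in bounded-depth chunks.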

@zou3519
Contributor

zou3519 commented Sep 11, 2018

I remember that making the recursion limit smaller added overhead in the 1–1000 millisecond range, but I haven't run the numbers in a while. I am down for making the recursion limit much smaller, though, if you think stack allocation is better than heap allocation @apaszke

@resistor
Contributor Author

@apaszke Why?

This is a very common pattern in compiler-like systems, where recursion over program representations ends up breaking some day on an unexpectedly large program. Programmers aren't in the habit of listening to compiler authors about what a "reasonable" size for a program is.

@apaszke
Contributor

apaszke commented Sep 11, 2018

This is a very common pattern in compiler-like systems

Which is reasonable, because those run either ahead of time or in a background thread (in the case of JITs). In this case, the destruction is blocking the main program execution, so it should be as fast as possible.

@resistor
Contributor Author

@apaszke Speed is of no value if it crashes. And plenty of JITs block program execution on compilation.

I'm also not sure why you would assume the prior version is faster. The memory locality of using vectors will generally be better than using deques like the current version does, and that is borne out in measurement. Decreasing the recursion threshold will only make the difference more pronounced by moving more work from the stack onto the deque.

WITH PR

Time to build graph (s):
0.09391583776474
Time to free graph (s):
0.019444111347198485

Time to build graph (s):
0.0975230975151062
Time to free graph (s):
0.02031881332397461

ON MASTER

Time to build graph (s):
0.09743225431442261
Time to free graph (s):
0.024351720809936524

Time to build graph (s):
0.09882893228530884
Time to free graph (s):
0.02464564323425293
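
The deque-versus-vector locality claim can also be spot-checked in isolation. A rough micro-benchmark sketch (illustrative only; this is not the script that produced the numbers above, and results vary by machine) times the same push/pop worklist pattern through both containers:

```cpp
#include <chrono>
#include <deque>
#include <vector>

// Time repeated push_back/pop_back worklist cycles through a container,
// returning seconds. std::vector's contiguous storage tends to be
// friendlier to the cache for this access pattern than std::deque's
// segmented blocks.
template <typename Container>
double time_worklist(int n) {
  auto t0 = std::chrono::steady_clock::now();
  Container work;
  for (int round = 0; round < 100; ++round) {
    for (int i = 0; i < n; ++i) work.push_back(i);
    while (!work.empty()) work.pop_back();
  }
  auto t1 = std::chrono::steady_clock::now();
  return std::chrono::duration<double>(t1 - t0).count();
}
```

On typical hardware the vector variant comes out ahead, though the exact gap depends on the allocator and workload size.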

@apaszke
Contributor

apaszke commented Sep 12, 2018

Speed is of no value if it crashes.

Lack of (deterministically stack-overflow triggered) crashes in 0.01% of use cases is of no value if your library is 2x slower than other libraries in most other cases 😉

And plenty of JITs block program execution on compilation.

Fine, but that also happens only once (or a few times), no matter how long your program runs. This destruction happens at every training step, for the whole duration of the program.


I didn't notice you changed the deque for a vector. That sounds like a good idea, and it would be an improvement even for the older code, so it might be worth re-benchmarking against master + vector. Also, can you please post the same benchmark, but with a graph of low depth (to check how the stack strategy performs compared to always using the heap)?


Contributor

@ezyang ezyang left a comment

I like killing heuristics. I would like it if some of the nit comments were addressed though.

@resistor
Contributor Author

@ezyang Nits should be fixed in latest version. Let me know if you have further feedback.

@ezyang
Contributor

ezyang commented Sep 12, 2018

@pytorchbot retest this please

Contributor

@facebook-github-bot facebook-github-bot left a comment

resistor has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@resistor
Contributor Author

Here are timings using the same script with 100x smaller workload:

ON PR

Time to build graph (s):
0.0010059714317321778
Time to free graph (s):
0.00016424322128295898

Time to build graph (s):
0.0009362959861755371
Time to free graph (s):
0.0001583690643310547

ON MASTER

Time to build graph (s):
0.0009480509757995606
Time to free graph (s):
0.00014049625396728515

Time to build graph (s):
0.0010006918907165528
Time to free graph (s):
0.00014899682998657226

Contributor

@facebook-github-bot facebook-github-bot left a comment

resistor has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@apaszke
Contributor

apaszke commented Sep 12, 2018

Well, it is a 6% regression, which is not great tbh. Any impact on actual training scripts? Can you try the word language model without cuDNN + a ResNet50?

This eliminates the need for any heuristics regarding stack size limits.
@resistor
Contributor Author

I implemented some improvements that reduce the small workload overhead, such that it is no longer a regression in this case.

Time to build graph (s):
0.000943394660949707
Time to free graph (s):
0.00013764524459838868

Time to build graph (s):
0.0009673638343811035
Time to free graph (s):
0.00013594341278076172

@resistor
Contributor Author

For clarity, the changes that sped it up were eliminating an unnecessary move in the primary deleter loop, and making the gather function more aggressive in pruning out functions that don't need to be gathered.

I think there may be room to make it even faster by making the gather look through the graph iteratively, but that will be more complex to engineer.
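
The pruning idea can be sketched like so (hypothetical `Node` type and function name; the real change is in PyTorch's gather code, not shown here): a node whose `use_count` is still above one is owned from elsewhere and will not actually die, so the gather can skip it and its subtree entirely instead of enqueueing it.

```cpp
#include <memory>
#include <utility>
#include <vector>

struct Node {
  std::vector<std::shared_ptr<Node>> next_edges;
};

// Collect every node that will actually be destroyed when `root` is
// released, pruning any subtree still referenced from elsewhere --
// those nodes survive, so traversing them would be wasted work.
std::vector<std::shared_ptr<Node>> gather_doomed(std::shared_ptr<Node> root) {
  std::vector<std::shared_ptr<Node>> doomed, stack;
  stack.push_back(std::move(root));
  while (!stack.empty()) {
    auto fn = std::move(stack.back());
    stack.pop_back();
    if (!fn || fn.use_count() > 1) continue;  // prune: still alive elsewhere
    for (auto& edge : fn->next_edges)
      stack.push_back(std::move(edge));
    doomed.push_back(std::move(fn));
  }
  return doomed;
}
```

The `use_count` check is why the optimization helps on small workloads: shared subgraphs drop out of the traversal immediately rather than being walked and then discarded.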

Contributor

@apaszke apaszke left a comment

Was the optimization to check use_count in gatherFunctions?

@resistor
Contributor Author

That plus eliminating a std::move in the deleter loop.

Contributor

@facebook-github-bot facebook-github-bot left a comment

resistor has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

```cpp
auto& curr_func = stack.back();

delete function;
if (curr_func.use_count() == 1) {
```


resistor pushed a commit to resistor/pytorch that referenced this pull request Sep 21, 2018
This eliminates the need for any heuristics regarding stack size limits.

This is a re-do pytorch#11534 with a fix to properly handle cases where
multiple edges exist between a pair of functions.
facebook-github-bot pushed a commit that referenced this pull request Sep 21, 2018
Summary:
This eliminates the need for any heuristics regarding stack size limits.

This is a re-do #11534 with a fix to properly handle cases where
multiple edges exist between a pair of functions.
Pull Request resolved: #11611

Differential Revision: D9991198

Pulled By: resistor

fbshipit-source-id: fecd2c5cac7e78f82a0f20cf33268bb1617bb4a0
@ezyang ezyang added the merged label Jun 26, 2019