[Static Runtime] Fix MemoryPlanner dtor crash in debug mode by mikeiovine · Pull Request #79942 · pytorch/pytorch

mikeiovine · 2022-06-21T14:48:38Z

Summary:
Memory planner destruction was hitting this assertion in debug mode for a few models.

Here's what was going on:

1) The set of unmanaged `IValue`s acquires one or more owning refs of a managed `StorageImpl`
2) Then, one or more tensors in that storage group have their `StorageImpl` swapped out during execution
3) During `deallocateManagedTensors`, we swap the correct `StorageImpl` back in, calling `unsafe_adapt_non_heap_allocated` again and resetting the refcount
4) The unmanaged `IValue`s are deallocated, decrementing the refcount into the danger zone.

So, we just have to make sure that unmanaged `IValue`s are destructed before we deallocate the managed tensors.
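The ordering problem can be illustrated with a small toy model. This is only a sketch, not the actual Static Runtime or `c10` code: `ToyStorage`, `kToyFloor`, `adapt`, and `survives_teardown` are hypothetical names, and the refcount rule (the count must never drop below a fixed floor) is an assumed simplification of what the debug assertion checks.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical floor value: in this toy model, a non-heap-allocated
// storage is "healthy" as long as its refcount never drops below it.
constexpr std::size_t kToyFloor = 1000;

// Hypothetical stand-in for a managed, non-heap-allocated storage object.
struct ToyStorage {
  std::size_t refcount = kToyFloor;

  // Models re-adapting the storage, which resets the refcount and
  // forgets about any owning refs taken since the last reset.
  void adapt() { refcount = kToyFloor; }

  void incref() { ++refcount; }
  void decref() { --refcount; }

  // Models the debug-mode check made when the storage is destroyed.
  bool healthy() const { return refcount >= kToyFloor; }
};

// Returns true if teardown leaves the refcount in a valid state.
bool survives_teardown(bool destroy_unmanaged_first) {
  ToyStorage storage;
  storage.adapt();

  // Step 1: an unmanaged value takes an owning ref of the managed storage.
  storage.incref();

  // Step 2 (storage swapped out during execution) does not change the
  // count in this toy model, so it is not represented here.

  if (destroy_unmanaged_first) {
    // Fixed order: release the unmanaged ref, then reset the refcount.
    storage.decref();
    storage.adapt();
  } else {
    // Old order: step 3 resets the refcount while a ref is outstanding,
    // then step 4 releases that ref and drops the count below the floor.
    storage.adapt();
    storage.decref();
  }
  return storage.healthy();
}
```

In this model, `survives_teardown(false)` reports an unhealthy refcount while `survives_teardown(true)` does not, which mirrors why destructing the unmanaged values first avoids the assertion.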

Test Plan: CI

Differential Revision: D37303728

facebook-github-bot · 2022-06-21T14:48:44Z

✅ No Failures (0 Pending)

As of commit 3f9c479 (more details on the Dr. CI page):

💚 💚 Looks good so far! There are no failures yet. 💚 💚

facebook-github-bot · 2022-06-21T14:48:59Z

This pull request was exported from Phabricator. Differential Revision: D37303728

facebook-github-bot · 2022-06-21T15:37:48Z

This pull request was exported from Phabricator. Differential Revision: D37303728

facebook-github-bot · 2022-06-21T21:06:33Z

@pytorchbot merge

(Initiating merge automatically since Phabricator Diff has merged)

pytorchmergebot · 2022-06-21T21:07:55Z

@pytorchbot successfully started a merge job. Check the current status here

pytorchmergebot · 2022-06-21T21:08:01Z

@mikeiovine your PR has been successfully merged.

github-actions · 2022-06-21T21:08:33Z

Hey @mikeiovine.
You've committed this PR, but it does not have both a 'release notes: ...' and 'topics: ...' label. Please add one of each to the PR. The 'release notes: ...' label should represent the part of PyTorch that this PR changes (fx, autograd, distributed, etc) and the 'topics: ...' label should represent the kind of PR it is (not user facing, new feature, bug fix, perf improvement, etc). The list of valid labels can be found here for the 'release notes: ...' and here for the 'topics: ...'.
For changes that are 'topic: not user facing' there is no need for a release notes label.

Summary:
Pull Request resolved: #79942

Memory planner destruction was hitting [this assertion](https://www.internalfb.com/code/fbsource/[f8baf8a0bab462c860d2eb7491a4e3fb40d2907a]/fbcode/caffe2/c10/util/intrusive_ptr.h?lines=117) in debug mode for a few models.

Here's what was going on:

1) The set of unmanaged `IValue`s acquires one or more owning refs of a managed `StorageImpl`
2) Then, one or more tensors in that storage group have their `StorageImpl` swapped out during execution
3) During `deallocateManagedTensors`, we swap the correct `StorageImpl` back in, [calling `unsafe_adapt_non_heap_allocated` again and resetting the refcount](https://www.internalfb.com/code/fbsource/[f8baf8a0bab462c860d2eb7491a4e3fb40d2907a]/fbcode/caffe2/torch/csrc/jit/runtime/static/memory_planner.cpp?lines=446-452)
4) The unmanaged `IValue`s are deallocated, decrementing the refcount into the danger zone.

So, we just have to make sure that unmanaged `IValue`s are destructed before we deallocate the managed tensors.

Test Plan: CI

Reviewed By: ajyu, tenpercent

Differential Revision: D37303728

fbshipit-source-id: 82b58221d45ab04a30cb7358c8bbeb124f71129d

facebook-github-bot added the cla signed label Jun 21, 2022

facebook-github-bot added the oncall: jit and fb-exported labels Jun 21, 2022

mikeiovine force-pushed the export-D37303728 branch from 298674f to 3f9c479 Compare June 21, 2022 15:37

tenpercent self-requested a review June 21, 2022 17:08

tenpercent approved these changes Jun 21, 2022

pytorchmergebot added the Merged label Jun 21, 2022

pytorchmergebot closed this in e7ed017 Jun 21, 2022

mikeiovine wants to merge 1 commit into pytorch:master from mikeiovine:export-D37303728