[Static Runtime] Fix MemoryPlanner dtor crash in debug mode #79942
mikeiovine wants to merge 1 commit into pytorch:master
Conversation
✅ No failures (0 pending) as of commit 3f9c479 (more details on the Dr. CI page).
This pull request was exported from Phabricator. Differential Revision: D37303728
[Static Runtime] Fix MemoryPlanner dtor crash in debug mode (#79942)

Summary:
Pull Request resolved: pytorch#79942

Memory planner destruction was hitting [this assertion](https://www.internalfb.com/code/fbsource/[f8baf8a0bab462c860d2eb7491a4e3fb40d2907a]/fbcode/caffe2/c10/util/intrusive_ptr.h?lines=117) in debug mode for a few models. Here's what was going on:

1) The set of unmanaged `IValue`s acquires one or more owning refs to a managed `StorageImpl`.
2) One or more tensors in that storage group then have their `StorageImpl` swapped out during execution.
3) During `deallocateManagedTensors`, we swap the correct `StorageImpl` back in, [calling `unsafe_adapt_non_heap_allocated` again and resetting the refcount](https://www.internalfb.com/code/fbsource/[f8baf8a0bab462c860d2eb7491a4e3fb40d2907a]/fbcode/caffe2/torch/csrc/jit/runtime/static/memory_planner.cpp?lines=446-452).
4) The unmanaged `IValue`s are deallocated, decrementing the refcount into the danger zone.

So we just have to make sure that the unmanaged `IValue`s are destructed before we deallocate the managed tensors.

Test Plan: CI

Reviewed By: ajyu, tenpercent

Differential Revision: D37303728

fbshipit-source-id: dffc079c84dc92b5dad44758cb2bafbf60dc0d0e
@pytorchbot merge (Initiating merge automatically since Phabricator Diff has merged)
@pytorchbot successfully started a merge job. Check the current status here.
@mikeiovine your PR has been successfully merged.