Expose CUDACachingAllocator raw_alloc and raw_delete to python #33860

emcastillo wants to merge 1 commit into pytorch:master

Conversation
fyi @csarofeen and @ptrblck

cc @mcarilli. I'm concerned about how CuPy is going to handle streams. Memory allocated by PyTorch is safe to use only on the stream it was allocated on, but since CuPy and PyTorch don't share CUDA stream pools (?), I don't see how this memory can be used safely without additional synchronization. Do we expect users to insert all the necessary synchronizations?

Taking memory from the PyTorch caching allocator and handing it to some other library seems hard, and certainly isn't safe as implemented here, unless the recipient (CuPy) happens to use that memory on the same stream that the PyTorch allocator associated with it. @ngimel @mruberry, stating things you already know so they're in the open, there are two issues:

In principle such usage could be made safe, if the recipient either communicated back to PyTorch (in a way PyTorch could understand) which stream(s) it intended to use the memory on (at which point PyTorch could call recordStream(memory, streams)), OR the recipient queried from PyTorch (in a way the recipient could understand) a list of streams on which it was safe (according to PyTorch) to use the memory.

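The first option above — the recipient reporting its stream back so PyTorch can call recordStream — is already reachable from Python today via torch.Tensor.record_stream. A hedged sketch of that protocol (the stream names and the hand_off helper are illustrative, not part of any proposed API):

```python
import torch

def hand_off(tensor, consumer_stream):
    # Inform the caching allocator that `tensor`'s memory is also in use on
    # `consumer_stream`; the allocator will not recycle the block until work
    # queued on that stream at this point has completed.
    tensor.record_stream(consumer_stream)

if torch.cuda.is_available():
    producer = torch.cuda.Stream()
    consumer = torch.cuda.Stream()
    with torch.cuda.stream(producer):
        t = torch.empty(1024, device="cuda")
    consumer.wait_stream(producer)  # order consumer work after the producer
    hand_off(t, consumer)           # keep the block alive for the consumer
```

An external library would play the role of `consumer` here; the key point is that somebody has to make the recordStream call before the memory is freed on the producer side.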
Hello, and thanks for all the feedback. Another, simpler solution could be to pass the stream currently in use by CuPy to the memory allocator. The discussion is also ongoing at cupy/cupy#3126.

I just added a PR in CuPy to allow it to use external streams, so that a PyTorch-managed stream can be set as the CuPy default one, with events, kernels, and memory all managed through it.

I added the option to specify a stream. The following gist creates and sets CuPy streams based on the torch ones, and makes a custom allocator aware of the current stream: https://gist.github.com/emcastillo/44a033399ad67ddbdb306bed0f5fa6e0 I am not sure whether this proposal makes 100% sense for you, so any feedback is more than welcome.

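In the spirit of the linked gist, the wiring can be sketched with CuPy's PythonFunctionAllocator, which accepts arbitrary malloc/free callables. This is a hedged sketch, not the gist itself: it assumes the torch.cuda.caching_allocator_alloc / caching_allocator_delete bindings that grew out of this PR, and the install_torch_allocator name is made up for illustration.

```python
import torch

def install_torch_allocator():
    """Route CuPy allocations through the PyTorch caching allocator."""
    import cupy
    from cupy.cuda.memory import (MemoryPointer, UnownedMemory,
                                  PythonFunctionAllocator)

    def torch_malloc(size):
        device = cupy.cuda.Device().id
        # Associate the allocation with CuPy's current torch-side stream.
        stream = torch.cuda.current_stream(device).cuda_stream
        ptr = torch.cuda.caching_allocator_alloc(size, device, stream)
        # Wrap the raw pointer so CuPy can use it without owning it.
        return MemoryPointer(UnownedMemory(ptr, size, None, device), 0)

    def torch_free(ptr, device_id):
        # Return the block to the PyTorch pool.
        torch.cuda.caching_allocator_delete(ptr)

    cupy.cuda.set_allocator(
        PythonFunctionAllocator(torch_malloc, torch_free).malloc)
```

Note that this is only safe under the stream-sharing discipline discussed above: CuPy must actually run its kernels on the stream the allocation was tied to.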
Removing myself from reviewer list, lmk if I need to look. |
@ngimel has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Expose CUDACachingAllocator raw_alloc and raw_delete to python (pytorch#33860)

Summary: This PR aims to improve interoperability with [CuPy](https://github.com/cupy/cupy). Instead of maintaining two separate and conflicting memory pools, CuPy can directly allocate memory from the PyTorch allocator by means of the proposal in cupy/cupy#3126. We would like to gather feedback on whether this approach makes sense for PyTorch, or on alternative designs.

Pull Request resolved: pytorch#33860
Differential Revision: D20212788
Pulled By: ngimel
fbshipit-source-id: bc1e08a66da1992d26021147bf645dc65239581c
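For reference, in later PyTorch releases this functionality is exposed from Python as torch.cuda.caching_allocator_alloc and torch.cuda.caching_allocator_delete (the raw binding names in this PR as merged may differ). A minimal sketch of allocating from and returning to the pool:

```python
import torch

if torch.cuda.is_available():
    stream = torch.cuda.current_stream()
    # Grab 1 MiB from the caching allocator, tied to the current stream.
    ptr = torch.cuda.caching_allocator_alloc(1 << 20, device=0, stream=stream)
    try:
        pass  # hand `ptr` to an external library using the same stream
    finally:
        # Return the block to the pool (does not release it to the driver).
        torch.cuda.caching_allocator_delete(ptr)
```

The stream argument is what lets the allocator associate the block with the caller's stream, which is the crux of the safety discussion above.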