Expose CUDACachingAllocator raw_alloc and raw_delete to python #33860

emcastillo wants to merge 1 commit into pytorch:master

Conversation
fyi @csarofeen and @ptrblck

cc @mcarilli. I'm concerned about how CuPy is going to handle streams. Memory allocated by PyTorch is safe to use only on the stream it was allocated on, but since CuPy and PyTorch don't share CUDA stream pools (?), I don't see how this memory can be used safely without additional synchronization. Do we expect users to insert all the necessary synchronizations?

Taking memory from the PyTorch caching allocator and handing it to some other library seems hard, and certainly isn't safe as implemented here, unless the recipient (CuPy) happens to use that memory on the same stream that the PyTorch allocator associated with it. @ngimel @mruberry, stating things you already know so they're in the open, there are two issues:

In principle such usage could be made safe, if the recipient either communicated back to PyTorch (in a way PyTorch could understand) which stream(s) it intended to use the memory on (at which point PyTorch could call recordStream(memory, streams)), OR the recipient queried from PyTorch (in a way the recipient could understand) a list of streams on which it was safe (according to PyTorch) to use the memory.

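The first option above — the recipient reporting its stream back so PyTorch can call recordStream — is already reachable from Python today via torch.Tensor.record_stream. A hedged sketch of that protocol (the stream names and the hand_off helper are illustrative, not part of any proposed API):

```python
import torch

def hand_off(tensor, consumer_stream):
    # Inform the caching allocator that `tensor`'s memory is also in use on
    # `consumer_stream`; the allocator will not recycle the block until work
    # queued on that stream at this point has completed.
    tensor.record_stream(consumer_stream)

if torch.cuda.is_available():
    producer = torch.cuda.Stream()
    consumer = torch.cuda.Stream()
    with torch.cuda.stream(producer):
        t = torch.empty(1024, device="cuda")
    consumer.wait_stream(producer)  # order consumer work after the producer
    hand_off(t, consumer)           # keep the block alive for the consumer
```

An external library would play the role of `consumer` here; the key point is that somebody has to make the recordStream call before the memory is freed on the producer side.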
Hello, and thanks for all the feedback. Another, simpler solution could be to pass the stream currently in use by CuPy to the memory allocator. The discussion is also ongoing at cupy/cupy#3126.

I just added a PR in CuPy to allow it to use external streams, so that a PyTorch-managed stream can be set as the CuPy default one, with events, kernels, and memory all managed through it.

I added the option to specify a stream. The following gist creates and sets CuPy streams based on the torch ones, and makes a custom allocator aware of the current stream: https://gist.github.com/emcastillo/44a033399ad67ddbdb306bed0f5fa6e0 I am not sure whether this proposal makes 100% sense for you, so any feedback is more than welcome.

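In the spirit of the linked gist, the wiring can be sketched with CuPy's PythonFunctionAllocator, which accepts arbitrary malloc/free callables. This is a hedged sketch, not the gist itself: it assumes the torch.cuda.caching_allocator_alloc / caching_allocator_delete bindings that grew out of this PR, and the install_torch_allocator name is made up for illustration.

```python
import torch

def install_torch_allocator():
    """Route CuPy allocations through the PyTorch caching allocator."""
    import cupy
    from cupy.cuda.memory import (MemoryPointer, UnownedMemory,
                                  PythonFunctionAllocator)

    def torch_malloc(size):
        device = cupy.cuda.Device().id
        # Associate the allocation with CuPy's current torch-side stream.
        stream = torch.cuda.current_stream(device).cuda_stream
        ptr = torch.cuda.caching_allocator_alloc(size, device, stream)
        # Wrap the raw pointer so CuPy can use it without owning it.
        return MemoryPointer(UnownedMemory(ptr, size, None, device), 0)

    def torch_free(ptr, device_id):
        # Return the block to the PyTorch pool.
        torch.cuda.caching_allocator_delete(ptr)

    cupy.cuda.set_allocator(
        PythonFunctionAllocator(torch_malloc, torch_free).malloc)
```

Note that this is only safe under the stream-sharing discipline discussed above: CuPy must actually run its kernels on the stream the allocation was tied to.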
Removing myself from reviewer list, lmk if I need to look. |
@ngimel has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Expose CUDACachingAllocator raw_alloc and raw_delete to python (pytorch#33860)

Summary: This PR aims to improve interoperability with [CuPy](https://github.com/cupy/cupy). Instead of maintaining two separate and conflicting memory pools, CuPy can directly allocate memory from the PyTorch allocator by means of the proposal in cupy/cupy#3126. We would like to gather feedback on whether this approach makes sense for PyTorch, or on alternative designs.

Pull Request resolved: pytorch#33860
Differential Revision: D20212788
Pulled By: ngimel
fbshipit-source-id: bc1e08a66da1992d26021147bf645dc65239581c
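For reference, in later PyTorch releases this functionality is exposed from Python as torch.cuda.caching_allocator_alloc and torch.cuda.caching_allocator_delete (the raw binding names in this PR as merged may differ). A minimal sketch of allocating from and returning to the pool:

```python
import torch

if torch.cuda.is_available():
    stream = torch.cuda.current_stream()
    # Grab 1 MiB from the caching allocator, tied to the current stream.
    ptr = torch.cuda.caching_allocator_alloc(1 << 20, device=0, stream=stream)
    try:
        pass  # hand `ptr` to an external library using the same stream
    finally:
        # Return the block to the pool (does not release it to the driver).
        torch.cuda.caching_allocator_delete(ptr)
```

The stream argument is what lets the allocator associate the block with the caller's stream, which is the crux of the safety discussion above.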