Expose CUDACachingAllocator raw_alloc and raw_delete to python #33860

Closed

emcastillo wants to merge 1 commit into pytorch:master from emcastillo:python_alloc

Conversation

@emcastillo
Collaborator

This PR aims to improve interoperability with CuPy.

Instead of keeping two separate and conflicting memory pools, with this PR CuPy can allocate memory directly from the PyTorch allocator, by means of the proposal in cupy/cupy#3126.

We would like to gather feedback on whether this approach makes sense for PyTorch, or whether alternative designs would be preferred.
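
For concreteness, here is a minimal sketch of what the new bindings enable from Python. The `torch.cuda.caching_allocator_alloc` / `torch.cuda.caching_allocator_delete` names are the ones the raw_alloc/raw_delete bindings were later exposed under in `torch.cuda`; treat the exact names as an assumption here.

```python
import torch

# Grab 1 MiB from PyTorch's CUDA caching allocator (raw_alloc) and later
# return it (raw_delete). The pointer is a plain integer device address.
size = 1024 * 1024
ptr = torch.cuda.caching_allocator_alloc(size)
try:
    pass  # ... hand `ptr` to CuPy or another CUDA library ...
finally:
    torch.cuda.caching_allocator_delete(ptr)
```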

@dr-ci

dr-ci Bot commented Feb 27, 2020

💊 CircleCI build failures summary and remediations

As of commit 42e6f50 (more details on the Dr. CI page):


Commit 42e6f50 was recently pushed. Waiting for builds...


This comment was automatically generated by Dr. CI.


@ailzhang ailzhang requested review from mruberry and ngimel February 27, 2020 17:37
@ailzhang ailzhang added the triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module label Feb 27, 2020
@ailzhang
Contributor

@ngimel @mruberry Would you mind helping review this PR?
(Please feel free to remove yourselves from the request if it's not a good fit.)

@mruberry mruberry removed their request for review February 27, 2020 19:06
@mruberry
Collaborator

fyi @csarofeen and @ptrblck

@ngimel
Collaborator

ngimel commented Feb 27, 2020

cc @mcarilli. I'm concerned about how CuPy is going to handle streams. Memory allocated by PyTorch is safe to use only on the stream it was allocated on, but since CuPy and PyTorch don't share CUDA stream pools (?), I don't see how this memory can be used safely without additional synchronization. Do we expect users to put in all the necessary synchronizations?

@ngimel ngimel requested a review from ezyang February 27, 2020 22:53
@mcarilli
Collaborator

mcarilli commented Feb 28, 2020

Taking memory from the PyTorch caching allocator and handing it to some other library seems hard, and it certainly isn't safe as implemented here, unless the recipient (CuPy) happens to use that memory on the same stream that the PyTorch allocator associated with it.

@ngimel @mruberry stating things you already know so they're in the open: there are two issues:

  • syncing the recipient's intended usage stream(s) with the stream on which PyTorch most recently populated the memory;
  • telling PyTorch to recordStream(memory, usage streams) so that the caching allocator doesn't free and reuse that same memory until the recipient's work is done.

In principle such usage could be made safe, if the recipient either communicated back to PyTorch (in a way PyTorch could understand) which stream(s) it intended to use the memory on, at which point PyTorch could call recordStream(memory, streams), OR queried from PyTorch (in a way the recipient could understand) a list of streams on which it was safe, according to PyTorch, to use the memory.
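
As a rough illustration of those two steps with PyTorch's public stream API (`Tensor.record_stream` is the user-facing hook into the allocator's recordStream; this is a sketch, not part of this PR):

```python
import torch

t = torch.empty(1024, device="cuda")   # populated on the current stream
side = torch.cuda.Stream()

# (1) Sync the usage stream with the stream that last wrote the memory.
side.wait_stream(torch.cuda.current_stream())

with torch.cuda.stream(side):
    t.add_(1)                          # the "recipient's" work on a side stream

# (2) Tell the caching allocator not to free and reuse t's memory until the
# work queued on `side` so far has completed.
t.record_stream(side)
```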

@emcastillo
Collaborator Author

emcastillo commented Feb 28, 2020

Hello, and thanks for all the feedback.
I am not sure which would be the best way to achieve this; perhaps allowing the underlying PyTorch streams to be used as CuPy streams could be an easy solution? It would require changes in CuPy, and the user would have to be aware of it. We would like to avoid a hard dependency between the two libraries by using generic interfaces.

Another possible and simpler solution could be to pass the current stream being used in CuPy to the memory allocator.

The discussion is also ongoing at cupy/cupy#3126; sorry for scattering it.

@emcastillo
Collaborator Author

I just added a PR in CuPy to allow it to use external streams, so that a PyTorch-managed stream can be set as the CuPy default one, with events, kernels, and memory all managed through that stream.
I will modify this PR to explicitly pass the stream when allocating memory. This should make the approach safe.
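
Assuming the CuPy PR lands as proposed, sharing a stream would look roughly like this (`cupy.cuda.ExternalStream` is the wrapper added by cupy/cupy#3141):

```python
import cupy
import torch

torch_stream = torch.cuda.Stream()

# Wrap the raw CUDA stream owned by PyTorch so CuPy submits work to it.
cupy_stream = cupy.cuda.ExternalStream(torch_stream.cuda_stream)

with cupy_stream:
    x = cupy.arange(10)  # CuPy work now runs on the PyTorch-managed stream
```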

@emcastillo
Collaborator Author

I added the option to specify a stream.
As I said above, streams from the PyTorch pool can be shared with CuPy using cupy/cupy#3141.
It is the user's responsibility to use the PyTorch streams in CuPy.

The following gist creates and sets CuPy streams based on the torch ones and makes a custom allocator aware of the current stream.

https://gist.github.com/emcastillo/44a033399ad67ddbdb306bed0f5fa6e0

I am not sure whether this proposal makes 100% sense for you, so any feedback is more than welcome.
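
In the same spirit as the gist (a sketch, not a copy of it), plugging the PyTorch allocator into CuPy could look like this. `_TorchAllocation` is a hypothetical helper, and the `caching_allocator_*` names are the ones the bindings were later exposed under:

```python
import cupy
import torch

class _TorchAllocation:
    """Hypothetical owner: holds a raw PyTorch allocation and returns it
    to the caching allocator when garbage-collected."""

    def __init__(self, size):
        self.size = size
        self.ptr = torch.cuda.caching_allocator_alloc(size)

    def __del__(self):
        ptr = getattr(self, "ptr", None)
        if ptr is not None:
            torch.cuda.caching_allocator_delete(ptr)

def torch_malloc(size):
    # CuPy calls this for every allocation once set_allocator is installed.
    owner = _TorchAllocation(size)
    mem = cupy.cuda.UnownedMemory(owner.ptr, size, owner)  # keeps owner alive
    return cupy.cuda.MemoryPointer(mem, 0)

cupy.cuda.set_allocator(torch_malloc)
```

With the stream argument this PR adds, the allocation can be associated with the stream CuPy is actually using, but it remains the user's job to keep CuPy on PyTorch-managed streams (e.g. via ExternalStream above).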

@ezyang ezyang removed their request for review March 2, 2020 18:13
@ezyang
Contributor

ezyang commented Mar 2, 2020

Removing myself from the reviewer list; lmk if I need to look.

Comment thread torch/csrc/cuda/Module.cpp Outdated
Comment thread c10/cuda/CUDACachingAllocator.cpp Outdated
@emcastillo emcastillo force-pushed the python_alloc branch 2 times, most recently from b7e3430 to 3618025 Compare March 3, 2020 03:47
Contributor

@facebook-github-bot facebook-github-bot left a comment


@ngimel has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Contributor

@ngimel merged this pull request in 31cc311.

@emcastillo emcastillo deleted the python_alloc branch March 4, 2020 05:23
ttumiel pushed a commit to ttumiel/pytorch that referenced this pull request Mar 4, 2020
…ytorch#33860)

Summary:
This PR aims to improve interoperability with [CuPy](https://github.com/cupy/cupy/pulls).

Instead of keeping two separate and conflicting memory pools, with this PR CuPy can allocate memory directly from the PyTorch allocator, by means of the proposal in cupy/cupy#3126.

We would like to gather feedback on whether this approach makes sense for PyTorch, or whether alternative designs would be preferred.
Pull Request resolved: pytorch#33860

Differential Revision: D20212788

Pulled By: ngimel

fbshipit-source-id: bc1e08a66da1992d26021147bf645dc65239581c
laurentdupin pushed a commit to laurentdupin/pytorch that referenced this pull request Apr 24, 2026
…ytorch#33860)

Labels

Merged open source triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module

8 participants