
Uses memory pools for mixing CUDA allocators #125722

Closed

syed-ahmed wants to merge 12 commits into pytorch:main from syed-ahmed:torch-mempool-upstream

Conversation

@syed-ahmed (Collaborator) commented May 7, 2024

pytorch-bot (Bot) commented May 7, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/125722

Note: Links to docs will display an error until the docs builds have been completed.

❌ 66 New Failures, 5 Unrelated Failures

As of commit 29d15bd with merge base e8e327b:

NEW FAILURES - The following jobs have failed:

UNSTABLE - The following jobs failed but were likely due to flakiness present on trunk and have been marked as unstable:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot added the "oncall: distributed" (add this issue/PR to distributed oncall triage queue) and "release notes: distributed (c10d)" (release notes category) labels on May 7, 2024
@syed-ahmed changed the title from "Uses memory pools for mixing CUDA Allocators" to "Uses memory pools for mixing CUDA allocators" on May 7, 2024
@syed-ahmed force-pushed the torch-mempool-upstream branch 2 times, most recently from b3bf94b to b0dc669 on May 15, 2024 03:11
@syed-ahmed force-pushed the torch-mempool-upstream branch from 4688287 to 29d15bd on May 29, 2024 17:16
@Aidyn-A marked this pull request as ready for review on June 4, 2024 17:52
@Aidyn-A requested a review from eqy as a code owner on June 4, 2024 17:52
@Aidyn-A marked this pull request as draft on June 4, 2024 17:52
pytorchmergebot pushed a commit that referenced this pull request Jul 18, 2024
…cator usage (#130472)

We should be able to create multiple CUDAPluggableAllocators in the same PyTorch program (see #124807, #125722 for context). When mixing CUDAPluggableAllocators in the same PyTorch program, we need to make sure that the deleter passed in through the CUDAPluggableAllocator gets "attached" to the data_ptr and persists until program exit (when it's called to free the memory).

Currently, CUDAPluggableAllocator maintains a global `current_custom_allocator`. When creating the `DataPtr`, `raw_deleter` attaches `custom_raw_deleter` to the DataPtr, which calls `current_custom_allocator->raw_delete(...)`. This approach is fine when using only one allocator; for the multiple-allocator use case, however, the DataPtr would use the deleter of whatever is in `current_custom_allocator`. For example, if allocation 1 was done with `cudaMalloc` and allocation 2 with `ncclMemAlloc`, and `current_custom_allocator` currently points to the CUDAPluggableAllocator with `ncclMemAlloc`, then cleaning up allocation 1 would call `ncclMemFree` instead of `cudaFree`.

In this PR, we solve the above problem by remembering the `free_fn_` using a deleter context, so there is no need to go through an allocator object to find the deleter.
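Conceptually, the fix replaces a free-time lookup through a mutable global with a free function captured per allocation. A minimal Python sketch of the idea (the actual change is in PyTorch's C++ CUDAPluggableAllocator; every name below is hypothetical):

```python
# Illustrative sketch only: the real change is in C++ inside
# CUDAPluggableAllocator; all names here are hypothetical.

current_custom_allocator = None  # mutable global, as in the old design


def make_data_ptr_broken(allocator):
    """Old pattern: the deleter consults the global at free time."""
    ptr = allocator.alloc_fn()

    def deleter(p):
        # Bug: by the time this runs, the global may point at a
        # different allocator, so `p` is freed with the wrong free_fn.
        current_custom_allocator.free_fn(p)

    return ptr, deleter


def make_data_ptr_fixed(allocator):
    """New pattern: the matching free_fn is remembered at allocation
    time (the "deleter context"), so the right free is always used."""
    ptr = allocator.alloc_fn()
    free_fn = allocator.free_fn  # captured per allocation

    def deleter(p):
        free_fn(p)  # always pairs this allocation with its own free

    return ptr, deleter
```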

CC: @zdevito @ptrblck @eqy
Pull Request resolved: #130472
Approved by: https://github.com/eqy, https://github.com/ezyang
DiweiSun pushed a commit to DiweiSun/pytorch that referenced this pull request Jul 22, 2024, and xuhancn pushed a commit to xuhancn/pytorch that referenced this pull request Jul 25, 2024; both carry the same pytorch#130472 commit message quoted above.
pytorchmergebot pushed a commit that referenced this pull request Aug 1, 2024
In this PR:
- Pool id creation logic is refactored and moved to a MemPool class. The `graph_pool_handle()` API now uses `torch.cuda.MemPool()` to get a unique id for a pool. Existing tests should cover this change.
- MemPool holds a pointer to a CUDAAllocator, as proposed in #124807 (comment). Tests are added to show usage with CUDAPluggableAllocator.
- The MemPoolContext API makes a mempool active. Tests are added to show usage of this API. This API will be used in the CUDACachingAllocator to route allocations to a user-provided allocator; see the draft in #125722. A minimal usage sketch follows the approval note below.

Pull Request resolved: #131152
Approved by: https://github.com/eqy, https://github.com/ezyang
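A minimal sketch of how these pieces are meant to compose, based only on the description above; the shared-library path, exported symbol names, and the exact `MemPool`/`MemPoolContext` signatures are assumptions, not the confirmed final API:

```python
import torch
from torch.cuda.memory import CUDAPluggableAllocator

# Hypothetical shared library exporting a malloc/free pair (for example,
# wrappers around ncclMemAlloc/ncclMemFree); path and symbol names are
# placeholders for this sketch.
custom = CUDAPluggableAllocator("libmy_alloc.so", "my_malloc", "my_free")

# A MemPool gets a unique id (as graph_pool_handle() now does internally)
# and can hold a pointer to a CUDAAllocator.
pool = torch.cuda.MemPool(custom.allocator())

# MemPoolContext makes the pool active; while it is active, the
# CUDACachingAllocator would route allocations to `custom`.
ctx = torch.cuda.MemPoolContext(pool)
x = torch.randn(1024, device="cuda")  # intended to be served from `pool`
del ctx  # deactivate; later allocations use the default allocator again
```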
github-actions (Bot) commented Aug 3, 2024

Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as Stale.
Feel free to remove the Stale label if you feel this was a mistake.
If you are unable to remove the Stale label please contact a maintainer in order to do so.
If you want the bot to never mark this PR stale again, add the no-stale label.
Stale pull requests will automatically be closed after 30 days of inactivity.

@github-actions added the Stale label on Aug 3, 2024
@syed-ahmed (Collaborator, Author)

Stacked PRs with tests have been posted. Closing this.

@syed-ahmed closed this on Aug 14, 2024

Labels

oncall: distributed, open source, release notes: distributed (c10d), Stale
