
Generalize torch._C._set_allocator_settings to be generic #156175

Closed

guangyey wants to merge 68 commits into gh/guangyey/159/base from gh/guangyey/159/head

Conversation

@guangyey (Collaborator) commented Jun 17, 2025

Stack from ghstack (oldest at bottom):

Motivation

This PR moves the implementation of torch.cuda.memory._set_allocator_settings to torch._C._accelerator_setAllocatorSettings.
Since the original API was intended as a temporary/internal utility, I am not exposing the new function as a public API.
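
As a minimal usage sketch (assuming the settings string keeps the same comma-separated key:value format accepted by the PYTORCH_CUDA_ALLOC_CONF environment variable; both entry points are private, so the names and behavior are not a stable contract):

```python
import torch

# Allocator settings use the same comma-separated "key:value" format as
# the PYTORCH_CUDA_ALLOC_CONF environment variable (assumed here).
settings = "max_split_size_mb:128,garbage_collection_threshold:0.8"

# New device-generic binding introduced by this PR (private, not public API).
torch._C._accelerator_setAllocatorSettings(settings)

# Previous CUDA-specific internal helper that this PR relocates:
# torch.cuda.memory._set_allocator_settings(settings)
```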

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @chenyang78 @kadeng @chauhang @amjames @Lucaskabela @albanD

@pytorch-bot bot commented Jun 17, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/156175

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

❌ 1 Cancelled Job, 1 Unrelated Failure

As of commit 663b9ea with merge base bb67660:

CANCELLED JOB - The following job was cancelled. Please retry:

UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

guangyey added a commit that referenced this pull request Jun 17, 2025
ghstack-source-id: beffb8f
Pull Request resolved: #156175
@guangyey guangyey changed the title from "Add a new API of allocator setting for accelerator" to "[WIP] Add a new API of allocator setting for accelerator" on Jun 17, 2025
@guangyey guangyey marked this pull request as draft June 17, 2025 10:59
guangyey added a commit that referenced this pull request Jun 17, 2025
ghstack-source-id: 16cee17
Pull Request resolved: #156175
guangyey added 4 commits June 17, 2025 17:36, all titled [ghstack-poisoned]
guangyey added 7 commits July 15, 2025 19:24, all titled [ghstack-poisoned]
@guangyey (Collaborator, Author)

Try to land this stack of PRs in a way that makes them easy to revert if needed :(
@pytorchbot merge

@pytorchmergebot (Collaborator)

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

@pytorchmergebot (Collaborator)

@guangyey your PR has been reverted as part of the stack under #150312.

@guangyey (Collaborator, Author) commented Aug 5, 2025

"Try to land these PRs"
@pytorchbot merge

@pytorchmergebot (Collaborator)

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

@pytorchmergebot (Collaborator)

Merge failed

Reason: 1 job failed: trunk / linux-jammy-rocm-py3.10 / test (default, 1, 2, linux.rocm.gpu.2)

Details for Dev Infra team: raised by workflow job.

@guangyey (Collaborator, Author) commented Aug 5, 2025

"Try to land these PRs"
@pytorchbot merge -i

@pytorchmergebot (Collaborator)

Merge started

Your change will be merged while ignoring the following 2 checks: pull / linux-jammy-py3_9-clang9-xla / test (xla, 1, 1, linux.12xlarge, unstable), trunk / linux-jammy-rocm-py3.10 / test (default, 1, 2, linux.rocm.gpu.2)

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

@pytorchmergebot (Collaborator)

Merge failed

Reason: 1 mandatory check failed. The first few are:

Dig deeper by viewing the failures on hud

Details for Dev Infra team: raised by workflow job.

Failing merge rule: Core Maintainers

@guangyey (Collaborator, Author) commented Aug 5, 2025

"Try to land these PRs"
@pytorchbot merge

@pytorchmergebot (Collaborator)

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

@pytorchmergebot (Collaborator)

Merge failed

Reason: 1 job failed: trunk / linux-jammy-rocm-py3.10 / test (default, 1, 2, linux.rocm.gpu.2)

Details for Dev Infra team: raised by workflow job.

@guangyey (Collaborator, Author) commented Aug 5, 2025

"Try to land these PRs"
@pytorchbot merge -i

@pytorchmergebot (Collaborator)

Merge started

Your change will be merged while ignoring the following 2 checks: pull / linux-jammy-py3_9-clang9-xla / test (xla, 1, 1, linux.12xlarge, unstable), trunk / linux-jammy-rocm-py3.10 / test (default, 1, 2, linux.rocm.gpu.2)

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

@atalman (Contributor) commented Aug 27, 2025

@pytorchmergebot revert -c ghfirst -m "need to revert after #161002"

@pytorchmergebot (Collaborator)

@pytorchbot successfully started a revert job. Check the current status here.
Questions? Feedback? Please reach out to the PyTorch DevX Team

@pytorchmergebot (Collaborator)

Reverting PR 156175 failed

Reason: Command git -C /home/runner/work/pytorch/pytorch revert --no-edit 908c5cc4c0f22d141776bde47c296b5186691855 returned non-zero exit code 1

Auto-merging c10/cuda/CUDAAllocatorConfig.cpp
CONFLICT (content): Merge conflict in c10/cuda/CUDAAllocatorConfig.cpp
Auto-merging test/test_cuda.py
Auto-merging torch/_C/__init__.pyi.in
CONFLICT (content): Merge conflict in torch/_C/__init__.pyi.in
Auto-merging torch/_dynamo/trace_rules.py
Auto-merging torch/csrc/DeviceAccelerator.cpp
CONFLICT (content): Merge conflict in torch/csrc/DeviceAccelerator.cpp
Auto-merging torch/csrc/cuda/Module.cpp
CONFLICT (content): Merge conflict in torch/csrc/cuda/Module.cpp
Auto-merging torch/cuda/memory.py
error: could not revert 908c5cc4c0f... Generalize torch._C._set_allocator_settings to be generic (#156175)
hint: After resolving the conflicts, mark them with
hint: "git add/rm <pathspec>", then run
hint: "git revert --continue".
hint: You can instead skip this commit with "git revert --skip".
hint: To abort and get back to the state before "git revert",
hint: run "git revert --abort".
hint: Disable this message with "git config set advice.mergeConflict false"
Details for Dev Infra team: raised by workflow job.


Labels

ci-no-td (Do not run TD on this PR)
ciflow/inductor
ciflow/trunk (Trigger trunk jobs on your pull request)
Merged
module: accelerator (Issues related to the shared accelerator API)
module: dynamo
open source
Reverted
topic: not user facing (topic category)


5 participants