
[AOTInductor] Memory leak fix for Fallback Kernels#155642

Closed
muchulee8 wants to merge 3 commits intogh/muchulee8/63/basefrom
gh/muchulee8/63/head

Conversation

@muchulee8 muchulee8 commented Jun 10, 2025

Stack from ghstack (oldest at bottom):

Summary:
We generate AtenTensorHandles for fallback kernels regardless of the
argument type. If we do take the fallback path, the AtenTensorHandles
are generated a second time, so the first handle is never recycled and
its memory leaks.
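
The pattern described in the summary can be illustrated with a toy model. This is only a sketch, not the AOTInductor code: `HandlePool`, `run_leaky`, and `run_single_handle` are hypothetical names, and the pool stands in for whatever owns the real handles.

```python
class HandlePool:
    """Toy stand-in for the owner of tensor handles (hypothetical)."""

    def __init__(self):
        self.live = {}  # handle id -> label of the code path that created it
        self._next_id = 0

    def create(self, label):
        handle = self._next_id
        self._next_id += 1
        self.live[handle] = label
        return handle

    def release(self, handle):
        self.live.pop(handle, None)


def run_leaky(pool, takes_fallback_path):
    # A handle is created up front, regardless of the argument type.
    handle = pool.create("upfront")
    if takes_fallback_path:
        # The fallback path creates its own handle and rebinds the name,
        # so the up-front handle is never released.
        handle = pool.create("fallback")
    pool.release(handle)


def run_single_handle(pool, takes_fallback_path):
    # Decide the path first, then create exactly one handle for it.
    label = "fallback" if takes_fallback_path else "direct"
    handle = pool.create(label)
    pool.release(handle)


leaky, fixed = HandlePool(), HandlePool()
for _ in range(3):
    run_leaky(leaky, takes_fallback_path=True)
    run_single_handle(fixed, takes_fallback_path=True)
print(len(leaky.live), len(fixed.live))  # 3 0
```

Each leaky call leaves one "upfront" handle alive, so the number of live handles grows with the number of calls.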

Test Plan:
python test/inductor/test_aot_inductor.py -k test_fallback_mem_leak
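
A leak regression check of this kind typically runs the workload repeatedly and verifies that memory stops growing. The sketch below is not the code of `test_fallback_mem_leak`. It assumes a hypothetical helper `memory_growth` and uses `tracemalloc`, which only sees Python allocations, as a stand-in for whichever allocator owns the real tensors.

```python
import tracemalloc


def memory_growth(fn, warmup=3, iterations=50):
    """Return the change in traced memory (bytes) across repeated calls of fn."""
    for _ in range(warmup):
        fn()  # let one-time allocations settle before measuring
    tracemalloc.start()
    before, _peak = tracemalloc.get_traced_memory()
    for _ in range(iterations):
        fn()
    after, _peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return after - before


_leaked = []


def leaky_step():
    # Keeps a reference alive on every call, so memory grows linearly.
    _leaked.append(bytearray(10_000))


def clean_step():
    # The buffer is dropped as soon as the call returns.
    bytearray(10_000)


leaky_growth = memory_growth(leaky_step)
clean_growth = memory_growth(clean_step)
print(leaky_growth > clean_growth)
```

The comparison is relative on purpose: absolute byte counts depend on the interpreter, so a real test would pick a threshold suited to its own allocator.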

Reviewers:

Subscribers:

Tasks:

Tags:

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @amjames @chauhang @aakhundov

pytorch-bot bot commented Jun 10, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/155642

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (1 Unrelated Failure)

As of commit 9741b3f with merge base a9d5157:

BROKEN TRUNK - The following job failed but was also present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@jingsh jingsh left a comment

lgtm

@desertfire desertfire left a comment

Can you help me understand why this is a problem for user-defined Triton kernels in particular?

@muchulee8
Contributor Author

> Can you help me understand why this is a problem for user-defined Triton kernels in particular?

I extracted a minimal repro from the internal model and replaced the Triton kernel with an external one.
I have since removed the Triton kernel and the repro still shows the memory leak, so I just updated the test.

muchulee8 added a commit that referenced this pull request Jun 11, 2025
ghstack-source-id: e7fea49
Pull Request resolved: #155642
@muchulee8
Contributor Author

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Jun 12, 2025
@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here

@pytorchmergebot
Collaborator

Merge failed

Reason: 1 jobs have failed, first few of them are: inductor / linux-jammy-cpu-py3.9-gcc11-inductor / test (dynamic_cpu_inductor_torchbench, 1, 2, linux.8xlarge.amx)

Details for Dev Infra team Raised by workflow job

muchulee8 added a commit that referenced this pull request Jun 12, 2025
ghstack-source-id: 63cb06f
Pull Request resolved: #155642
@muchulee8
Contributor Author

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

pytorchmergebot pushed a commit that referenced this pull request Jun 16, 2025
…154371)

Delays code generation for arguments to fallback ops. This is inspired by #155642, and likely fixes similar memory leaks.
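
One general way to delay code generation for arguments is to keep each argument as a thunk and only emit its setup lines at the point of use. The sketch below illustrates that idea only. `Writer`, `handle_arg`, `emit_fallback_call`, and the emitted `new_handle(...)` text are all hypothetical and are not Inductor's `cpp_wrapper` interface.

```python
from typing import Callable, List


class Writer:
    """Collects emitted wrapper lines (hypothetical stand-in)."""

    def __init__(self) -> None:
        self.lines: List[str] = []

    def writeline(self, line: str) -> None:
        self.lines.append(line)


def handle_arg(name: str) -> Callable[[Writer], str]:
    """Return a thunk that emits handle setup only when it is called."""

    def generate(writer: Writer) -> str:
        # The emitted text is made up for illustration.
        writer.writeline(f"auto {name}_handle = new_handle({name});")
        return f"{name}_handle"

    return generate


def emit_fallback_call(
    writer: Writer, op: str, args: List[Callable[[Writer], str]]
) -> None:
    # Arguments are generated here, at the single point of use,
    # so each handle setup line is emitted exactly once.
    exprs = [arg(writer) for arg in args]
    writer.writeline(f"{op}({', '.join(exprs)});")


writer = Writer()
pending = [handle_arg("buf0"), handle_arg("buf1")]
lines_before_call = list(writer.lines)  # still empty: nothing emitted yet
emit_fallback_call(writer, "fallback_op", pending)
print("\n".join(writer.lines))
```

Because nothing is emitted until the call site is written, no setup line exists that a later regeneration could orphan.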

Additionally, prepare for the next PR in the stack by tightening up typing on a `cpp_wrapper` interface that's only used in one (well-typed) place, as well as downstream effects of that change. In particular, this enabled:

1. removing a number of now clearly unnecessary asserts
2. adding a few more targeted asserts to validate the code's current assumptions
3. removing some unneeded control flow in several functions

Pull Request resolved: #154371
Approved by: https://github.com/desertfire
@github-actions github-actions bot deleted the gh/muchulee8/63/head branch July 14, 2025 02:21