[AOTInductor] Memory leak fix for Fallback Kernels by muchulee8 · Pull Request #155642 · pytorch/pytorch

muchulee8 · 2025-06-10T23:36:21Z

Stack from ghstack (oldest at bottom):

-> [AOTInductor] Memory leak fix for Fallback Kernels #155642

Summary:
We generate AtenTensorHandles for fallback kernels regardless of the
argument type. If we do fall back, we regenerate the AtenTensorHandles,
so the handle generated first is never recycled and a memory leak
occurs.
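
As a rough illustration of the pattern described in the summary, here is a toy Python model. It is not the AOTInductor codegen or the generated C++ wrapper: `HandlePool`, `call_fallback_leaky`, and `call_fallback_fixed` are hypothetical names made up for this sketch, and integers stand in for handles. It only shows how creating a handle twice for the same argument, while releasing only the second, leaves the first one alive.

```python
# Toy model only: HandlePool is a hypothetical stand-in for whatever
# owns tensor handles at runtime. It is not a PyTorch API.
class HandlePool:
    def __init__(self):
        self.live = set()   # handles created but not yet released
        self._next_id = 0

    def create(self):
        handle = self._next_id
        self._next_id += 1
        self.live.add(handle)
        return handle

    def release(self, handle):
        self.live.discard(handle)


def call_fallback_leaky(pool):
    # A handle is created up front, regardless of the argument type...
    handle = pool.create()
    # ...and the fallback path creates another one for the same argument.
    # The first handle is overwritten and never released.
    handle = pool.create()
    pool.release(handle)


def call_fallback_fixed(pool):
    # Only one handle is created per argument, and it is released.
    handle = pool.create()
    pool.release(handle)


leaky, fixed = HandlePool(), HandlePool()
for _ in range(3):
    call_fallback_leaky(leaky)
    call_fallback_fixed(fixed)

leaked_count = len(leaky.live)  # one stale handle per leaky call
fixed_count = len(fixed.live)   # nothing left alive
```

In this model the number of live handles grows by one per call in the leaky variant, which is the kind of growth a repeated-run memory test would be expected to catch.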

Test Plan:
python test/inductor/test_aot_inductor.py -k test_fallback_mem_leak

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @amjames @chauhang @aakhundov

pytorch-bot · 2025-06-10T23:36:24Z

✅ You can merge normally! (1 Unrelated Failure)

As of commit 9741b3f with merge base a9d5157:

BROKEN TRUNK - The following job failed but was also failing on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

pull / cuda12.8-py3.10-gcc9-sm75 / test (pr_time_benchmarks, 1, 1, linux.g4dn.metal.nvidia.gpu) (gh) (trunk failure)
MISSING REGRESSION TEST

This comment was automatically generated by Dr. CI and updates every 15 minutes.

jingsh

lgtm

test/inductor/test_aot_inductor.py

desertfire

Can you help me understand why this is a problem for user-defined Triton kernels in particular?

test/inductor/test_aot_inductor.py

muchulee8 · 2025-06-11T20:12:47Z

> Can you help me understand why this is a problem for user-defined Triton kernels in particular?

I extracted a minimal repro from the internal model and replaced the Triton kernel with an external one.
I have now removed the Triton kernel and the repro still shows the memory leak, so I updated the test.

muchulee8 · 2025-06-12T01:59:12Z

@pytorchbot merge

pytorchmergebot · 2025-06-12T02:01:23Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

pytorchmergebot · 2025-06-12T02:01:41Z

Merge failed

Reason: 1 jobs have failed, first few of them are: inductor / linux-jammy-cpu-py3.9-gcc11-inductor / test (dynamic_cpu_inductor_torchbench, 1, 2, linux.8xlarge.amx)

muchulee8 · 2025-06-12T17:35:28Z

@pytorchbot merge

pytorchmergebot · 2025-06-12T17:37:17Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

…154371)

Delays code generation for arguments to fallback ops. This is inspired by #155642, and likely fixes similar memory leaks.

Additionally, prepare for the next PR in the stack by tightening up typing on a `cpp_wrapper` interface that's only used in one (well-typed) place, as well as downstream effects of that change. In particular, this enabled:

1. removing a number of now clearly unnecessary asserts
2. adding a few more targeted asserts to validate the code's current assumptions
3. removing some unneeded control flow in several functions

Pull Request resolved: #154371
Approved by: https://github.com/desertfire

pytorch-bot bot added the ciflow/inductor and module: inductor labels Jun 10, 2025

muchulee8 requested a review from desertfire June 11, 2025 02:29

muchulee8 added the release notes: inductor (aoti) label Jun 11, 2025

muchulee8 requested a review from angelayi June 11, 2025 02:30

jingsh approved these changes Jun 11, 2025

desertfire reviewed Jun 11, 2025

desertfire approved these changes Jun 12, 2025

pytorch-bot bot added the ciflow/trunk (Trigger trunk jobs on your pull request) label Jun 12, 2025

pytorchmergebot added the merging label Jun 12, 2025

pytorchmergebot removed the merging label Jun 12, 2025

pytorchmergebot added the merging label Jun 12, 2025

pytorchmergebot added the Merged label Jun 12, 2025

pytorchmergebot closed this in a125744 Jun 12, 2025

pytorchmergebot removed the merging label Jun 12, 2025

benjaminglass1 mentioned this pull request Jun 13, 2025

[Inductor] Delay codegen for fallback arguments and improve typing #154371

Closed

github-actions bot deleted the gh/muchulee8/63/head branch July 14, 2025 02:21

muchulee8 wants to merge 3 commits into gh/muchulee8/63/base from gh/muchulee8/63/head

4 participants
