[AOTI Eager] Support multi-return ops in AOTIPythonKernelHolder (#176019)
Conversation
🔗 Helpful Links: see artifacts and rendered test results at hud.pytorch.org/pr/176019

✅ No failures as of commit 7a14833 with merge base 1ba2b33.
Force-pushed 885a2c1 to d259a18
Summary:

Relax the cache_lookup assertion from requiring exactly 1 return value to requiring >= 1 return values, all of Tensor type. This enables ops like native_layer_norm (which returns 3 Tensors) to dispatch through the AOTI eager path.

The existing cache_hit and cache_miss output handling already loops over all returned tensors and pushes them to the stack individually, so no changes are needed there. The only blocker was the overly restrictive TORCH_CHECK_NOT_IMPLEMENTED gate.

Test Plan:
```
buck run fbcode//mode/opt fbcode//mtia/host_runtime/aoti_eager:test_aoti_eager_dispatch_bin -- TestAotiEagerDispatch.test_multi_return_op_aminmax
```

Reviewed By: arui-meta

Differential Revision: D94364952
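The relaxed gate can be sketched in plain Python as a predicate over an op schema's return types. This is a minimal illustrative stand-in for the C++ TORCH_CHECK_NOT_IMPLEMENTED logic, not the actual implementation; the function name and the string-based type representation are hypothetical.

```python
def supports_aoti_eager(return_types):
    """Sketch of the relaxed cache_lookup gate (illustrative, not the real C++).

    Old rule: exactly one return value, of Tensor type.
    New rule: one or more return values, all of Tensor type.
    """
    return len(return_types) >= 1 and all(t == "Tensor" for t in return_types)

# Single-return ops still pass:
assert supports_aoti_eager(["Tensor"])
# Multi-return ops like native_layer_norm (3 Tensors) now pass too:
assert supports_aoti_eager(["Tensor", "Tensor", "Tensor"])
# Ops with non-Tensor returns (or no returns) are still rejected:
assert not supports_aoti_eager(["Tensor", "int"])
assert not supports_aoti_eager([])
```

Under the old rule, only the first assertion would have held; the change is purely a loosening of the length check.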
Force-pushed d259a18 to aae125b
@StellarrZ has exported this pull request. If you are a Meta employee, you can view the originating Diff in D94364952.
Force-pushed aae125b to 554c9cb
Force-pushed 554c9cb to a54b43f
Force-pushed a54b43f to 84c7827
Force-pushed 84c7827 to a2c6398
Force-pushed a2c6398 to 0302171
Force-pushed 0302171 to bdb138f
Force-pushed bdb138f to 5e9573c
Force-pushed 5e9573c to 10d52f6
Force-pushed 10d52f6 to ebc2079
Summary:

Populate the C++ in-memory cache after first compilation so subsequent same-shape dispatches resolve entirely in C++ without calling back into Python.

Root cause: `AOTIPythonKernelHolder::cache_miss()` compiled and loaded the kernel but never added it to `aoti_kernel_cache_`. Every dispatch re-entered `produce_aoti_kernel_lib` → acquired the GIL → called Python `_compile_afg` → parsed JSON → loaded AFG from disk.

Fix: after `cache_miss()` creates the kernel runner, populate `aoti_kernel_cache_` with the input metadata and runner so subsequent calls with matching shapes hit `cache_hit()` directly.

Performance (`aten.bitwise_not`, shape [32,32], 100k iterations):
- Before: 34,260 us/call (disk round-trip on every dispatch)
- After: 21.5 us/call (in-memory cache hit, 1,593x faster)

Test Plan:
```
buck run fbcode//mode/opt fbcode//mtia/host_runtime/aoti_eager:test_aoti_eager_dispatch_bin -- TestAotiEagerDispatch.test_inmemory_cache_reuse
```

Differential Revision: D93171553
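The shape of the fix can be sketched in Python: on a cache miss, compile the kernel and then record it in the in-memory cache so the next same-shape dispatch never reaches the slow path. All names below (`KernelHolder`, `expensive_compile`, `kernel_cache`) are hypothetical stand-ins for the C++ members, not the real API.

```python
compile_calls = 0  # counts trips through the slow compile path

def expensive_compile(meta):
    # Stand-in for produce_aoti_kernel_lib: GIL + Python compile + disk load.
    global compile_calls
    compile_calls += 1
    return f"runner-for-{meta}"

class KernelHolder:
    def __init__(self):
        self.kernel_cache = {}  # stand-in for aoti_kernel_cache_

    def dispatch(self, input_meta):
        runner = self.kernel_cache.get(input_meta)
        if runner is None:                          # cache_miss path
            runner = expensive_compile(input_meta)
            self.kernel_cache[input_meta] = runner  # the previously-missing step
        return runner                               # cache_hit path thereafter

holder = KernelHolder()
meta = (("f32", "cpu", (32, 32)),)  # illustrative input metadata key
holder.dispatch(meta)
holder.dispatch(meta)
assert compile_calls == 1  # second dispatch is an in-memory cache hit
```

Without the `kernel_cache[input_meta] = runner` line, every dispatch would increment `compile_calls`, which is the disk round-trip behavior the performance numbers above measure.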
…76018) Summary:

Pull Request resolved: #176018

Add a `dynamic` parameter to `AOTIPythonKernelHolder` so that the C++ dispatch path can request dynamic-shape compilation from the Python compile backend. When `dynamic_=true`, the holder passes `dynamic=True` to `aoti_compile_with_persistent_cache`, and the in-memory cache uses rank/dtype/device matching instead of exact size/stride matching. This allows a single compiled kernel to serve multiple input shapes.

Changes:
- `AOTIPythonKernelHolder`: added `dynamic_` member, forwarded to `produce_aoti_kernel_lib`
- `AOTIKernelMetadata`: added `is_dynamic_` flag, `check()` uses `dynamic_check()` when set
- `TensorMetadata::dynamic_check()`: matches by dtype/device/rank, skips exact sizes
- `ParameterMetadata::dynamic_check()`: delegates to `TensorMetadata::dynamic_check()` for tensor params

Test Plan:
```
buck test fbcode//caffe2/test/cpp/aoti_eager:kernel_meta_info_test
```

Differential Revision: D94301187
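The difference between static and dynamic metadata matching can be sketched in Python. This is an illustrative approximation of the `TensorMetadata` check, with simplified field names and string dtypes/devices; it is not the C++ API.

```python
from dataclasses import dataclass

@dataclass
class TensorMeta:
    dtype: str
    device: str
    sizes: tuple
    strides: tuple

    def static_check(self, other):
        # Exact match on everything, including sizes and strides.
        return (self.dtype, self.device, self.sizes, self.strides) == (
            other.dtype, other.device, other.sizes, other.strides)

    def dynamic_check(self, other):
        # Match by dtype, device, and rank only; skip exact sizes/strides.
        return (self.dtype == other.dtype
                and self.device == other.device
                and len(self.sizes) == len(other.sizes))

cached = TensorMeta("f32", "cpu", (32, 32), (32, 1))
incoming = TensorMeta("f32", "cpu", (64, 16), (16, 1))

assert not cached.static_check(incoming)  # different shape: static cache misses
assert cached.dynamic_check(incoming)     # same rank/dtype/device: dynamic cache hits
```

This is why a single dynamically compiled kernel can serve multiple input shapes: the cache key degrades gracefully from exact shape to rank.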
Force-pushed 68beb0a to 275c568
Force-pushed 275c568 to 7a14833
@pytorchbot merge (initiating merge automatically since the Phabricator Diff has merged)

Merge started: the change will be merged once all checks pass (ETA 0-4 hours).

Merge failed. Details for the Dev Infra team were raised by the workflow job.
@pytorchbot merge (initiating merge automatically since the Phabricator Diff has merged)

Merge started: the change will be merged once all checks pass (ETA 0-4 hours).
## Summary

Adds a `workflow_dispatch` workflow that the autorevert system can trigger when it detects an early failure pattern. Claude Opus 4.6 analyzes the suspect commit's diff, failed job logs, and PyTorch source code to determine whether the commit actually caused the CI failures.

Returns a structured JSON verdict as an artifact:

- **revert** — causal chain found, proceed to revert immediately
- **unsure** — inconclusive, continue with restart-to-confirm (default behavior unchanged)
- **not_related** — failures unrelated to the change, ignore this signal
- **garbage** — signal is unreliable (infra flake, driver crash), suppress for ~2 hours

Design doc: https://docs.google.com/document/d/1BA9B7cIIKiapI37fSFGDD7D0F-VwMyRKJW0PoS0KkbY/edit

## Evaluation Results (13/13 correct verdicts)

Prototyped and tested on [pytorch/ciforge](https://github.com/pytorch/ciforge). Results across diverse failure types:

### Round 1 (2026-03-12) — 4/4 correct

| Test Case | PR | Failure | Expected | Actual | Job |
|-----------|-----|---------|----------|--------|-----|
| Doc-only change | #177288 | pca_lowrank stride mismatch | not_related | **not_related @ 0.99** | [job](https://github.com/pytorch/ciforge/actions/runs/23016718498) |
| Dynamo einops fix | #177165 | detectron2 graph_breaks + test_is_nonzero_mps | not_related | **not_related @ 0.93** | [job](https://github.com/pytorch/ciforge/actions/runs/23016730498) |
| MPS cdouble guard | #176985 | test_is_nonzero_mps + pca_lowrank | revert | **revert @ 0.95** | [job](https://github.com/pytorch/ciforge/actions/runs/23016740133) |
| Lint missing import | #176613 | Lint / lintrunner-noclang-all | revert | **revert @ 0.95** | [job](https://github.com/pytorch/ciforge/actions/runs/23013529685) |

### Round 2 (2026-03-13, automated hourly loop) — 9/9 correct (1 cancelled)

| Timestamp | PR | Signal Key | Expected | Actual | Job |
|-----------|-----|-----------|----------|--------|-----|
| 03:12Z | #176613 | Lint / lintrunner-noclang-all | revert | **revert @ 0.98** | [job](https://github.com/pytorch/ciforge/actions/runs/23034497618) |
| 03:12Z | #176613 | fsdp/test_fully_shard_comm (test exec) | revert | **revert @ 0.98** | [job](https://github.com/pytorch/ciforge/actions/runs/23034499988) |
| 09:11Z | #177273 | test-timeout-270min (infra) | — | *cancelled* | [job](https://github.com/pytorch/ciforge/actions/runs/23043982417) |
| 10:12Z | #176019 | AllenaiLongformerBase fail_to_run (periodic) | garbage | **garbage @ 0.95** | [job](https://github.com/pytorch/ciforge/actions/runs/23046142800) |
| 10:12Z | #176019 | detectron2_fcos IMPROVED (periodic) | not_related | **not_related @ 0.95** | [job](https://github.com/pytorch/ciforge/actions/runs/23046144261) |
| 11:10Z | #176019 | functorch_dp_cifar10 fail_accuracy (periodic) | not_related | **not_related @ 0.93** | [job](https://github.com/pytorch/ciforge/actions/runs/23048173319) |
| 11:10Z | #176019 | basic_gnn_edgecnn IMPROVED (periodic) | not_related | **not_related @ 0.92** | [job](https://github.com/pytorch/ciforge/actions/runs/23048174698) |
| 15:09Z | #177096 | S3 PutObject IAM denied - ROCm gfx950 (infra) | garbage | **garbage @ 0.95** | [job](https://github.com/pytorch/ciforge/actions/runs/23057146500) |
| 16:09Z | #176019 | vit_base_patch16_siglip_256 fail_to_run (periodic) | not_related | **not_related @ 0.97** | [job](https://github.com/pytorch/ciforge/actions/runs/23059634364) |
| 16:09Z | #176019 | shufflenet_v2_x1_0 fail_accuracy (periodic) | not_related | **not_related @ 0.95** | [job](https://github.com/pytorch/ciforge/actions/runs/23059635765) |

### Summary by verdict type

| Verdict | Count | Correct | Avg Confidence |
|---------|-------|---------|----------------|
| revert | 4 | 4/4 | 0.97 |
| garbage | 2 | 2/2 | 0.95 |
| not_related | 7 | 7/7 | 0.94 |

## Test plan

- [x] Prototyped and tested on pytorch/ciforge with 13 real trunk failure cases
- [x] Verified structured JSON output matches schema
- [x] Verified verdict artifact uploads correctly
- [ ] Trigger via GitHub UI with `workflow_dispatch` on pytorch/pytorch to validate bedrock environment works
- [ ] Integrate dispatch call into autorevert lambda (follow-up)

Pull Request resolved: #177404
Approved by: https://github.com/wdvr
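A consumer of the verdict artifact maps the four verdict kinds to the actions listed above. The JSON schema shown here (`verdict` plus a `confidence` field) is inferred from the description and evaluation tables, not taken from the actual artifact format.

```python
import json

# Action per verdict kind, as described in the summary above.
ACTIONS = {
    "revert": "proceed to revert immediately",
    "unsure": "continue with restart-to-confirm",
    "not_related": "ignore this signal",
    "garbage": "suppress signal for ~2 hours",
}

def decide(verdict_json: str) -> str:
    """Parse a verdict artifact and return the autorevert action to take."""
    verdict = json.loads(verdict_json)
    kind = verdict["verdict"]
    if kind not in ACTIONS:
        raise ValueError(f"unknown verdict: {kind}")
    return ACTIONS[kind]

# Hypothetical artifact matching the 'not_related @ 0.95' rows above:
artifact = '{"verdict": "not_related", "confidence": 0.95}'
assert decide(artifact) == "ignore this signal"
```

Note that `unsure` maps to the pre-existing restart-to-confirm behavior, so an integration that ignores the artifact entirely degrades to today's default.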
Pull Request resolved: pytorch#176019
Approved by: https://github.com/sidt-meta, https://github.com/arui-meta
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @kadeng @muchulee8 @amjames @chauhang @aakhundov @coconutruben @jataylo