[AOTI Eager] Support multi-return ops in AOTIPythonKernelHolder (#176019)
Conversation
🔗 Helpful Links: see artifacts and rendered test results at hud.pytorch.org/pr/176019

✅ No failures as of commit 7a14833 with merge base 1ba2b33.
Force-pushed 885a2c1 to d259a18
Summary:

Relax the cache_lookup assertion from requiring exactly 1 return value to requiring >= 1 return values, all of Tensor type. This enables ops like native_layer_norm (which returns 3 Tensors) to dispatch through the AOTI eager path.

The existing cache_hit and cache_miss output handling already loops over all returned tensors and pushes them to the stack individually, so no changes are needed there. The only blocker was the overly restrictive TORCH_CHECK_NOT_IMPLEMENTED gate.

Test Plan:
```
buck run fbcode//mode/opt fbcode//mtia/host_runtime/aoti_eager:test_aoti_eager_dispatch_bin -- TestAotiEagerDispatch.test_multi_return_op_aminmax
```

Reviewed By: arui-meta

Differential Revision: D94364952
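The relaxed gate can be sketched in plain Python as a predicate over an op schema's return types. This is a minimal illustrative stand-in for the C++ TORCH_CHECK_NOT_IMPLEMENTED logic, not the actual implementation; the function name and the string-based type representation are hypothetical.

```python
def supports_aoti_eager(return_types):
    """Sketch of the relaxed cache_lookup gate (illustrative, not the real C++).

    Old rule: exactly one return value, of Tensor type.
    New rule: one or more return values, all of Tensor type.
    """
    return len(return_types) >= 1 and all(t == "Tensor" for t in return_types)

# Single-return ops still pass:
assert supports_aoti_eager(["Tensor"])
# Multi-return ops like native_layer_norm (3 Tensors) now pass too:
assert supports_aoti_eager(["Tensor", "Tensor", "Tensor"])
# Ops with non-Tensor returns (or no returns) are still rejected:
assert not supports_aoti_eager(["Tensor", "int"])
assert not supports_aoti_eager([])
```

Under the old rule, only the first assertion would have held; the change is purely a loosening of the length check.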
Force-pushed d259a18 to aae125b
@StellarrZ has exported this pull request. If you are a Meta employee, you can view the originating Diff in D94364952.
Force-pushed aae125b to 554c9cb
Force-pushed 554c9cb to a54b43f
Force-pushed a54b43f to 84c7827
Force-pushed 84c7827 to a2c6398
Force-pushed a2c6398 to 0302171
Force-pushed 0302171 to bdb138f
Force-pushed bdb138f to 5e9573c
Force-pushed 5e9573c to 10d52f6
Force-pushed 10d52f6 to ebc2079
Summary:

Populate the C++ in-memory cache after first compilation so subsequent same-shape dispatches resolve entirely in C++ without calling back into Python.

Root cause: `AOTIPythonKernelHolder::cache_miss()` compiled and loaded the kernel but never added it to `aoti_kernel_cache_`. Every dispatch re-entered `produce_aoti_kernel_lib` → acquired the GIL → called Python `_compile_afg` → parsed JSON → loaded AFG from disk.

Fix: after `cache_miss()` creates the kernel runner, populate `aoti_kernel_cache_` with the input metadata and runner so subsequent calls with matching shapes hit `cache_hit()` directly.

Performance (`aten.bitwise_not`, shape [32,32], 100k iterations):
- Before: 34,260 us/call (disk round-trip on every dispatch)
- After: 21.5 us/call (in-memory cache hit, 1,593x faster)

Test Plan:
```
buck run fbcode//mode/opt fbcode//mtia/host_runtime/aoti_eager:test_aoti_eager_dispatch_bin -- TestAotiEagerDispatch.test_inmemory_cache_reuse
```

Differential Revision: D93171553
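The shape of the fix can be sketched in Python: on a cache miss, compile the kernel and then record it in the in-memory cache so the next same-shape dispatch never reaches the slow path. All names below (`KernelHolder`, `expensive_compile`, `kernel_cache`) are hypothetical stand-ins for the C++ members, not the real API.

```python
compile_calls = 0  # counts trips through the slow compile path

def expensive_compile(meta):
    # Stand-in for produce_aoti_kernel_lib: GIL + Python compile + disk load.
    global compile_calls
    compile_calls += 1
    return f"runner-for-{meta}"

class KernelHolder:
    def __init__(self):
        self.kernel_cache = {}  # stand-in for aoti_kernel_cache_

    def dispatch(self, input_meta):
        runner = self.kernel_cache.get(input_meta)
        if runner is None:                          # cache_miss path
            runner = expensive_compile(input_meta)
            self.kernel_cache[input_meta] = runner  # the previously-missing step
        return runner                               # cache_hit path thereafter

holder = KernelHolder()
meta = (("f32", "cpu", (32, 32)),)  # illustrative input metadata key
holder.dispatch(meta)
holder.dispatch(meta)
assert compile_calls == 1  # second dispatch is an in-memory cache hit
```

Without the `kernel_cache[input_meta] = runner` line, every dispatch would increment `compile_calls`, which is the disk round-trip behavior the performance numbers above measure.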
…76018) Summary:

Pull Request resolved: #176018

Add a `dynamic` parameter to `AOTIPythonKernelHolder` so that the C++ dispatch path can request dynamic-shape compilation from the Python compile backend. When `dynamic_=true`, the holder passes `dynamic=True` to `aoti_compile_with_persistent_cache`, and the in-memory cache uses rank/dtype/device matching instead of exact size/stride matching. This allows a single compiled kernel to serve multiple input shapes.

Changes:
- `AOTIPythonKernelHolder`: added `dynamic_` member, forwarded to `produce_aoti_kernel_lib`
- `AOTIKernelMetadata`: added `is_dynamic_` flag, `check()` uses `dynamic_check()` when set
- `TensorMetadata::dynamic_check()`: matches by dtype/device/rank, skips exact sizes
- `ParameterMetadata::dynamic_check()`: delegates to `TensorMetadata::dynamic_check()` for tensor params

Test Plan:
```
buck test fbcode//caffe2/test/cpp/aoti_eager:kernel_meta_info_test
```

Differential Revision: D94301187
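The difference between static and dynamic metadata matching can be sketched in Python. This is an illustrative approximation of the `TensorMetadata` check, with simplified field names and string dtypes/devices; it is not the C++ API.

```python
from dataclasses import dataclass

@dataclass
class TensorMeta:
    dtype: str
    device: str
    sizes: tuple
    strides: tuple

    def static_check(self, other):
        # Exact match on everything, including sizes and strides.
        return (self.dtype, self.device, self.sizes, self.strides) == (
            other.dtype, other.device, other.sizes, other.strides)

    def dynamic_check(self, other):
        # Match by dtype, device, and rank only; skip exact sizes/strides.
        return (self.dtype == other.dtype
                and self.device == other.device
                and len(self.sizes) == len(other.sizes))

cached = TensorMeta("f32", "cpu", (32, 32), (32, 1))
incoming = TensorMeta("f32", "cpu", (64, 16), (16, 1))

assert not cached.static_check(incoming)  # different shape: static cache misses
assert cached.dynamic_check(incoming)     # same rank/dtype/device: dynamic cache hits
```

This is why a single dynamically compiled kernel can serve multiple input shapes: the cache key degrades gracefully from exact shape to rank.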
Force-pushed 68beb0a to 275c568
Force-pushed 275c568 to 7a14833
@pytorchbot merge (initiating merge automatically since the Phabricator Diff has merged)

Merge started: the change will be merged once all checks pass (ETA 0-4 hours).

Merge failed. Details for the Dev Infra team were raised by the workflow job.
@pytorchbot merge (initiating merge automatically since the Phabricator Diff has merged)

Merge started: the change will be merged once all checks pass (ETA 0-4 hours).
## Summary

Adds a `workflow_dispatch` workflow that the autorevert system can trigger when it detects an early failure pattern. Claude Opus 4.6 analyzes the suspect commit's diff, failed job logs, and PyTorch source code to determine whether the commit actually caused the CI failures.

Returns a structured JSON verdict as an artifact:

- **revert** — causal chain found, proceed to revert immediately
- **unsure** — inconclusive, continue with restart-to-confirm (default behavior unchanged)
- **not_related** — failures unrelated to the change, ignore this signal
- **garbage** — signal is unreliable (infra flake, driver crash), suppress for ~2 hours

Design doc: https://docs.google.com/document/d/1BA9B7cIIKiapI37fSFGDD7D0F-VwMyRKJW0PoS0KkbY/edit

## Evaluation Results (13/13 correct verdicts)

Prototyped and tested on [pytorch/ciforge](https://github.com/pytorch/ciforge). Results across diverse failure types:

### Round 1 (2026-03-12) — 4/4 correct

| Test Case | PR | Failure | Expected | Actual | Job |
|-----------|-----|---------|----------|--------|-----|
| Doc-only change | #177288 | pca_lowrank stride mismatch | not_related | **not_related @ 0.99** | [job](https://github.com/pytorch/ciforge/actions/runs/23016718498) |
| Dynamo einops fix | #177165 | detectron2 graph_breaks + test_is_nonzero_mps | not_related | **not_related @ 0.93** | [job](https://github.com/pytorch/ciforge/actions/runs/23016730498) |
| MPS cdouble guard | #176985 | test_is_nonzero_mps + pca_lowrank | revert | **revert @ 0.95** | [job](https://github.com/pytorch/ciforge/actions/runs/23016740133) |
| Lint missing import | #176613 | Lint / lintrunner-noclang-all | revert | **revert @ 0.95** | [job](https://github.com/pytorch/ciforge/actions/runs/23013529685) |

### Round 2 (2026-03-13, automated hourly loop) — 9/9 correct (1 cancelled)

| Timestamp | PR | Signal Key | Expected | Actual | Job |
|-----------|-----|-----------|----------|--------|-----|
| 03:12Z | #176613 | Lint / lintrunner-noclang-all | revert | **revert @ 0.98** | [job](https://github.com/pytorch/ciforge/actions/runs/23034497618) |
| 03:12Z | #176613 | fsdp/test_fully_shard_comm (test exec) | revert | **revert @ 0.98** | [job](https://github.com/pytorch/ciforge/actions/runs/23034499988) |
| 09:11Z | #177273 | test-timeout-270min (infra) | — | *cancelled* | [job](https://github.com/pytorch/ciforge/actions/runs/23043982417) |
| 10:12Z | #176019 | AllenaiLongformerBase fail_to_run (periodic) | garbage | **garbage @ 0.95** | [job](https://github.com/pytorch/ciforge/actions/runs/23046142800) |
| 10:12Z | #176019 | detectron2_fcos IMPROVED (periodic) | not_related | **not_related @ 0.95** | [job](https://github.com/pytorch/ciforge/actions/runs/23046144261) |
| 11:10Z | #176019 | functorch_dp_cifar10 fail_accuracy (periodic) | not_related | **not_related @ 0.93** | [job](https://github.com/pytorch/ciforge/actions/runs/23048173319) |
| 11:10Z | #176019 | basic_gnn_edgecnn IMPROVED (periodic) | not_related | **not_related @ 0.92** | [job](https://github.com/pytorch/ciforge/actions/runs/23048174698) |
| 15:09Z | #177096 | S3 PutObject IAM denied - ROCm gfx950 (infra) | garbage | **garbage @ 0.95** | [job](https://github.com/pytorch/ciforge/actions/runs/23057146500) |
| 16:09Z | #176019 | vit_base_patch16_siglip_256 fail_to_run (periodic) | not_related | **not_related @ 0.97** | [job](https://github.com/pytorch/ciforge/actions/runs/23059634364) |
| 16:09Z | #176019 | shufflenet_v2_x1_0 fail_accuracy (periodic) | not_related | **not_related @ 0.95** | [job](https://github.com/pytorch/ciforge/actions/runs/23059635765) |

### Summary by verdict type

| Verdict | Count | Correct | Avg Confidence |
|---------|-------|---------|----------------|
| revert | 4 | 4/4 | 0.97 |
| garbage | 2 | 2/2 | 0.95 |
| not_related | 7 | 7/7 | 0.94 |

## Test plan

- [x] Prototyped and tested on pytorch/ciforge with 13 real trunk failure cases
- [x] Verified structured JSON output matches schema
- [x] Verified verdict artifact uploads correctly
- [ ] Trigger via GitHub UI with `workflow_dispatch` on pytorch/pytorch to validate bedrock environment works
- [ ] Integrate dispatch call into autorevert lambda (follow-up)

Pull Request resolved: #177404
Approved by: https://github.com/wdvr
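A consumer of the verdict artifact maps the four verdict kinds to the actions listed above. The JSON schema shown here (`verdict` plus a `confidence` field) is inferred from the description and evaluation tables, not taken from the actual artifact format.

```python
import json

# Action per verdict kind, as described in the summary above.
ACTIONS = {
    "revert": "proceed to revert immediately",
    "unsure": "continue with restart-to-confirm",
    "not_related": "ignore this signal",
    "garbage": "suppress signal for ~2 hours",
}

def decide(verdict_json: str) -> str:
    """Parse a verdict artifact and return the autorevert action to take."""
    verdict = json.loads(verdict_json)
    kind = verdict["verdict"]
    if kind not in ACTIONS:
        raise ValueError(f"unknown verdict: {kind}")
    return ACTIONS[kind]

# Hypothetical artifact matching the 'not_related @ 0.95' rows above:
artifact = '{"verdict": "not_related", "confidence": 0.95}'
assert decide(artifact) == "ignore this signal"
```

Note that `unsure` maps to the pre-existing restart-to-confirm behavior, so an integration that ignores the artifact entirely degrades to today's default.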
Pull Request resolved: pytorch#176019
Approved by: https://github.com/sidt-meta, https://github.com/arui-meta
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @kadeng @muchulee8 @amjames @chauhang @aakhundov @coconutruben @jataylo