
[AOTI Eager] Support multi-return ops in AOTIPythonKernelHolder#176019

Closed
StellarrZ wants to merge 3 commits into main from export-D94364952

Conversation

@StellarrZ
Contributor

@StellarrZ StellarrZ commented Feb 27, 2026

Summary:
Relax the `cache_lookup` assertion from requiring exactly one return value to requiring one or more return values, all of Tensor type. This enables ops like `native_layer_norm` (which returns 3 Tensors) to dispatch through the AOTI eager path.

The existing `cache_hit` and `cache_miss` output handling already loops over all returned tensors and pushes them onto the stack individually, so no changes are needed there. The only blocker was the overly restrictive `TORCH_CHECK_NOT_IMPLEMENTED` gate.
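The relaxed gate can be illustrated with a small pure-Python model (the actual change lives in C++ in `kernel_holder.cpp`; the function name here is illustrative, not PyTorch API):

```python
# Illustrative model of the relaxed return-schema check: instead of
# asserting exactly one Tensor return, accept any op whose returns are
# all Tensors (one or more).

def check_returns_supported(return_types: list) -> None:
    """Raise NotImplementedError unless the op returns >= 1 values, all Tensors."""
    if len(return_types) < 1 or any(t != "Tensor" for t in return_types):
        raise NotImplementedError(
            f"AOTI eager path requires >= 1 Tensor returns, got {return_types}"
        )

# Single-return ops still pass, and multi-return ops like
# native_layer_norm (3 Tensors) now pass too.
check_returns_supported(["Tensor"])
check_returns_supported(["Tensor", "Tensor", "Tensor"])
try:
    check_returns_supported(["Tensor", "int"])  # mixed returns stay unsupported
except NotImplementedError:
    print("rejected mixed returns")
```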

Test Plan:

```
buck run fbcode//mode/opt fbcode//mtia/host_runtime/aoti_eager:test_aoti_eager_dispatch_bin -- TestAotiEagerDispatch.test_multi_return_op_aminmax
```

Reviewed By: arui-meta

Differential Revision: D94364952

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @kadeng @muchulee8 @amjames @chauhang @aakhundov @coconutruben @jataylo

@pytorch-bot

pytorch-bot bot commented Feb 27, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/176019

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 7a14833 with merge base 1ba2b33:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-codesync meta-codesync bot force-pushed the export-D94364952 branch from 885a2c1 to d259a18 Compare March 2, 2026 17:16
meta-codesync bot pushed a commit that referenced this pull request Mar 2, 2026
StellarrZ added a commit that referenced this pull request Mar 2, 2026
@meta-codesync

meta-codesync bot commented Mar 2, 2026

@StellarrZ has exported this pull request. If you are a Meta employee, you can view the originating Diff in D94364952.

@meta-codesync meta-codesync bot force-pushed the export-D94364952 branch from aae125b to 554c9cb Compare March 4, 2026 15:45
meta-codesync bot pushed a commit that referenced this pull request Mar 4, 2026
StellarrZ added a commit that referenced this pull request Mar 4, 2026
@meta-codesync meta-codesync bot force-pushed the export-D94364952 branch from a54b43f to 84c7827 Compare March 6, 2026 18:21
meta-codesync bot pushed a commit that referenced this pull request Mar 6, 2026
StellarrZ added a commit that referenced this pull request Mar 6, 2026
@meta-codesync meta-codesync bot force-pushed the export-D94364952 branch from a2c6398 to 0302171 Compare March 6, 2026 22:09
meta-codesync bot pushed a commit that referenced this pull request Mar 6, 2026
StellarrZ added a commit that referenced this pull request Mar 6, 2026
meta-codesync bot pushed a commit that referenced this pull request Mar 11, 2026
StellarrZ added a commit that referenced this pull request Mar 11, 2026
@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Mar 12, 2026
StellarrZ added a commit that referenced this pull request Mar 12, 2026
Summary:
Populate the C++ in-memory cache after first compilation so subsequent same-shape dispatches resolve entirely in C++ without calling back into Python.

Root cause: `AOTIPythonKernelHolder::cache_miss()` compiled and loaded the kernel but never added it to `aoti_kernel_cache_`. Every dispatch re-entered `produce_aoti_kernel_lib` → acquired the GIL → called Python `_compile_afg` → parsed JSON → loaded AFG from disk.

Fix: after `cache_miss()` creates the kernel runner, populate `aoti_kernel_cache_` with the input metadata and runner so subsequent calls with matching shapes hit `cache_hit()` directly.
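The control flow of the fix can be sketched with a toy cache in Python. All names here (`KernelCache`, `compile_kernel`) are illustrative stand-ins, not the actual PyTorch C++ types; the point is that the runner is stored on the miss path so the next same-shape dispatch never recompiles:

```python
# Toy model of the fix: a kernel cache keyed by input metadata. Before the
# fix, the miss path compiled and ran the kernel but never stored the
# runner; storing it makes every subsequent same-shape call a cache hit.

compile_count = 0

def compile_kernel(meta):
    # Stands in for the expensive GIL-acquire / Python / JSON / disk round-trip.
    global compile_count
    compile_count += 1
    return lambda xs: [v + 1 for v in xs]

class KernelCache:
    def __init__(self):
        self._cache = {}

    def dispatch(self, meta, args):
        runner = self._cache.get(meta)
        if runner is None:               # cache_miss path
            runner = compile_kernel(meta)
            self._cache[meta] = runner   # the fix: populate after compiling
        return runner(args)

cache = KernelCache()
meta = ("float32", (32, 32))
cache.dispatch(meta, [1, 2])
cache.dispatch(meta, [3, 4])
print(compile_count)  # compiled once; the second call is an in-memory hit
```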

Performance (`aten.bitwise_not`, shape [32,32], 100k iterations):
- Before: 34,260 us/call (disk round-trip on every dispatch)
- After: 21.5 us/call (in-memory cache hit, 1,593x faster)

Test Plan:
```
buck run fbcode//mode/opt fbcode//mtia/host_runtime/aoti_eager:test_aoti_eager_dispatch_bin -- TestAotiEagerDispatch.test_inmemory_cache_reuse
```

Differential Revision: D93171553
…76018)

Summary:
Pull Request resolved: #176018

Add a `dynamic` parameter to `AOTIPythonKernelHolder` so that the C++ dispatch path can request dynamic-shape compilation from the Python compile backend. When `dynamic_=true`, the holder passes `dynamic=True` to `aoti_compile_with_persistent_cache`, and the in-memory cache uses rank/dtype/device matching instead of exact size/stride matching. This allows a single compiled kernel to serve multiple input shapes.

Changes:
- `AOTIPythonKernelHolder`: added `dynamic_` member, forwarded to `produce_aoti_kernel_lib`
- `AOTIKernelMetadata`: added `is_dynamic_` flag, `check()` uses `dynamic_check()` when set
- `TensorMetadata::dynamic_check()`: matches by dtype/device/rank, skips exact sizes
- `ParameterMetadata::dynamic_check()`: delegates to `TensorMetadata::dynamic_check()` for tensor params
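The difference between the static and dynamic matching modes can be modeled in a few lines of Python (a sketch of the matching rule only; field and method names mirror the description above but are not the actual C++ signatures):

```python
# Illustrative model of static vs dynamic metadata matching: dynamic_check
# compares dtype/device/rank only, so one cached dynamic-shape kernel can
# serve inputs of many concrete sizes.
from dataclasses import dataclass

@dataclass(frozen=True)
class TensorMeta:
    dtype: str
    device: str
    sizes: tuple

    def static_check(self, other: "TensorMeta") -> bool:
        # Exact match, including concrete sizes.
        return (self.dtype, self.device, self.sizes) == \
               (other.dtype, other.device, other.sizes)

    def dynamic_check(self, other: "TensorMeta") -> bool:
        # Match by dtype/device/rank; skip exact sizes.
        return (self.dtype == other.dtype
                and self.device == other.device
                and len(self.sizes) == len(other.sizes))

cached = TensorMeta("float32", "mtia", (32, 32))
incoming = TensorMeta("float32", "mtia", (64, 128))
print(cached.static_check(incoming), cached.dynamic_check(incoming))
```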

Test Plan:
```
buck test fbcode//caffe2/test/cpp/aoti_eager:kernel_meta_info_test
```

Differential Revision: D94301187
meta-codesync bot pushed a commit that referenced this pull request Mar 12, 2026
@facebook-github-tools

@pytorchbot merge

(Initiating merge automatically since Phabricator Diff has merged)

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

@pytorchmergebot
Collaborator

Merge failed

Reason: Command `git -C /home/runner/work/pytorch/pytorch rebase origin/main` returned non-zero exit code 1

```
Rebasing (1/1)
Auto-merging torch/csrc/inductor/aoti_eager/kernel_holder.cpp
CONFLICT (content): Merge conflict in torch/csrc/inductor/aoti_eager/kernel_holder.cpp
error: could not apply a64ea4229db... [AOTI Eager] Support multi-return ops in AOTIPythonKernelHolder (#176019)
hint: Resolve all conflicts manually, mark them as resolved with
hint: "git add/rm <conflicted_files>", then run "git rebase --continue".
hint: You can instead skip this commit: run "git rebase --skip".
hint: To abort and get back to the state before "git rebase", run "git rebase --abort".
hint: Disable this message with "git config set advice.mergeConflict false"
Could not apply a64ea4229db... # [AOTI Eager] Support multi-return ops in AOTIPythonKernelHolder (#176019)
```

@facebook-github-tools

@pytorchbot merge

(Initiating merge automatically since Phabricator Diff has merged)

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

@pytorchmergebot
Collaborator

Merge failed

Reason: Command `git -C /home/runner/work/pytorch/pytorch merge --squash __pull-request-176019__init__` returned non-zero exit code 1

```
Auto-merging torch/csrc/inductor/aoti_eager/kernel_holder.cpp
CONFLICT (content): Merge conflict in torch/csrc/inductor/aoti_eager/kernel_holder.cpp
Squash commit -- not updating HEAD
Automatic merge failed; fix conflicts and then commit the result.
```

@facebook-github-tools

@pytorchbot merge

(Initiating merge automatically since Phabricator Diff has merged)

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

pytorchmergebot pushed a commit that referenced this pull request Mar 13, 2026
## Summary

Adds a `workflow_dispatch` workflow that the autorevert system can trigger when it detects an early failure pattern. Claude Opus 4.6 analyzes the suspect commit's diff, failed job logs, and PyTorch source code to determine whether the commit actually caused the CI failures.

Returns a structured JSON verdict as an artifact:
- **revert** — causal chain found, proceed to revert immediately
- **unsure** — inconclusive, continue with restart-to-confirm (default behavior unchanged)
- **not_related** — failures unrelated to the change, ignore this signal
- **garbage** — signal is unreliable (infra flake, driver crash), suppress for ~2 hours
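A minimal consumer of such a verdict artifact might look like the sketch below. The field names (`verdict`, `confidence`) are assumptions for illustration, not taken from the actual workflow's schema:

```python
# Hypothetical validator for the structured JSON verdict artifact: checks
# the verdict is one of the four defined values and the confidence is a
# probability before the autorevert system acts on it.
import json

ALLOWED_VERDICTS = {"revert", "unsure", "not_related", "garbage"}

def parse_verdict(raw: str) -> dict:
    data = json.loads(raw)
    if data["verdict"] not in ALLOWED_VERDICTS:
        raise ValueError(f"unknown verdict: {data['verdict']}")
    if not 0.0 <= data["confidence"] <= 1.0:
        raise ValueError("confidence must be in [0, 1]")
    return data

v = parse_verdict('{"verdict": "not_related", "confidence": 0.93}')
print(v["verdict"])  # not_related
```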

Design doc: https://docs.google.com/document/d/1BA9B7cIIKiapI37fSFGDD7D0F-VwMyRKJW0PoS0KkbY/edit

## Evaluation Results (13/13 correct verdicts)

Prototyped and tested on [pytorch/ciforge](https://github.com/pytorch/ciforge). Results across diverse failure types:

### Round 1 (2026-03-12) — 4/4 correct

| Test Case | PR | Failure | Expected | Actual | Job |
|-----------|-----|---------|----------|--------|-----|
| Doc-only change | #177288 | pca_lowrank stride mismatch | not_related | **not_related @ 0.99** | [job](https://github.com/pytorch/ciforge/actions/runs/23016718498) |
| Dynamo einops fix | #177165 | detectron2 graph_breaks + test_is_nonzero_mps | not_related | **not_related @ 0.93** | [job](https://github.com/pytorch/ciforge/actions/runs/23016730498) |
| MPS cdouble guard | #176985 | test_is_nonzero_mps + pca_lowrank | revert | **revert @ 0.95** | [job](https://github.com/pytorch/ciforge/actions/runs/23016740133) |
| Lint missing import | #176613 | Lint / lintrunner-noclang-all | revert | **revert @ 0.95** | [job](https://github.com/pytorch/ciforge/actions/runs/23013529685) |

### Round 2 (2026-03-13, automated hourly loop) — 9/9 correct (1 cancelled)

| Timestamp | PR | Signal Key | Expected | Actual | Job |
|-----------|-----|-----------|----------|--------|-----|
| 03:12Z | #176613 | Lint / lintrunner-noclang-all | revert | **revert @ 0.98** | [job](https://github.com/pytorch/ciforge/actions/runs/23034497618) |
| 03:12Z | #176613 | fsdp/test_fully_shard_comm (test exec) | revert | **revert @ 0.98** | [job](https://github.com/pytorch/ciforge/actions/runs/23034499988) |
| 09:11Z | #177273 | test-timeout-270min (infra) | — | *cancelled* | [job](https://github.com/pytorch/ciforge/actions/runs/23043982417) |
| 10:12Z | #176019 | AllenaiLongformerBase fail_to_run (periodic) | garbage | **garbage @ 0.95** | [job](https://github.com/pytorch/ciforge/actions/runs/23046142800) |
| 10:12Z | #176019 | detectron2_fcos IMPROVED (periodic) | not_related | **not_related @ 0.95** | [job](https://github.com/pytorch/ciforge/actions/runs/23046144261) |
| 11:10Z | #176019 | functorch_dp_cifar10 fail_accuracy (periodic) | not_related | **not_related @ 0.93** | [job](https://github.com/pytorch/ciforge/actions/runs/23048173319) |
| 11:10Z | #176019 | basic_gnn_edgecnn IMPROVED (periodic) | not_related | **not_related @ 0.92** | [job](https://github.com/pytorch/ciforge/actions/runs/23048174698) |
| 15:09Z | #177096 | S3 PutObject IAM denied - ROCm gfx950 (infra) | garbage | **garbage @ 0.95** | [job](https://github.com/pytorch/ciforge/actions/runs/23057146500) |
| 16:09Z | #176019 | vit_base_patch16_siglip_256 fail_to_run (periodic) | not_related | **not_related @ 0.97** | [job](https://github.com/pytorch/ciforge/actions/runs/23059634364) |
| 16:09Z | #176019 | shufflenet_v2_x1_0 fail_accuracy (periodic) | not_related | **not_related @ 0.95** | [job](https://github.com/pytorch/ciforge/actions/runs/23059635765) |

### Summary by verdict type

| Verdict | Count | Correct | Avg Confidence |
|---------|-------|---------|----------------|
| revert | 4 | 4/4 | 0.97 |
| garbage | 2 | 2/2 | 0.95 |
| not_related | 7 | 7/7 | 0.94 |

## Test plan

- [x] Prototyped and tested on pytorch/ciforge with 13 real trunk failure cases
- [x] Verified structured JSON output matches schema
- [x] Verified verdict artifact uploads correctly
- [ ] Trigger via GitHub UI with `workflow_dispatch` on pytorch/pytorch to validate bedrock environment works
- [ ] Integrate dispatch call into autorevert lambda (follow-up)
Pull Request resolved: #177404
Approved by: https://github.com/wdvr
AaronWang04 pushed a commit to AaronWang04/pytorch that referenced this pull request Mar 24, 2026
EmanueleCoradin pushed a commit to EmanueleCoradin/pytorch that referenced this pull request Mar 30, 2026
…rch#176019)


Pull Request resolved: pytorch#176019
Approved by: https://github.com/sidt-meta, https://github.com/arui-meta
EmanueleCoradin pushed a commit to EmanueleCoradin/pytorch that referenced this pull request Mar 30, 2026