Add Claude-powered autorevert AI advisor workflow #177404
Closed
izaitsevfb wants to merge 1 commit into main
Adds a `workflow_dispatch` workflow that the autorevert system can trigger when it detects an early failure pattern. Claude analyzes the suspect commit's diff, the failed job logs, and the PyTorch source code to determine whether the commit caused the CI failures. The workflow returns a structured JSON verdict (revert/unsure/not_related/garbage) as an artifact, which autorevert can consume to make faster and more accurate revert decisions.
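The following is a minimal Python sketch of how a consumer could map the verdict artifact onto autorevert actions. It uses the meanings given in this PR's commit message: `revert` reverts immediately, `unsure` keeps the existing restart-to-confirm path, `not_related` ignores the signal, and `garbage` suppresses it for about two hours. The artifact field name `verdict`, the `Action` enum, and `decide()` are hypothetical. The real JSON schema is defined by the workflow and is not reproduced on this page.

```python
import json
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from enum import Enum
from typing import Optional

# The four verdicts named in the PR; anything else is treated like "unsure".
KNOWN_VERDICTS = {"revert", "unsure", "not_related", "garbage"}

# The PR summary says "garbage" signals are suppressed for roughly 2 hours.
GARBAGE_SUPPRESSION = timedelta(hours=2)


class Action(Enum):
    REVERT = "revert"               # causal chain found: revert immediately
    RESTART_TO_CONFIRM = "restart"  # inconclusive: unchanged default path
    IGNORE_SIGNAL = "ignore"        # failures unrelated to the commit
    SUPPRESS_SIGNAL = "suppress"    # unreliable signal (infra flake, driver crash)


@dataclass
class Decision:
    action: Action
    suppress_until: Optional[datetime] = None


def decide(artifact_json: str, now: datetime) -> Decision:
    """Map a verdict artifact (hypothetical schema) to an autorevert action."""
    payload = json.loads(artifact_json)
    # Assumed field name; non-object payloads are treated as having no verdict.
    verdict = payload.get("verdict") if isinstance(payload, dict) else None
    if not isinstance(verdict, str) or verdict not in KNOWN_VERDICTS or verdict == "unsure":
        # Missing, unknown, or inconclusive verdicts keep the default behavior.
        return Decision(Action.RESTART_TO_CONFIRM)
    if verdict == "revert":
        return Decision(Action.REVERT)
    if verdict == "not_related":
        return Decision(Action.IGNORE_SIGNAL)
    # Only "garbage" is left: suppress the signal for a fixed window.
    return Decision(Action.SUPPRESS_SIGNAL, suppress_until=now + GARBAGE_SUPPRESSION)


# Example: a "garbage" verdict produced at 10:12 UTC suppresses until 12:12 UTC.
now = datetime(2026, 3, 13, 10, 12, tzinfo=timezone.utc)
decision = decide('{"verdict": "garbage", "confidence": 0.95}', now)
print(decision.action, decision.suppress_until)
```

In this sketch, malformed or unrecognized verdicts fall back to restart-to-confirm. That keeps the default behavior unchanged whenever the advisor output cannot be interpreted.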
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/177404
Note: Links to docs will display an error until the docs builds have been completed.
⏳ No Failures, 63 Pending
As of commit 4646b85 with merge base 01316b2.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
wdvr approved these changes on Mar 13, 2026.
Contributor, Author:
@pytorchbot merge -f 'lint passed'
Collaborator:
Merge started. Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
izaitsevfb added a commit to pytorch/test-infra that referenced this pull request on Mar 19, 2026:
## Summary

Adds shadow-mode AI advisor dispatch to the autorevert lambda. When a clean failure partition is detected (2+ failures, 1+ success, no unknown gap), the lambda dispatches the `claude-autorevert-advisor.yml` workflow ([pytorch/pytorch#177404](https://github.com/pytorch/pytorch/pull/177404)) for AI-powered failure analysis.

Shadow mode: verdicts are not consumed; dispatch is fire-and-forget for accuracy data collection.

## Changes

- `DispatchAdvisor` dataclass emitted from `process_valid_autorevert_pattern()` (pure functional)
- `SignalActionProcessor.dispatch_advisors()`: shuffled dispatch with per-signal dedup and a cap (8 per workflow+commit)
- `AdvisorAction` enum (skip/log/run), CLI `--advisor-action`, env `ADVISOR_ACTION`
- Logged to `misc.autorevert_events_v2` as `action='advisor'`
- Signal pattern JSON written to `/tmp/advisor-patterns/` for debugging
- HUD: "AI" badge on outcome cells + advisor dispatch summary table
- State JSON: optional `advisor_dispatches` key (forward/backward compatible)
- Default: `AdvisorAction.RUN` (lambda dispatches advisors on deploy, no gha-infra changes needed)

## Test plan

- [x] 122 tests passing (19 new covering DispatchAdvisor, execute_advisor, dispatch_advisors, signal pattern JSON, cap/dedup)
- [ ] Deploy and monitor shadow-mode dispatches via ClickHouse + HUD
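The dispatch gating described in this commit can be sketched as a simplified Python model. This is not the code in pytorch/test-infra: `SignalPattern`, `is_clean_partition()`, and `plan_dispatches()` are hypothetical names, and each signal is reduced to failure/success counts plus an unknown-gap flag. The sketch applies the stated rules:

- Only clean partitions qualify.
- Each (workflow, commit, signal) is dispatched at most once.
- Candidates are shuffled.
- At most 8 advisors run per workflow+commit.

```python
import random
from collections import defaultdict
from dataclasses import dataclass

# Cap from the PR: at most 8 advisor dispatches per (workflow, commit).
MAX_ADVISORS_PER_WORKFLOW_COMMIT = 8


@dataclass(frozen=True)
class SignalPattern:
    """Hypothetical summary of one signal's outcomes around a suspect commit."""
    workflow: str
    commit_sha: str        # suspect commit
    signal_key: str        # e.g. a job name or test id
    failures: int          # failing observations of this signal
    successes: int         # successful observations of this signal
    has_unknown_gap: bool  # True if some in-between commits have no outcome yet


def is_clean_partition(p: SignalPattern) -> bool:
    # The PR's criterion: 2+ failures, 1+ success, no unknown gap.
    return p.failures >= 2 and p.successes >= 1 and not p.has_unknown_gap


def plan_dispatches(
    patterns: list[SignalPattern],
    already_dispatched: set[tuple[str, str, str]],
    rng: random.Random,
) -> list[SignalPattern]:
    """Pick patterns to dispatch, honoring per-signal dedup and the cap."""
    # Count advisors already dispatched per (workflow, commit).
    per_commit: dict[tuple[str, str], int] = defaultdict(int)
    for workflow, sha, _signal in already_dispatched:
        per_commit[(workflow, sha)] += 1

    candidates = [p for p in patterns if is_clean_partition(p)]
    rng.shuffle(candidates)  # shuffled dispatch, as the PR describes

    seen = set(already_dispatched)
    selected = []
    for p in candidates:
        signal_id = (p.workflow, p.commit_sha, p.signal_key)
        if signal_id in seen:
            continue  # per-signal dedup
        if per_commit[(p.workflow, p.commit_sha)] >= MAX_ADVISORS_PER_WORKFLOW_COMMIT:
            continue  # cap reached for this workflow+commit
        seen.add(signal_id)
        per_commit[(p.workflow, p.commit_sha)] += 1
        selected.append(p)
    return selected
```

Shuffling before applying the cap presumably keeps a commit with many failing signals from always sending the same eight to the advisor. Passing a seeded `random.Random` keeps the sketch deterministic in tests.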
ZainRizvi added a commit that referenced this pull request on Mar 24, 2026:
PR #177404 added the claude-autorevert-advisor workflow but missed the `allowed_bots` input on the `anthropics/claude-code-action@v1` step. Without this, the action rejects runs triggered by `pytorch-auto-revert[bot]`, which is the bot that dispatches this workflow. Add `allowed_bots: "pytorch-auto-revert[bot]"` so the action accepts these bot-triggered runs.
AaronWang04 pushed a commit to AaronWang04/pytorch that referenced this pull request on Mar 24, 2026:
## Summary

Adds a `workflow_dispatch` workflow that the autorevert system can trigger when it detects an early failure pattern. Claude Opus 4.6 analyzes the suspect commit's diff, failed job logs, and PyTorch source code to determine whether the commit actually caused the CI failures. Returns a structured JSON verdict as an artifact:

- **revert**: causal chain found; proceed to revert immediately
- **unsure**: inconclusive; continue with restart-to-confirm (default behavior unchanged)
- **not_related**: failures unrelated to the change; ignore this signal
- **garbage**: signal is unreliable (infra flake, driver crash); suppress for ~2 hours

Design doc: https://docs.google.com/document/d/1BA9B7cIIKiapI37fSFGDD7D0F-VwMyRKJW0PoS0KkbY/edit

## Evaluation Results (13/13 correct verdicts)

Prototyped and tested on [pytorch/ciforge](https://github.com/pytorch/ciforge). Results across diverse failure types:

### Round 1 (2026-03-12): 4/4 correct

| Test Case | PR | Failure | Expected | Actual | Job |
|-----------|-----|---------|----------|--------|-----|
| Doc-only change | pytorch#177288 | pca_lowrank stride mismatch | not_related | **not_related @ 0.99** | [job](https://github.com/pytorch/ciforge/actions/runs/23016718498) |
| Dynamo einops fix | pytorch#177165 | detectron2 graph_breaks + test_is_nonzero_mps | not_related | **not_related @ 0.93** | [job](https://github.com/pytorch/ciforge/actions/runs/23016730498) |
| MPS cdouble guard | pytorch#176985 | test_is_nonzero_mps + pca_lowrank | revert | **revert @ 0.95** | [job](https://github.com/pytorch/ciforge/actions/runs/23016740133) |
| Lint missing import | pytorch#176613 | Lint / lintrunner-noclang-all | revert | **revert @ 0.95** | [job](https://github.com/pytorch/ciforge/actions/runs/23013529685) |

### Round 2 (2026-03-13, automated hourly loop): 9/9 correct (1 cancelled)

| Timestamp | PR | Signal Key | Expected | Actual | Job |
|-----------|-----|-----------|----------|--------|-----|
| 03:12Z | pytorch#176613 | Lint / lintrunner-noclang-all | revert | **revert @ 0.98** | [job](https://github.com/pytorch/ciforge/actions/runs/23034497618) |
| 03:12Z | pytorch#176613 | fsdp/test_fully_shard_comm (test exec) | revert | **revert @ 0.98** | [job](https://github.com/pytorch/ciforge/actions/runs/23034499988) |
| 09:11Z | pytorch#177273 | test-timeout-270min (infra) | n/a | *cancelled* | [job](https://github.com/pytorch/ciforge/actions/runs/23043982417) |
| 10:12Z | pytorch#176019 | AllenaiLongformerBase fail_to_run (periodic) | garbage | **garbage @ 0.95** | [job](https://github.com/pytorch/ciforge/actions/runs/23046142800) |
| 10:12Z | pytorch#176019 | detectron2_fcos IMPROVED (periodic) | not_related | **not_related @ 0.95** | [job](https://github.com/pytorch/ciforge/actions/runs/23046144261) |
| 11:10Z | pytorch#176019 | functorch_dp_cifar10 fail_accuracy (periodic) | not_related | **not_related @ 0.93** | [job](https://github.com/pytorch/ciforge/actions/runs/23048173319) |
| 11:10Z | pytorch#176019 | basic_gnn_edgecnn IMPROVED (periodic) | not_related | **not_related @ 0.92** | [job](https://github.com/pytorch/ciforge/actions/runs/23048174698) |
| 15:09Z | pytorch#177096 | S3 PutObject IAM denied - ROCm gfx950 (infra) | garbage | **garbage @ 0.95** | [job](https://github.com/pytorch/ciforge/actions/runs/23057146500) |
| 16:09Z | pytorch#176019 | vit_base_patch16_siglip_256 fail_to_run (periodic) | not_related | **not_related @ 0.97** | [job](https://github.com/pytorch/ciforge/actions/runs/23059634364) |
| 16:09Z | pytorch#176019 | shufflenet_v2_x1_0 fail_accuracy (periodic) | not_related | **not_related @ 0.95** | [job](https://github.com/pytorch/ciforge/actions/runs/23059635765) |

### Summary by verdict type

| Verdict | Count | Correct | Avg Confidence |
|---------|-------|---------|----------------|
| revert | 4 | 4/4 | 0.97 |
| garbage | 2 | 2/2 | 0.95 |
| not_related | 7 | 7/7 | 0.94 |

## Test plan

- [x] Prototyped and tested on pytorch/ciforge with 13 real trunk failure cases
- [x] Verified structured JSON output matches schema
- [x] Verified verdict artifact uploads correctly
- [ ] Trigger via GitHub UI with `workflow_dispatch` on pytorch/pytorch to validate that the Bedrock environment works
- [ ] Integrate dispatch call into autorevert lambda (follow-up)

Pull Request resolved: pytorch#177404
Approved by: https://github.com/wdvr
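As an arithmetic cross-check of the summary-by-verdict table, the Python sketch below tallies the 13 completed runs from the two result tables by expected verdict. The run data is copied from those tables, with the cancelled Round 2 run excluded. `RUNS` and `summarize()` are illustrative names only.

```python
from collections import defaultdict

# (expected, actual, confidence) for the 13 completed runs; the cancelled
# Round 2 run (test-timeout-270min) is excluded.
RUNS = [
    # Round 1
    ("not_related", "not_related", 0.99),
    ("not_related", "not_related", 0.93),
    ("revert", "revert", 0.95),
    ("revert", "revert", 0.95),
    # Round 2
    ("revert", "revert", 0.98),
    ("revert", "revert", 0.98),
    ("garbage", "garbage", 0.95),
    ("not_related", "not_related", 0.95),
    ("not_related", "not_related", 0.93),
    ("not_related", "not_related", 0.92),
    ("garbage", "garbage", 0.95),
    ("not_related", "not_related", 0.97),
    ("not_related", "not_related", 0.95),
]


def summarize(runs):
    """Group runs by expected verdict -> (count, correct, mean confidence)."""
    groups = defaultdict(list)
    for expected, actual, conf in runs:
        groups[expected].append((actual == expected, conf))
    summary = {}
    for verdict, rows in groups.items():
        correct = sum(ok for ok, _ in rows)
        mean_conf = sum(c for _, c in rows) / len(rows)
        summary[verdict] = (len(rows), correct, mean_conf)
    return summary


for verdict, (count, correct, mean_conf) in summarize(RUNS).items():
    print(f"{verdict:12} {count:2}  {correct}/{count}  {mean_conf:.3f}")
```

The counts and correct totals match the summary table exactly. Averaging the two-decimal per-run confidences gives about 0.965 (revert), 0.95 (garbage), and 0.949 (not_related), each within 0.01 of the reported averages.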
izaitsevfb added a commit to pytorch/test-infra that referenced this pull request on Mar 24, 2026:
## Summary

- Adds `'advisor' = 3` to the `action` Enum8 column in `misc.autorevert_events_v2`

The autorevert AI advisor lambda ([pytorch/pytorch PR #177404](https://github.com/pytorch/pytorch/pull/177404)) writes `action='advisor'` when logging dispatch events to ClickHouse. However, the table's Enum8 only accepted `none`, `restart`, and `revert`, causing ClickHouse to silently store advisor rows as `action='none'`.

This broke `prior_advisor_exists()` and `advisor_count_for_commit()`, which query `WHERE action = 'advisor'`. They always returned false/0, so the lambda re-dispatched the advisor workflow for the same (commit, signal) every ~5 minutes indefinitely.

Already applied live:

```sql
ALTER TABLE misc.autorevert_events_v2
    MODIFY COLUMN `action` Enum8('none' = 0, 'restart' = 1, 'revert' = 2, 'advisor' = 3)
```

## Test plan

- [x] Verify schema file matches expected Enum values
- [x] Apply ALTER TABLE on the live ClickHouse instance
- [ ] Confirm subsequent advisor dispatches create rows with `action='advisor'`
- [ ] Confirm `prior_advisor_exists()` returns true after first dispatch, preventing duplicates
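The failure mode in this commit can be reproduced with a small in-memory model. This Python sketch assumes, as the PR reports, that an `action` value missing from the `Enum8` definition ends up stored as `'none'`. `EventsTable`, `dispatch_if_new()`, and the simplified `prior_advisor_exists()` here are hypothetical stand-ins, not the lambda's real code or a ClickHouse client API. The model shows why a dedup check filtering on `action = 'advisor'` never fires until the enum gains an `'advisor'` member.

```python
from dataclasses import dataclass, field

# Enum8 members before and after the fix (name -> stored integer).
OLD_ACTION_ENUM = {"none": 0, "restart": 1, "revert": 2}
NEW_ACTION_ENUM = {**OLD_ACTION_ENUM, "advisor": 3}


@dataclass
class EventsTable:
    """Toy stand-in for misc.autorevert_events_v2 (hypothetical, in-memory)."""
    action_enum: dict[str, int]
    rows: list[tuple[str, str, str]] = field(default_factory=list)

    def insert(self, commit_sha: str, signal_key: str, action: str) -> None:
        # Models the behavior reported in the PR: an action that is not an
        # Enum8 member is stored as 'none' rather than rejected.
        stored = action if action in self.action_enum else "none"
        self.rows.append((commit_sha, signal_key, stored))

    def prior_advisor_exists(self, commit_sha: str, signal_key: str) -> bool:
        # Same idea as WHERE action = 'advisor' for this (commit, signal).
        return (commit_sha, signal_key, "advisor") in self.rows


def dispatch_if_new(table: EventsTable, commit_sha: str, signal_key: str) -> bool:
    """One lambda run: dispatch and log an advisor unless one was already logged."""
    if table.prior_advisor_exists(commit_sha, signal_key):
        return False
    table.insert(commit_sha, signal_key, "advisor")
    return True


# Simulate three consecutive lambda runs for the same (commit, signal).
for enum in (OLD_ACTION_ENUM, NEW_ACTION_ENUM):
    table = EventsTable(enum)
    dispatched = sum(dispatch_if_new(table, "abc123", "lint") for _ in range(3))
    print(sorted(enum), "dispatches:", dispatched)
```

Each call to `dispatch_if_new()` stands in for one lambda run; the PR mentions a roughly 5-minute cadence. With the old enum, all three runs dispatch. With the extended enum, only the first one does.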
Copilot AI pushed a commit that referenced this pull request on Mar 27, 2026:
PR #177404 added the claude-autorevert-advisor workflow but missed the `allowed_bots` input on the `anthropics/claude-code-action@v1` step. Without this, the action rejects runs triggered by `pytorch-auto-revert[bot]`, which is the bot that dispatches this workflow. Add `allowed_bots: "pytorch-auto-revert[bot]"` so the action accepts these bot-triggered runs.

Co-authored-by: Xia-Weiwen <12522207+Xia-Weiwen@users.noreply.github.com>
can-gaa-hou pushed a commit to cosdt/test-infra that referenced this pull request on Mar 28, 2026.
can-gaa-hou pushed a commit to cosdt/test-infra that referenced this pull request on Mar 28, 2026.
EmanueleCoradin pushed a commit to EmanueleCoradin/pytorch that referenced this pull request on Mar 30, 2026.
AaronWang04 pushed a commit to AaronWang04/pytorch that referenced this pull request on Mar 31, 2026.
xuhancn pushed a commit to xuhancn/pytorch that referenced this pull request on Apr 2, 2026.
nklshy-aws pushed a commit to nklshy-aws/pytorch that referenced this pull request on Apr 7, 2026.