
[benchmark] Skip pytorch_CycleGAN_and_pix2pix from inductor benchmarks#175066

Closed
seemethere wants to merge 3 commits into gh/seemethere/127/base from gh/seemethere/127/head

Conversation

seemethere (Member) commented Feb 16, 2026

Stack from ghstack (oldest at bottom):

Summary

Skip the pytorch_CycleGAN_and_pix2pix benchmark model from the inductor benchmark suite.

This legacy 2017 model has been failing with eager_fail_to_run on 100%
of commits since mid-2025, providing zero CI signal while consuming
~5.3M GPU-seconds/week across 7+ benchmark jobs on CUDA, CPU, and ROCm.

Estimated savings: ~310 GPU-hours/week (~1,240 GPU-hours/month)

Skip it in torchbench.yaml and remove its entries from all 31 expected
accuracy CSV files. Also remove it from the higher_fp16 tolerance list.

See P2188981399 for the full CI workflow analysis.

Test Plan

  • CI should pass with CycleGAN skipped (it was already failing 100% of the time)
  • No other benchmark models affected

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @kadeng @chauhang @amjames @Lucaskabela @jataylo

pytorch-bot bot commented Feb 16, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/175066

Note: Links to docs will display an error until the docs builds have been completed.

⏳ No Failures, 55 Pending

As of commit 474836d with merge base 996c7d8:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

seemethere added a commit that referenced this pull request Feb 16, 2026

ghstack-source-id: 14abaec
Pull-Request: #175066
seemethere (Member, Author) commented Feb 16, 2026

Context

This benchmark (pytorch_CycleGAN_and_pix2pix) has been broken for 8+ months, failing on 100% of commits since at least mid-2025. The failure mode degraded over time:

  • June 2025: Graph breaks mismatch (IMPROVED: graph_breaks=0, expected=6)
  • Sept 2025: accuracy=fail_accuracy, expected=pass
  • Nov 2025 - present: accuracy=eager_fail_to_run, expected=pass (can't even run in eager mode)

Impact

The broken benchmark was failing identically across 7+ CI jobs spanning CUDA, CPU, and ROCm (inductor, inductor-periodic workflows), wasting an estimated ~5.3M GPU-seconds/week (~310 GPU-hours/week) while providing zero regression signal.

This was identified as the #1 immediate savings opportunity in a broader CI workflow analysis (P2188981399) covering 128 PR+trunk jobs over 30 days.

What this PR does

Adds pytorch_CycleGAN_and_pix2pix to the skip.all list in torchbench.yaml and removes its entries from all 31 expected accuracy CSVs. CycleGAN is a legacy 2017 GAN architecture that is no longer representative of modern workloads.
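For context, the `skip.all` change is a one-line YAML addition; a minimal sketch (the surrounding structure of `torchbench.yaml` is abbreviated here, and the comment is illustrative):

```yaml
# Sketch of the torchbench.yaml change; surrounding keys are abbreviated.
skip:
  all:
    # ...existing skipped models...
    # Broken since mid-2025: eager_fail_to_run on 100% of commits (#175066)
    - pytorch_CycleGAN_and_pix2pix
```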

huydhn (Contributor) left a comment

LGTM! cc @BoyuanFeng fyi

seemethere (Member, Author) commented

@pytorchbot merge -i

pytorch-bot bot added the ciflow/trunk (Trigger trunk jobs on your pull request) label Feb 17, 2026
pytorchmergebot (Collaborator) commented

Merge started

Your change will be merged while ignoring the following 1 checks: inductor / unit-test / inductor-test / test (inductor, 2, 2, linux.g5.4xlarge.nvidia.gpu)

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status.

pytorchmergebot (Collaborator) commented

Merge failed

Reason: Command git -C /home/runner/work/pytorch/pytorch cherry-pick -x d8f05140ccf6ce38eb307f6ecfcfed21d9c9d7b5 returned non-zero exit code 1

```
Auto-merging benchmarks/dynamo/ci_expected_accuracy/cpu_inductor_torchbench_inference.csv
CONFLICT (content): Merge conflict in benchmarks/dynamo/ci_expected_accuracy/cpu_inductor_torchbench_inference.csv
error: could not apply d8f05140ccf... [benchmark] Skip pytorch_CycleGAN_and_pix2pix from inductor benchmarks
hint: After resolving the conflicts, mark them with
hint: "git add/rm <pathspec>", then run
hint: "git cherry-pick --continue".
hint: You can instead skip this commit with "git cherry-pick --skip".
hint: To abort and get back to the state before "git cherry-pick",
hint: run "git cherry-pick --abort".
hint: Disable this message with "git config set advice.mergeConflict false"
```
Details for Dev Infra team: raised by workflow job.
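The failure above is an ordinary content conflict in an expected-accuracy CSV. As a self-contained illustration of the resolution flow the hints describe (throwaway repo; the file, branch, and model names here are invented for the demo):

```shell
# Demo of the cherry-pick conflict/resolution flow from the hints above.
# Repo, file, branch, and model names are invented for illustration.
set -e
tmp="$(mktemp -d)"
git init -q -b main "$tmp/repo"
cd "$tmp/repo"
git config user.email demo@example.com
git config user.name demo

printf 'alexnet,pass\ncyclegan,pass\n' > accuracy.csv
git add accuracy.csv && git commit -qm 'base expected-accuracy file'

git checkout -qb skip-model
printf 'alexnet,pass\n' > accuracy.csv            # drop the cyclegan row
git commit -qam 'skip cyclegan'

git checkout -q main
printf 'alexnet,pass\ncyclegan,fail_accuracy\n' > accuracy.csv  # edit same row
git commit -qam 'update cyclegan expectation'

# Cherry-picking the skip now conflicts: both sides touched the cyclegan row.
if ! git cherry-pick -x skip-model; then
    printf 'alexnet,pass\n' > accuracy.csv        # resolve: keep the skip
    git add accuracy.csv
    GIT_EDITOR=true git cherry-pick --continue    # finish without an editor
fi
```

This mirrors the situation in the merge failure: the skip commit removed CSV rows that had since changed on the target branch, so the cherry-pick had to be resolved by hand.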

pytorchmergebot (Collaborator) commented

Starting merge as part of PR stack under #175067

pytorchmergebot (Collaborator) commented

Starting merge as part of PR stack under #175067

pytorchmergebot pushed a commit that referenced this pull request Feb 17, 2026
[CI] Move CUDA 12.8 GPU tests from per-commit trunk to periodic (#175067)

## Summary

Move CUDA 12.8 GPU tests from per-commit trunk CI to periodic (~3x/day on weekdays).

Both CUDA 12.8 and 13.0 are shipping wheel targets (nightly ships cu126, cu128, cu129, cu130), but their trunk CI test suites have **85-90% failure correlation** -- they almost always fail together. Over a 30-day analysis window covering 97 reverts and 38 significant regression events, **CUDA 12.8 never uniquely caught a regression that 13.0 missed**.

CUDA 13.0 is kept per-commit because:
- It is the **newest** shipping CUDA version
- Most likely to surface **novel breakage** from new CUDA runtime behavior
- Forward-looking CI should protect what's coming, not what's already stable

CUDA 12.8 is moved to periodic because:
- It is **mature and well-understood** -- breakage is less likely and less urgent
- The rare 12.8-only regression can tolerate the ~8-hour periodic detection window
- The 12.8 build job **remains in trunk** because `cross-compile-linux-test` depends on its artifacts

**Estimated savings: ~1,270 GPU-hours/week (~5,080 GPU-hours/month)**

This is the #2 savings opportunity from a broader CI workflow analysis (P2188981399) covering 128 PR+trunk jobs over 30 days. Combined with #175066 (CycleGAN skip, ~310 GPU-hours/week), total savings from this stack: **~1,580 GPU-hours/week (~6,320 GPU-hours/month)**.

### Changes
- `trunk.yml`: remove CUDA 12.8 test job (5 default + 3 distributed + 1 pr_time_benchmarks + 1 libtorch shards) and no-ops build
- `periodic.yml`: add default (5 GPU shards on g6.4xlarge) and distributed (3 multi-GPU shards on g4dn.12xlarge) to existing CUDA 12.8 periodic entry

## Test Plan

- CUDA 12.8 GPU tests continue to run in periodic (3x/day weekdays)
- CUDA 13.0 per-commit coverage is unchanged
- Cross-compile-linux-test continues to work (12.8 build job kept)

Pull Request resolved: #175067
Approved by: https://github.com/malfet
ghstack dependencies: #175066
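The failure-correlation claim in the commit message above comes from a non-public analysis (P2188981399). A toy sketch of how such an overlap metric can be computed from per-commit job outcomes (the data and function here are illustrative, not the real methodology):

```python
# Toy sketch of the "fail together" overlap metric; the real analysis in
# P2188981399 is not public, so the data and method here are illustrative.

def failure_overlap(a, b):
    """a, b: per-commit outcomes, truthy = job failed on that commit.

    Returns (fraction of failing commits where both jobs failed,
             count of commits only job `a` caught)."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    only_a = sum(1 for x, y in zip(a, b) if x and not y)
    return (both / either if either else 1.0), only_a

# Toy outcomes over 10 commits for two CUDA test jobs
cu128 = [0, 1, 0, 1, 0, 0, 1, 0, 0, 0]
cu130 = [0, 1, 0, 1, 0, 0, 1, 0, 1, 0]
overlap, cu128_only = failure_overlap(cu128, cu130)
# overlap = 0.75; cu128_only = 0, i.e. cu128 never uniquely caught a failure
```

A high overlap plus zero unique catches for a job is exactly the signature the PR uses to justify demoting that job to a periodic schedule.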
meta-codesync bot pushed a commit to pytorch/benchmark that referenced this pull request Feb 18, 2026

X-link: pytorch/pytorch#175066
Approved by: https://github.com/huydhn, https://github.com/malfet

Reviewed By: atalman

Differential Revision: D93507913

fbshipit-source-id: b9e0b750b38bd2a9afd72eddede007a1eedf0c09
atalman (Contributor) commented Feb 19, 2026

@pytorchbot cherry-pick --onto release/2.11 -c critical

pytorchbot pushed a commit that referenced this pull request Feb 19, 2026
[benchmark] Skip pytorch_CycleGAN_and_pix2pix from inductor benchmarks (#175066)


Pull Request resolved: #175066
Approved by: https://github.com/huydhn, https://github.com/malfet

(cherry picked from commit 688c943)
pytorchbot (Collaborator) commented

Cherry picking #175066

The cherry pick PR is at #175299 and it is recommended to link a critical cherry pick PR with an issue. The following tracker issues are updated:

Details for Dev Infra team: raised by workflow job.

pytorchbot pushed a commit that referenced this pull request Feb 19, 2026
[CI] Move CUDA 12.8 GPU tests from per-commit trunk to periodic (#175067)

Pull Request resolved: #175067
Approved by: https://github.com/malfet
ghstack dependencies: #175066

(cherry picked from commit ef0353f)
atalman pushed a commit that referenced this pull request Feb 19, 2026
[benchmark] Skip pytorch_CycleGAN_and_pix2pix from inductor benchmarks (#175066) (#175299)


Pull Request resolved: #175066
Approved by: https://github.com/huydhn, https://github.com/malfet

(cherry picked from commit 688c943)

Co-authored-by: Eli Uriegas <eliuriegas@meta.com>
atalman pushed a commit that referenced this pull request Feb 19, 2026
[CI] Move CUDA 12.8 GPU tests from per-commit trunk to periodic (#175067)


Pull Request resolved: #175067
Approved by: https://github.com/malfet
ghstack dependencies: #175066

(cherry picked from commit ef0353f)

Co-authored-by: Eli Uriegas <eliuriegas@meta.com>
pbielak added a commit to intel/torch-xpu-ops that referenced this pull request Mar 11, 2026
- Re-enable `detectron2_maskrcnn` skip in skip.all.
- Re-enable all `timm_*` model skips in skip.all.
- Keep explicit upstream PR context comments for `modded_nanogpt`
  and `pytorch_CycleGAN_and_pix2pix`.
- Remove stale expected-accuracy rows for skipped models.

Relevant PRs:
[1] pytorch/pytorch#120299
[2] pytorch/pytorch#164816
[3] pytorch/pytorch#172125
[4] pytorch/pytorch#175066
[5] #2306
sandy-gags pushed a commit to sandy-gags/pytorch that referenced this pull request Mar 12, 2026

ghstack-source-id: b745ae2
Pull-Request: pytorch/pytorch#175066