[benchmark] Skip pytorch_CycleGAN_and_pix2pix from inductor benchmarks#175066
seemethere wants to merge 3 commits into gh/seemethere/127/base from
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/175066
Note: Links to docs will display an error until the docs builds have been completed.
⏳ No Failures, 55 Pending — as of commit 474836d with merge base 996c7d8.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
This legacy 2017 model has been failing with eager_fail_to_run on 100% of commits since mid-2025, providing zero CI signal while consuming ~5.3M GPU-seconds/week across 7+ benchmark jobs on CUDA, CPU, and ROCm. Skip it in torchbench.yaml and remove its entries from all 31 expected accuracy CSV files. ghstack-source-id: 14abaec Pull-Request: #175066
Context

This benchmark (`pytorch_CycleGAN_and_pix2pix`) has been failing with `eager_fail_to_run` on 100% of commits since mid-2025.

Impact

The broken benchmark was failing identically across 7+ CI jobs spanning CUDA, CPU, and ROCm (inductor, inductor-periodic workflows), wasting an estimated ~5.3M GPU-seconds/week (~310 GPU-hours/week) while providing zero regression signal. This was identified as the top savings opportunity in the CI workflow analysis (P2188981399).

What this PR does

Adds `pytorch_CycleGAN_and_pix2pix` to the skip list in `torchbench.yaml` and removes its entries from all 31 expected accuracy CSV files.
@pytorchbot merge -i
Merge started
Your change will be merged while ignoring the following 1 checks: inductor / unit-test / inductor-test / test (inductor, 2, 2, linux.g5.4xlarge.nvidia.gpu)
Learn more about merging in the wiki.
Questions? Feedback? Please reach out to the PyTorch DevX Team
Merge failed
Reason: Command
Details for Dev Infra team
Raised by workflow job
Starting merge as part of PR stack under #175067
Starting merge as part of PR stack under #175067
## Summary

Move CUDA 12.8 GPU tests from per-commit trunk CI to periodic (~3x/day on weekdays).

Both CUDA 12.8 and 13.0 are shipping wheel targets (nightly ships cu126, cu128, cu129, cu130), but their trunk CI test suites have **85-90% failure correlation** -- they almost always fail together. Over a 30-day analysis window covering 97 reverts and 38 significant regression events, **CUDA 12.8 never uniquely caught a regression that 13.0 missed**.

CUDA 13.0 is kept per-commit because:
- It is the **newest** shipping CUDA version
- It is the most likely to surface **novel breakage** from new CUDA runtime behavior
- Forward-looking CI should protect what's coming, not what's already stable

CUDA 12.8 is moved to periodic because:
- It is **mature and well-understood** -- breakage is less likely and less urgent
- The rare 12.8-only regression can tolerate the ~8-hour periodic detection window
- The 12.8 build job **remains in trunk** because `cross-compile-linux-test` depends on its artifacts

**Estimated savings: ~1,270 GPU-hours/week (~5,080 GPU-hours/month)**

This is the #2 savings opportunity from a broader CI workflow analysis (P2188981399) covering 128 PR+trunk jobs over 30 days. Combined with #175066 (CycleGAN skip, ~310 GPU-hours/week), total savings from this stack: **~1,580 GPU-hours/week (~6,320 GPU-hours/month)**.

### Changes
- `trunk.yml`: remove the CUDA 12.8 test job (5 default + 3 distributed + 1 pr_time_benchmarks + 1 libtorch shards) and no-op its build
- `periodic.yml`: add default (5 GPU shards on g6.4xlarge) and distributed (3 multi-GPU shards on g4dn.12xlarge) to the existing CUDA 12.8 periodic entry

## Test Plan
- CUDA 12.8 GPU tests continue to run in periodic (3x/day weekdays)
- CUDA 13.0 per-commit coverage is unchanged
- cross-compile-linux-test continues to work (12.8 build job kept)

Pull Request resolved: #175067
Approved by: https://github.com/malfet
ghstack dependencies: #175066
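As a quick sanity check, the per-week and per-month savings figures quoted across this stack are mutually consistent if a "month" is taken as 4 weeks. A minimal sketch (variable names are illustrative, not from any PyTorch tooling):

```python
# Sanity-check the savings figures quoted in the two PR descriptions,
# assuming 1 month = 4 weeks (which is how the quoted numbers scale).
cyclegan_hr_wk = 310   # GPU-hours/week from #175066 (CycleGAN skip)
cuda128_hr_wk = 1270   # GPU-hours/week from #175067 (CUDA 12.8 to periodic)

total_hr_wk = cyclegan_hr_wk + cuda128_hr_wk
print(total_hr_wk)        # 1580 GPU-hours/week for the stack
print(total_hr_wk * 4)    # 6320 GPU-hours/month
print(cuda128_hr_wk * 4)  # 5080 GPU-hours/month for #175067 alone
print(cyclegan_hr_wk * 4) # 1240 GPU-hours/month for #175066 alone
```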
Summary:
## Summary

Skip the `pytorch_CycleGAN_and_pix2pix` benchmark model from the inductor benchmark suite.

This legacy 2017 model has been failing with `eager_fail_to_run` on 100% of commits since mid-2025, providing zero CI signal while consuming ~5.3M GPU-seconds/week across 7+ benchmark jobs on CUDA, CPU, and ROCm.

**Estimated savings: ~310 GPU-hours/week (~1,240 GPU-hours/month)**

Skip it in `torchbench.yaml` and remove its entries from all 31 expected accuracy CSV files. Also remove it from the `higher_fp16` tolerance list.

See P2188981399 for the full CI workflow analysis.

## Test Plan
- CI should pass with CycleGAN skipped (it was already failing 100% of the time)
- No other benchmark models affected

X-link: pytorch/pytorch#175066
Approved by: https://github.com/huydhn, https://github.com/malfet
Reviewed By: atalman
Differential Revision: D93507913
fbshipit-source-id: b9e0b750b38bd2a9afd72eddede007a1eedf0c09
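The effect of a skip-list entry like the one this PR adds can be sketched as follows. This is a hypothetical illustration of how a benchmark harness filters models against a skip set, not the actual `torchbench.yaml` schema or the real torchbench runner code:

```python
# Illustrative sketch only: a harness consulting a skip set like the
# torchbench.yaml entry added in #175066. SKIP and models_to_run are
# hypothetical names, not real torchbench APIs.
SKIP = {
    "pytorch_CycleGAN_and_pix2pix",  # eager_fail_to_run on 100% of commits
}

def models_to_run(all_models):
    """Return the benchmark models that are not skipped."""
    return [m for m in all_models if m not in SKIP]

print(models_to_run(["resnet50", "pytorch_CycleGAN_and_pix2pix", "hf_Bert"]))
```

A skipped model is simply never launched, so its rows in the expected-accuracy CSV files become dead entries — hence this PR also removes them from all 31 CSVs.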
@pytorchbot cherry-pick --onto release/2.11 -c critical
[benchmark] Skip pytorch_CycleGAN_and_pix2pix from inductor benchmarks (#175066)

Pull Request resolved: #175066
Approved by: https://github.com/huydhn, https://github.com/malfet
(cherry picked from commit 688c943)
Cherry picking #175066
The cherry pick PR is at #175299 and it is recommended to link a critical cherry pick PR with an issue. The following tracker issues are updated:
Details for Dev Infra team
Raised by workflow job
[CI] Move CUDA 12.8 GPU tests from per-commit trunk to periodic (#175067)

Pull Request resolved: #175067
Approved by: https://github.com/malfet
ghstack dependencies: #175066
(cherry picked from commit ef0353f)
[benchmark] Skip pytorch_CycleGAN_and_pix2pix from inductor benchmarks (#175066)

Pull Request resolved: #175066
Approved by: https://github.com/huydhn, https://github.com/malfet
(cherry picked from commit 688c943)
Co-authored-by: Eli Uriegas <eliuriegas@meta.com>
[CI] Move CUDA 12.8 GPU tests from per-commit trunk to periodic (#175067)

Pull Request resolved: #175067
Approved by: https://github.com/malfet
ghstack dependencies: #175066
(cherry picked from commit ef0353f)
Co-authored-by: Eli Uriegas <eliuriegas@meta.com>
- Re-enable `detectron2_maskrcnn` skip in skip.all. - Re-enable all `timm_*` model skips in skip.all. - Keep explicit upstream PR context comments for `modded_nanogpt` and `pytorch_CycleGAN_and_pix2pix`. - Remove stale expected-accuracy rows for skipped models. Relevant PRs: [1] pytorch/pytorch#120299 [2] pytorch/pytorch#164816 [3] pytorch/pytorch#172125 [4] pytorch/pytorch#175066 [5] #2306
This legacy 2017 model has been failing with eager_fail_to_run on 100% of commits since mid-2025, providing zero CI signal while consuming ~5.3M GPU-seconds/week across 7+ benchmark jobs on CUDA, CPU, and ROCm. Skip it in torchbench.yaml and remove its entries from all 31 expected accuracy CSV files. ghstack-source-id: b745ae2 Pull-Request: pytorch/pytorch#175066
Stack from ghstack (oldest at bottom):
Summary
Skip the `pytorch_CycleGAN_and_pix2pix` benchmark model from the inductor benchmark suite.

This legacy 2017 model has been failing with `eager_fail_to_run` on 100% of commits since mid-2025, providing zero CI signal while consuming ~5.3M GPU-seconds/week across 7+ benchmark jobs on CUDA, CPU, and ROCm.

**Estimated savings: ~310 GPU-hours/week (~1,240 GPU-hours/month)**

Skip it in `torchbench.yaml` and remove its entries from all 31 expected accuracy CSV files. Also remove it from the `higher_fp16` tolerance list.

See P2188981399 for the full CI workflow analysis.
Test Plan

- CI should pass with CycleGAN skipped (it was already failing 100% of the time)
- No other benchmark models affected
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @kadeng @chauhang @amjames @Lucaskabela @jataylo