[benchmark] Skip pytorch_CycleGAN_and_pix2pix from inductor benchmarks#175066
seemethere wants to merge 3 commits into gh/seemethere/127/base from
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/175066
Note: Links to docs will display an error until the docs builds have been completed.
⏳ No Failures, 55 Pending — as of commit 474836d with merge base 996c7d8.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
This legacy 2017 model has been failing with eager_fail_to_run on 100% of commits since mid-2025, providing zero CI signal while consuming ~5.3M GPU-seconds/week across 7+ benchmark jobs on CUDA, CPU, and ROCm. Skip it in torchbench.yaml and remove its entries from all 31 expected accuracy CSV files. ghstack-source-id: 14abaec Pull-Request: #175066
Context

This benchmark (`pytorch_CycleGAN_and_pix2pix`) has been failing with `eager_fail_to_run` on 100% of commits since mid-2025.

Impact

The broken benchmark was failing identically across 7+ CI jobs spanning CUDA, CPU, and ROCm (inductor, inductor-periodic workflows), wasting an estimated ~5.3M GPU-seconds/week (~310 GPU-hours/week) while providing zero regression signal. This was identified as the top savings opportunity in the CI workflow analysis (P2188981399).

What this PR does

Adds `pytorch_CycleGAN_and_pix2pix` to the skip list in `torchbench.yaml` and removes its entries from all 31 expected accuracy CSV files.
@pytorchbot merge -i
Merge started
Your change will be merged while ignoring the following 1 checks: inductor / unit-test / inductor-test / test (inductor, 2, 2, linux.g5.4xlarge.nvidia.gpu)
Learn more about merging in the wiki.
Questions? Feedback? Please reach out to the PyTorch DevX Team
Merge failed
Reason: Command
Details for Dev Infra team
Raised by workflow job
Starting merge as part of PR stack under #175067
Starting merge as part of PR stack under #175067
## Summary

Move CUDA 12.8 GPU tests from per-commit trunk CI to periodic (~3x/day on weekdays).

Both CUDA 12.8 and 13.0 are shipping wheel targets (nightly ships cu126, cu128, cu129, cu130), but their trunk CI test suites have **85-90% failure correlation** -- they almost always fail together. Over a 30-day analysis window covering 97 reverts and 38 significant regression events, **CUDA 12.8 never uniquely caught a regression that 13.0 missed**.

CUDA 13.0 is kept per-commit because:
- It is the **newest** shipping CUDA version
- It is the most likely to surface **novel breakage** from new CUDA runtime behavior
- Forward-looking CI should protect what's coming, not what's already stable

CUDA 12.8 is moved to periodic because:
- It is **mature and well-understood** -- breakage is less likely and less urgent
- The rare 12.8-only regression can tolerate the ~8-hour periodic detection window
- The 12.8 build job **remains in trunk** because `cross-compile-linux-test` depends on its artifacts

**Estimated savings: ~1,270 GPU-hours/week (~5,080 GPU-hours/month)**

This is the #2 savings opportunity from a broader CI workflow analysis (P2188981399) covering 128 PR+trunk jobs over 30 days. Combined with #175066 (CycleGAN skip, ~310 GPU-hours/week), total savings from this stack: **~1,580 GPU-hours/week (~6,320 GPU-hours/month)**.

### Changes
- `trunk.yml`: remove the CUDA 12.8 test job (5 default + 3 distributed + 1 pr_time_benchmarks + 1 libtorch shards) and no-op its build
- `periodic.yml`: add default (5 GPU shards on g6.4xlarge) and distributed (3 multi-GPU shards on g4dn.12xlarge) to the existing CUDA 12.8 periodic entry

## Test Plan
- CUDA 12.8 GPU tests continue to run in periodic (3x/day weekdays)
- CUDA 13.0 per-commit coverage is unchanged
- cross-compile-linux-test continues to work (12.8 build job kept)

Pull Request resolved: #175067
Approved by: https://github.com/malfet
ghstack dependencies: #175066
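As a quick sanity check, the per-week and per-month savings figures quoted across this stack are mutually consistent if a "month" is taken as 4 weeks. A minimal sketch (variable names are illustrative, not from any PyTorch tooling):

```python
# Sanity-check the savings figures quoted in the two PR descriptions,
# assuming 1 month = 4 weeks (which is how the quoted numbers scale).
cyclegan_hr_wk = 310   # GPU-hours/week from #175066 (CycleGAN skip)
cuda128_hr_wk = 1270   # GPU-hours/week from #175067 (CUDA 12.8 to periodic)

total_hr_wk = cyclegan_hr_wk + cuda128_hr_wk
print(total_hr_wk)        # 1580 GPU-hours/week for the stack
print(total_hr_wk * 4)    # 6320 GPU-hours/month
print(cuda128_hr_wk * 4)  # 5080 GPU-hours/month for #175067 alone
print(cyclegan_hr_wk * 4) # 1240 GPU-hours/month for #175066 alone
```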
Summary:
## Summary

Skip the `pytorch_CycleGAN_and_pix2pix` benchmark model from the inductor benchmark suite.

This legacy 2017 model has been failing with `eager_fail_to_run` on 100% of commits since mid-2025, providing zero CI signal while consuming ~5.3M GPU-seconds/week across 7+ benchmark jobs on CUDA, CPU, and ROCm.

**Estimated savings: ~310 GPU-hours/week (~1,240 GPU-hours/month)**

Skip it in `torchbench.yaml` and remove its entries from all 31 expected accuracy CSV files. Also remove it from the `higher_fp16` tolerance list.

See P2188981399 for the full CI workflow analysis.

## Test Plan
- CI should pass with CycleGAN skipped (it was already failing 100% of the time)
- No other benchmark models affected

X-link: pytorch/pytorch#175066
Approved by: https://github.com/huydhn, https://github.com/malfet
Reviewed By: atalman
Differential Revision: D93507913
fbshipit-source-id: b9e0b750b38bd2a9afd72eddede007a1eedf0c09
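The effect of a skip-list entry like the one this PR adds can be sketched as follows. This is a hypothetical illustration of how a benchmark harness filters models against a skip set, not the actual `torchbench.yaml` schema or the real torchbench runner code:

```python
# Illustrative sketch only: a harness consulting a skip set like the
# torchbench.yaml entry added in #175066. SKIP and models_to_run are
# hypothetical names, not real torchbench APIs.
SKIP = {
    "pytorch_CycleGAN_and_pix2pix",  # eager_fail_to_run on 100% of commits
}

def models_to_run(all_models):
    """Return the benchmark models that are not skipped."""
    return [m for m in all_models if m not in SKIP]

print(models_to_run(["resnet50", "pytorch_CycleGAN_and_pix2pix", "hf_Bert"]))
```

A skipped model is simply never launched, so its rows in the expected-accuracy CSV files become dead entries — hence this PR also removes them from all 31 CSVs.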
@pytorchbot cherry-pick --onto release/2.11 -c critical
[benchmark] Skip pytorch_CycleGAN_and_pix2pix from inductor benchmarks (#175066)

Pull Request resolved: #175066
Approved by: https://github.com/huydhn, https://github.com/malfet
(cherry picked from commit 688c943)
Cherry picking #175066
The cherry pick PR is at #175299 and it is recommended to link a critical cherry pick PR with an issue. The following tracker issues are updated:
Details for Dev Infra team
Raised by workflow job
[CI] Move CUDA 12.8 GPU tests from per-commit trunk to periodic (#175067)

Pull Request resolved: #175067
Approved by: https://github.com/malfet
ghstack dependencies: #175066
(cherry picked from commit ef0353f)
[benchmark] Skip pytorch_CycleGAN_and_pix2pix from inductor benchmarks (#175066)

Pull Request resolved: #175066
Approved by: https://github.com/huydhn, https://github.com/malfet
(cherry picked from commit 688c943)
Co-authored-by: Eli Uriegas <eliuriegas@meta.com>
[CI] Move CUDA 12.8 GPU tests from per-commit trunk to periodic (#175067)

Pull Request resolved: #175067
Approved by: https://github.com/malfet
ghstack dependencies: #175066
(cherry picked from commit ef0353f)
Co-authored-by: Eli Uriegas <eliuriegas@meta.com>
- Re-enable `detectron2_maskrcnn` skip in skip.all. - Re-enable all `timm_*` model skips in skip.all. - Keep explicit upstream PR context comments for `modded_nanogpt` and `pytorch_CycleGAN_and_pix2pix`. - Remove stale expected-accuracy rows for skipped models. Relevant PRs: [1] pytorch/pytorch#120299 [2] pytorch/pytorch#164816 [3] pytorch/pytorch#172125 [4] pytorch/pytorch#175066 [5] #2306
This legacy 2017 model has been failing with eager_fail_to_run on 100% of commits since mid-2025, providing zero CI signal while consuming ~5.3M GPU-seconds/week across 7+ benchmark jobs on CUDA, CPU, and ROCm. Skip it in torchbench.yaml and remove its entries from all 31 expected accuracy CSV files. ghstack-source-id: b745ae2 Pull-Request: pytorch/pytorch#175066
Stack from ghstack (oldest at bottom):
Summary
Skip the `pytorch_CycleGAN_and_pix2pix` benchmark model from the inductor benchmark suite.

This legacy 2017 model has been failing with `eager_fail_to_run` on 100% of commits since mid-2025, providing zero CI signal while consuming ~5.3M GPU-seconds/week across 7+ benchmark jobs on CUDA, CPU, and ROCm.

**Estimated savings: ~310 GPU-hours/week (~1,240 GPU-hours/month)**

Skip it in `torchbench.yaml` and remove its entries from all 31 expected accuracy CSV files. Also remove it from the `higher_fp16` tolerance list.

See P2188981399 for the full CI workflow analysis.
Test Plan

- CI should pass with CycleGAN skipped (it was already failing 100% of the time)
- No other benchmark models affected
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @kadeng @chauhang @amjames @Lucaskabela @jataylo