[CI] Move CUDA 12.8 GPU tests from per-commit trunk to periodic by pytorchbot · Pull Request #175300 · pytorch/pytorch

pytorchbot · 2026-02-19T01:27:55Z


Summary

Move CUDA 12.8 GPU tests from per-commit trunk CI to periodic (~3x/day on weekdays).

Both CUDA 12.8 and 13.0 are shipping wheel targets (nightly ships cu126, cu128, cu129, cu130), but their trunk CI test suites have 85-90% failure correlation -- they almost always fail together. Over a 30-day analysis window covering 97 reverts and 38 significant regression events, CUDA 12.8 never uniquely caught a regression that 13.0 missed.
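The description does not say how "failure correlation" or "uniquely caught" was measured. As a rough sketch of one plausible reading, the snippet below computes the Jaccard overlap of failing commits between two suites and counts failures seen by only one of them. The function name `failure_overlap` and the sample outcomes are hypothetical, chosen only for illustration; they are not the analysis code or data behind this PR.

```python
from typing import Dict, Tuple


def failure_overlap(a: Dict[str, bool], b: Dict[str, bool]) -> Tuple[float, int, int]:
    """Return (Jaccard overlap of failures, failures only in a, failures only in b).

    Each dict maps a commit id to True if the suite failed on that commit.
    Only commits present in both dicts are compared.
    """
    shared = a.keys() & b.keys()
    fail_a = {c for c in shared if a[c]}
    fail_b = {c for c in shared if b[c]}
    union = fail_a | fail_b
    # If neither suite ever failed, treat the two as being in full agreement.
    overlap = len(fail_a & fail_b) / len(union) if union else 1.0
    return overlap, len(fail_a - fail_b), len(fail_b - fail_a)


# Hypothetical per-commit outcomes (True = suite failed); not real PyTorch CI data.
cu128 = {"c1": False, "c2": True, "c3": True, "c4": False, "c5": False}
cu130 = {"c1": False, "c2": True, "c3": True, "c4": True, "c5": False}

overlap, only_128, only_130 = failure_overlap(cu128, cu130)
print(f"overlap={overlap:.2f} only_128={only_128} only_130={only_130}")
```

Under this reading, the claim that 12.8 never uniquely caught a regression corresponds to the "failures only in a" count staying at zero across the analysis window.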

CUDA 13.0 is kept per-commit because:

It is the newest shipping CUDA version
It is the most likely to surface novel breakage from new CUDA runtime behavior
Forward-looking CI should protect what's coming, not what's already stable

CUDA 12.8 is moved to periodic because:

It is mature and well-understood -- breakage is less likely and less urgent
The rare 12.8-only regression can tolerate the ~8-hour periodic detection window
The 12.8 build job remains in trunk because cross-compile-linux-test depends on its artifacts
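The ~8-hour figure follows from the periodic cadence if runs are assumed to be evenly spaced across the day. A minimal sketch of that arithmetic is below; the helper name is hypothetical, and the even-spacing assumption is mine rather than something the PR states. Because periodic runs are described as weekday-only, the gap over a weekend would be longer than this.

```python
def worst_case_detection_hours(runs_per_day: int) -> float:
    """Longest gap between consecutive runs, assuming evenly spaced runs."""
    if runs_per_day <= 0:
        raise ValueError("runs_per_day must be positive")
    return 24 / runs_per_day


# 3 periodic runs per weekday -> at most 8 hours between runs on those days.
print(worst_case_detection_hours(3))
```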

Estimated savings: ~1,270 GPU-hours/week (~5,080 GPU-hours/month)

This is the #2 savings opportunity from a broader CI workflow analysis (P2188981399) covering 128 PR+trunk jobs over 30 days. Combined with #175066 (CycleGAN skip, ~310 GPU-hours/week), total savings from this stack: ~1,580 GPU-hours/week (~6,320 GPU-hours/month).
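The savings figures above are internally consistent if a month is counted as four weeks, which is what the weekly-to-monthly ratios imply. The short check below reproduces them; the four-week constant and the variable names are assumptions made for illustration.

```python
WEEKS_PER_MONTH = 4  # assumption inferred from 1,270/week -> 5,080/month


def monthly(weekly_gpu_hours: int) -> int:
    """Convert a weekly GPU-hour estimate to a monthly one."""
    return weekly_gpu_hours * WEEKS_PER_MONTH


cu128_weekly = 1270     # this PR: CUDA 12.8 tests moved to periodic
cyclegan_weekly = 310   # #175066: CycleGAN skip
stack_weekly = cu128_weekly + cyclegan_weekly

print(monthly(cu128_weekly))   # this PR alone, per month
print(stack_weekly)            # whole stack, per week
print(monthly(stack_weekly))   # whole stack, per month
```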

Changes

trunk.yml: remove CUDA 12.8 test job (5 default + 3 distributed + 1 pr_time_benchmarks + 1 libtorch shards) and no-ops build
periodic.yml: add default (5 GPU shards on g6.4xlarge) and distributed (3 multi-GPU shards on g4dn.12xlarge) to existing CUDA 12.8 periodic entry

Test Plan

CUDA 12.8 GPU tests continue to run in periodic (3x/day weekdays)
CUDA 13.0 per-commit coverage is unchanged
cross-compile-linux-test continues to work (12.8 build job kept)

cc @pytorch/pytorch-dev-infra

Pull Request resolved: #175067
Approved by: https://github.com/malfet
ghstack dependencies: #175066

(cherry picked from commit ef0353f)

pytorch-bot · 2026-02-19T01:27:59Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/175300


✅ No Failures

As of commit d807fca with merge base 0fd766e:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

atalman

lgtm

pytorchbot requested a review from a team as a code owner February 19, 2026 01:27

pytorchbot mentioned this pull request Feb 19, 2026

[v.2.11.0] Release Tracker #175093

Open

pytorchbot mentioned this pull request Feb 19, 2026

[CI] Move CUDA 12.8 GPU tests from per-commit trunk to periodic #175067

Closed

pytorch-bot bot added the topic: not user facing topic category label Feb 19, 2026

pytorchbot added the open source label Feb 19, 2026

atalman approved these changes Feb 19, 2026

View reviewed changes

atalman merged commit d80a584 into release/2.11 Feb 19, 2026
110 checks passed

atalman mentioned this pull request Mar 13, 2026

Release 2.11 validations checklist and cherry-picks #177422

Open

74 tasks

atalman merged 1 commit into release/2.11 from cherry-pick-175067-by-pytorch_bot_bot_
