[Temp CI Patch]: torch version for UT #2856
Summary of Changes
This pull request implements a temporary fix for CI failures by restricting the
Code Review
This pull request introduces a temporary patch to the Buildkite CI pipeline to pin the torch version, aiming to resolve a compatibility issue with the CI machine's CUDA version. While the intention is correct, the implementation has a critical flaw: the specified torch version range torch>=2.6.0,<2.11.0 is invalid and will break the build. My review includes a comment with a suggested fix that uses a valid version specifier and also improves maintainability by adding a TODO comment to track the removal of this temporary fix. This is important to prevent accumulating technical debt in the CI configuration.
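A range such as torch>=2.6.0,<2.11.0 is easy to misjudge by eye, because a plain string comparison orders 2.10.0 before 2.6.0. The sketch below is one way to sanity-check candidate version strings against the bounds. The helper names version_lt and version_in_range are hypothetical, the candidate versions are illustrative inputs rather than a list of published releases, and the approach assumes GNU sort with the -V (version sort) option.

```shell
#!/usr/bin/env bash
# Sketch: test whether a plain X.Y.Z version lies in the half-open range [LO, HI).
# Assumes GNU `sort -V`; pre-release or local version tags are not handled.

# version_lt A B: succeeds when A is strictly lower than B.
version_lt() {
  [ "$1" != "$2" ] &&
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# version_in_range VER LO HI: succeeds when LO <= VER < HI.
version_in_range() {
  ! version_lt "$1" "$2" && version_lt "$1" "$3"
}

# Illustrative candidates only; this does not query any package index.
for v in 2.5.1 2.6.0 2.10.0 2.11.0; do
  if version_in_range "$v" 2.6.0 2.11.0; then
    echo "$v: inside >=2.6.0,<2.11.0"
  else
    echo "$v: outside >=2.6.0,<2.11.0"
  fi
done
```

This only checks the arithmetic of the specifier. Whether a release in the range is actually published, and which CUDA build it ships with, still has to be confirmed against the package index.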
# Pin torch to a version compatible with the CI machine's CUDA 12.x driver
uv pip install "torch>=2.6.0,<2.11.0"
The specified torch version range torch>=2.6.0,<2.11.0 is invalid as no public torch versions exist in this range, which will cause the CI step to fail.
Additionally, since this is a temporary patch, it's a best practice to add a TODO comment with a reference to a tracking issue. This makes the temporary nature of the fix explicit and helps ensure it's removed later to avoid technical debt. The suggested change below corrects the version pin and adds a TODO for tracking.
# TODO(CI): Remove torch pin after upgrading runners beyond CUDA 12.1. See issue #<issue_number>.
uv pip install "torch<2.4.0"
UT patch
Signed-off-by: Samuel Shen <slshen@uchciago.edu>
Co-authored-by: Samuel Shen <slshen@uchciago.edu>
torch 2.11 was just released and is built on CUDA 13.0, but our unit-test CI machine still has CUDA 12.1. This is a temporary solution to unblock CI.
Note
Low Risk
Low risk: only adjusts CI dependency installation for NVIDIA runners; main risk is CI flakes if the pinned torch range is too narrow or conflicts with other requirements.
Overview
Updates the Buildkite unit-test pipeline to pin torch to >=2.6.0,<2.11.0 on NVIDIA (CUDA) runners, ensuring compatibility with the CI machines' CUDA 12.x drivers.
AMD/ROCm installation behavior is unchanged.
Written by Cursor Bugbot for commit bc60cfb.
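The vendor-specific behavior described in the overview could be sketched roughly as follows. This is not the actual Buildkite step: the CI_GPU_VENDOR variable and the torch_pin_for helper are hypothetical names introduced for illustration, and the script only prints the install command instead of running it.

```shell
#!/usr/bin/env bash
# Sketch only: CI_GPU_VENDOR and torch_pin_for are hypothetical names,
# not taken from the real pipeline definition.

# torch_pin_for VENDOR: prints the pinned requirement for NVIDIA runners and
# prints nothing for any other vendor, so their install path is left alone.
torch_pin_for() {
  case "$1" in
    nvidia) printf '%s\n' 'torch>=2.6.0,<2.11.0' ;;
    *) ;;
  esac
}

pin="$(torch_pin_for "${CI_GPU_VENDOR:-}")"
if [ -n "$pin" ]; then
  # Dry run: show the command rather than executing it.
  echo "would run: uv pip install \"$pin\""
else
  echo "no torch pin applied for vendor '${CI_GPU_VENDOR:-unset}'"
fi
```

Keeping the pin in a single helper means the temporary constraint lives in one place, which should make it easier to delete once the runners move to a newer CUDA version.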