Conversation

@njzjz njzjz commented Dec 2, 2025

This will make the CI more robust: new TF/PT releases will not break the CI until we bump the pinned versions.

Summary by CodeRabbit

  • New Features

    • Added CPU-optimized build support for PyTorch and TensorFlow.
    • Platform-specific dependency resolution for Linux x86_64, macOS, and other architectures.
  • Chores

    • Updated build configuration to use configurable dependency groups and conditional pins for CPU vs GPU builds.
    • Adjusted CI/build selection logic to better handle CPU-targeted packaging across environments.



codecov bot commented Dec 2, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 84.28%. Comparing base (91c5110) to head (637fd9b).
⚠️ Report is 2 commits behind head on devel.

Additional details and impacted files
```diff
@@           Coverage Diff           @@
##            devel    #5071   +/-   ##
=======================================
  Coverage   84.28%   84.28%
=======================================
  Files         709      709
  Lines       70562    70561    -1
  Branches     3619     3618    -1
=======================================
+ Hits        59472    59473    +1
+ Misses       9924     9923    -1
+ Partials     1166     1165    -1
```


@njzjz njzjz marked this pull request as ready for review December 2, 2025 07:05

Copilot AI left a comment


Pull request overview

This PR aims to make CI builds more robust by pinning TensorFlow and PyTorch dependencies to specific versions defined in global dependency groups, preventing CI breakage from new upstream releases.

Key changes:

  • Added dependency_groups package as a build requirement to support PEP 735 dependency groups
  • Created centralized dependency group definitions in pyproject.toml with platform-specific version pins
  • Refactored CI build logic to read pinned versions from dependency groups instead of hardcoding them

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

  • pyproject.toml: Added the dependency_groups build dependency and defined pin_tensorflow_cpu, pin_tensorflow_gpu, pin_pytorch_cpu, and pin_pytorch_gpu dependency groups with platform-specific version constraints
  • backend/utils.py: New utility module providing the read_dependencies_from_dependency_group() function to extract dependencies from pyproject.toml dependency groups
  • backend/find_tensorflow.py: Updated to use pinned TensorFlow dependencies from dependency groups for CUDA 12+ builds instead of hardcoded version strings
  • backend/find_pytorch.py: Updated to use pinned PyTorch dependencies from dependency groups for CUDA 12+ builds; removed the unused platform import
Comments suppressed due to low confidence (1)

backend/find_tensorflow.py:112

  • Potential conflicting requirements: When CIBUILDWHEEL is set and CUDA version is >= 12, the code adds pinned requirements from read_dependencies_from_dependency_group("pin_tensorflow_cpu") to requires, but then also extends requires with unpinned requirements from get_tf_requirement("")["cpu"].

For example, on Linux x86_64, this would result in both:

  • tensorflow-cpu~=2.18.0; platform_machine=='x86_64' and platform_system == 'Linux' (pinned, from line 100)
  • tensorflow-cpu; platform_machine!='aarch64' and (platform_machine!='arm64' or platform_system != 'Darwin') (unpinned, from line 112)

Both requirements match the same platform, which may cause conflicting version constraints.
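The overlap can be demonstrated with a small stdlib check that extracts each requirement's project name (a rough approximation of PEP 508 parsing, not something the build backend actually does):

```python
import re


def project_name(req: str) -> str:
    # Crude PEP 508 name extraction: cut at the first version specifier,
    # extras bracket, marker separator, parenthesis, or whitespace.
    return re.split(r"[\s;\[<>=!~(]", req.strip(), maxsplit=1)[0].lower()


requires = [
    "tensorflow-cpu~=2.18.0; platform_machine=='x86_64' and platform_system == 'Linux'",
    "tensorflow-cpu; platform_machine!='aarch64' and (platform_machine!='arm64' or platform_system != 'Darwin')",
]
names = [project_name(r) for r in requires]
duplicated = {n for n in names if names.count(n) > 1}
print(duplicated)  # the same project appears in both pinned and unpinned form
```

On a Linux x86_64 build, both entries' markers evaluate to true, so the resolver sees two constraints for the same project.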

Consider setting tf_version to a specific version (e.g., "2.18.0") when using the pinned dependency group, or conditionally skip calling get_tf_requirement() when pinned requirements are already added:

```python
if cuda_version == "" or cuda_version in SpecifierSet(">=12,<13"):
    requires.extend(
        read_dependencies_from_dependency_group("pin_tensorflow_cpu")
    )
    tf_version = None  # Skip adding unpinned requirements
elif cuda_version in SpecifierSet(">=11,<12"):
    # ...
```

The code under review, for reference:

```python
    if cuda_version == "" or cuda_version in SpecifierSet(">=12,<13"):
        # CUDA 12.2, cudnn 9
        # or CPU builds
        requires.extend(
            read_dependencies_from_dependency_group("pin_tensorflow_cpu")
        )
    elif cuda_version in SpecifierSet(">=11,<12"):
        # CUDA 11.8, cudnn 8
        requires.extend(
            [
                "tensorflow-cpu>=2.5.0,<2.15; platform_machine=='x86_64' and platform_system == 'Linux'",
            ]
        )
        tf_version = "2.14.1"
    else:
        raise RuntimeError("Unsupported CUDA version") from None
requires.extend(get_tf_requirement(tf_version)["cpu"])
```


Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Jinzhe Zeng <njzjz@qq.com>

coderabbitai bot commented Dec 2, 2025

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

📝 Walkthrough

Walkthrough

Adds a utility to read dependency-groups from pyproject.toml and updates backend finders to use it for CPU-specific TensorFlow and PyTorch pins, replacing prior hardcoded strings and broadening CI/CIBUILDWHEEL CPU pin logic.

Changes

  • Dependency group configuration (pyproject.toml): Added dependency_groups with platform-conditional pins for CPU TensorFlow and PyTorch (multiple platform_machine/platform_system conditions and comments about macOS x86 deprecation).
  • Dependency resolution utility (backend/utils.py): New function read_dependencies_from_dependency_group(group: str) -> tuple[str, ...] that loads pyproject.toml (using tomllib or tomli) and resolves the named dependency group. Includes an LGPL-3.0-or-later header.
  • Backend dependency injection (backend/find_pytorch.py, backend/find_tensorflow.py): Import and use read_dependencies_from_dependency_group(...). find_pytorch.py appends CPU pin requirements when the CIBUILDWHEEL and CPU-compatible CUDA_VERSION conditions are met; find_tensorflow.py replaces a hardcoded CPU TF string with the dependency-group result. Existing CI-MPI/CUDA logic is retained and combined with the new pins.
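The groups described above could look roughly like this in pyproject.toml (group names from the PR; the concrete version pins and markers are assumptions for illustration, not the merged contents):

```toml
[dependency-groups]
pin_tensorflow_cpu = [
    # Hypothetical pins; the actual versions live in the PR's pyproject.toml
    "tensorflow-cpu~=2.18.0; platform_machine == 'x86_64' and platform_system == 'Linux'",
    "tensorflow; platform_machine == 'arm64' and platform_system == 'Darwin'",
]
pin_pytorch_cpu = [
    "torch~=2.5.0",
]
```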

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • Verify backend/utils.py correctly selects and uses tomllib vs tomli and handles missing keys or unexpected pyproject structure.
  • Check find_pytorch.py CIBUILDWHEEL and CUDA_VERSION condition logic and that cibw_requirement appends safely to existing requirements.
  • Confirm find_tensorflow.py replacement preserves prior platform/CPU vs GPU branching.
  • Validate pyproject.toml conditional markers (platform_machine, platform_system) are syntactically correct.

Suggested reviewers

  • wanghan-iapcm

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
  • Description Check: Passed. Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check: Passed. The PR title accurately describes the main change: pinning TensorFlow and PyTorch dependencies in cibuildwheel to match the global pins across multiple backend files and pyproject.toml.
  • Docstring Coverage: Passed. Docstring coverage is 100.00%, which meets the required threshold of 80.00%.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
backend/find_pytorch.py (1)

32-34: CIBW PyTorch CPU pinning via dependency groups looks good; fix minor docstring nit

  • The new cibw_requirement handling with read_dependencies_from_dependency_group("pin_pytorch_cpu") in the CIBUILDWHEEL + CPU / CUDA≥12 path cleanly reuses the central pyproject pins and composes them with the existing torch>=2.1.0/pt_version logic and mpich requirement. This should keep CIBW CPU builds on the globally pinned torch versions while preserving MPI constraints. Based on learnings, this also aligns with PyTorch ≥2.8 where the default PyPI wheel is CPU‑only.
  • Minor nit: the find_pytorch docstring’s return description still says “TensorFlow requirement” (Line 56). Consider updating it to “PyTorch requirement” to avoid confusion.

Also applies to: 96-148

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 91c5110 and 4c24d08.

📒 Files selected for processing (4)
  • backend/find_pytorch.py (3 hunks)
  • backend/find_tensorflow.py (2 hunks)
  • backend/utils.py (1 hunks)
  • pyproject.toml (2 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: njzjz
Repo: deepmodeling/deepmd-kit PR: 4884
File: .github/workflows/test_cuda.yml:46-46
Timestamp: 2025-08-14T07:11:51.357Z
Learning: As of PyTorch 2.8 (August 2025), the default wheel on PyPI installed by `pip install torch` is CPU-only. CUDA-enabled wheels are available on PyPI for Linux x86 and Windows x86 platforms, but require explicit specification via index URLs or variant-aware installers. For CUDA support, use `--index-url https://download.pytorch.org/whl/cu126` (or appropriate CUDA version).
Learnt from: CR
Repo: deepmodeling/deepmd-kit PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-11-24T22:42:23.128Z
Learning: Install backend dependencies (TensorFlow, PyTorch, JAX, or Paddle) before building C++ components, and set TENSORFLOW_ROOT and PYTORCH_ROOT environment variables
📚 Learning: 2025-08-14T07:11:51.357Z
Learnt from: njzjz
Repo: deepmodeling/deepmd-kit PR: 4884
File: .github/workflows/test_cuda.yml:46-46
Timestamp: 2025-08-14T07:11:51.357Z
Learning: As of PyTorch 2.8 (August 2025), the default wheel on PyPI installed by `pip install torch` is CPU-only. CUDA-enabled wheels are available on PyPI for Linux x86 and Windows x86 platforms, but require explicit specification via index URLs or variant-aware installers. For CUDA support, use `--index-url https://download.pytorch.org/whl/cu126` (or appropriate CUDA version).

Applied to files:

  • pyproject.toml
  • backend/find_pytorch.py
🧬 Code graph analysis (2)
backend/find_tensorflow.py (1)
backend/utils.py (1)
  • read_dependencies_from_dependency_group (17-36)
backend/find_pytorch.py (1)
backend/utils.py (1)
  • read_dependencies_from_dependency_group (17-36)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: CodeQL analysis (python)
  • GitHub Check: Agent
🔇 Additional comments (2)
pyproject.toml (1)

8-9: TF/PT CPU dependency groups and build-system wiring look consistent with backend usage

  • Adding "dependency_groups" to [build-system].requires matches the new backend.utils import, so the build backend can resolve groups without relying on runtime deps.
  • The pin_tensorflow_cpu and pin_pytorch_cpu groups appear to partition major platforms as intended (Linux x86_64 vs macOS x86_64 vs “everything else”), and their names match the strings used in read_dependencies_from_dependency_group(...).

From my side this configuration is coherent with the updated find_tensorflow/find_pytorch logic; you might still want to sanity‑check the markers on a couple of target platforms (Linux, macOS, Windows) via a dry‑run resolver (pip/uv) to confirm they expand as expected.

Also applies to: 156-183

backend/find_tensorflow.py (1)

29-31: Using dependency-group TF CPU pins in CIBW builds looks correct

The new import and the change to:

  • treat CUDA_VERSION == "" or >=12,<13 as the CPU/“modern CUDA” path, and
  • extend requires with read_dependencies_from_dependency_group("pin_tensorflow_cpu")

nicely centralize TF CPU pinning in pyproject.toml while still layering on get_tf_requirement(tf_version)["cpu"] for ancillary deps. The group name matches the [dependency-groups] entry and requires.extend(...) is compatible with the returned iterable.

Also applies to: 93-101


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
backend/utils.py (1)

34-34: Consider safer dictionary access with explicit error handling.

Line 34 uses direct dictionary access pyproject["dependency-groups"], which will raise a generic KeyError if the key is missing. A previous review comment suggested using .get() with explicit error handling to provide clearer diagnostics.

Apply this diff to add explicit error handling:

```diff
-    groups = pyproject["dependency-groups"]
+    groups = pyproject.get("dependency-groups")
+    if groups is None:
+        raise ValueError(
+            "Missing 'dependency-groups' section in pyproject.toml"
+        )
```
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4c24d08 and 637fd9b.

📒 Files selected for processing (1)
  • backend/utils.py (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (25)
  • GitHub Check: Test Python (5, 3.12)
  • GitHub Check: Test Python (6, 3.12)
  • GitHub Check: Test Python (6, 3.9)
  • GitHub Check: Test Python (2, 3.12)
  • GitHub Check: Analyze (c-cpp)
  • GitHub Check: Test Python (4, 3.12)
  • GitHub Check: Test Python (5, 3.9)
  • GitHub Check: Test Python (3, 3.12)
  • GitHub Check: Test Python (3, 3.9)
  • GitHub Check: Test Python (4, 3.9)
  • GitHub Check: Test Python (2, 3.9)
  • GitHub Check: Test Python (1, 3.12)
  • GitHub Check: Test Python (1, 3.9)
  • GitHub Check: Build wheels for cp310-manylinux_aarch64
  • GitHub Check: Build C++ (cuda, cuda)
  • GitHub Check: Build C++ (cuda120, cuda)
  • GitHub Check: Build wheels for cp311-manylinux_x86_64
  • GitHub Check: Build C++ (rocm, rocm)
  • GitHub Check: Build wheels for cp311-manylinux_x86_64
  • GitHub Check: Build wheels for cp311-win_amd64
  • GitHub Check: Build wheels for cp311-macosx_x86_64
  • GitHub Check: Build C library (2.14, >=2.5.0,<2.15, libdeepmd_c_cu11.tar.gz)
  • GitHub Check: Build C library (2.18, libdeepmd_c.tar.gz)
  • GitHub Check: Test C++ (false)
  • GitHub Check: Test C++ (true)
🔇 Additional comments (2)
backend/utils.py (2)

17-30: The docstring has been correctly updated.

The Returns section (lines 28-29) now correctly states tuple[str, ...] to match the return type annotation. This addresses the previous comment about the list vs. tuple inconsistency.


7-14: Verify that external dependencies are declared.

The code imports from dependency_groups (line 7) and conditionally imports tomli for Python < 3.11 (line 14). Ensure these packages are properly declared in the project's dependencies configuration.
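The conditional TOML-parser import referred to above is typically written like this (a sketch; the exact form in backend/utils.py may differ):

```python
import sys

if sys.version_info >= (3, 11):
    import tomllib  # standard library since Python 3.11
else:
    import tomli as tomllib  # third-party backport; must be declared as a build dependency

# Parse a trivial document to confirm the alias works either way.
data = tomllib.loads("answer = 42")
print(data["answer"])
```

Declaring tomli with a marker like `tomli; python_version < '3.11'` keeps it out of newer environments where the stdlib module suffices.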

@njzjz njzjz enabled auto-merge December 2, 2025 18:18
@njzjz njzjz added this pull request to the merge queue Dec 2, 2025
Merged via the queue into deepmodeling:devel with commit a38f483 Dec 2, 2025
60 checks passed
@njzjz njzjz deleted the pin_cibw branch December 2, 2025 21:23
ChiahsinChu pushed a commit to ChiahsinChu/deepmd-kit that referenced this pull request Dec 17, 2025
