
Conversation

@caic99 (Member) commented Mar 17, 2025

This PR adds two new rules for defining batch sizes:
max: unlike auto, max does not round the batch size up. Consider the case where the batch size is set to auto:256 and you have a system with 255 atoms: auto will infer a batch size of 2, while max keeps it at 1 to avoid a potential OOM.
filter: removes any system with more atoms than the specified limit, and caps the total number of atoms in a batch in the same way max does. A sketch of the intended semantics is shown below.
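
For reference, here is a minimal Python sketch of the intended semantics (the function name, signature, and the default of 32 for a bare auto are illustrative, not the actual deepmd-kit implementation):

import math

def infer_batch_sizes(natoms_per_system: list[int], rule: str) -> list[int]:
    """Illustrative sketch: derive one batch size per system from an
    "auto[:N]" / "max:N" / "filter:N" rule string."""
    prefix, _, n = rule.partition(":")
    limit = int(n) if n else 32  # bare "auto" defaults to 32 atoms
    if prefix == "auto":
        # Ceiling: batch_size * natoms >= limit,
        # e.g. a 255-atom system with "auto:256" -> ceil(256 / 255) = 2.
        return [max(1, math.ceil(limit / na)) for na in natoms_per_system]
    if prefix == "max":
        # Floor, clamped to 1: never rounds up, so a 255-atom system
        # with "max:256" stays at batch size 1, avoiding a potential OOM.
        return [max(1, limit // na) for na in natoms_per_system]
    if prefix == "filter":
        # Like "max", but drops systems larger than the limit entirely.
        kept = [na for na in natoms_per_system if na <= limit]
        if not kept:
            raise ValueError("no system is left after filtering")
        return [max(1, limit // na) for na in kept]
    raise ValueError(f"unsupported batch size rule: {rule}")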

I've run out of ideas for what the keywords should be, so please leave a comment if you have a better one.

  • Todo: migrate changes to Paddle backend

Summary by CodeRabbit

  • New Features
    • Enhanced batch size configuration options now allow the use of descriptive prefixes for more flexible control, including "max:N" and "filter:N".
    • Clarified the existing "auto:N" option for batch size configuration.
  • Bug Fixes
    • Improved validation ensures that batch sizes are always at least one.
    • Updated error feedback provides clear messaging for unsupported batch size formats.
  • Tests
    • Introduced a new test suite to validate batch size behavior under various input scenarios, including assertions for different formats and conditions.

Copilot AI review requested due to automatic review settings March 17, 2025 07:18
Copilot AI (Contributor) left a comment


Pull Request Overview

This PR introduces two new rules for defining batch sizes in the data loader to better handle systems with large numbers of atoms.

  • Added support for "max:" and "cap:" keywords to modify batch size rules.
  • For "cap:", systems exceeding the specified atom count are filtered out, and a warning is logged.
Comments suppressed due to low confidence (1)

deepmd/pt/utils/dataloader.py:145

  • [nitpick] Consider renaming variable 'ii' to 'system' for improved readability.
for ii in self.systems:

@coderabbitai bot (Contributor) commented Mar 17, 2025

📝 Walkthrough

The changes enhance the handling of the batch_size parameter in the construct_dataset function within the dataloader.py file. A new variable ceiling is introduced to manage batch size calculations based on the prefixes max:, filter:, and the existing auto:. The logic now includes filtering systems based on atom counts for the filter: prefix, and error handling has been improved to raise a ValueError for unsupported types. Additionally, a new test file is introduced to validate these changes.

Changes

Files and change summary:

  • deepmd/pt/utils/dataloader.py: enhanced handling of the batch_size prefixes max: and filter:; introduced a ceiling variable; modified the batch size index (bsi) calculation; updated error handling to raise a ValueError.
  • source/tests/pt/test_dploaderset.py: added a new test class TestSampler with methods validating batch size handling, including tests for different input scenarios and expected exceptions.
  • doc/train/training-advanced.md: introduced new batch_size options in the training configuration: clarified "auto:N", and added "max:N" and "filter:N" for finer control over dataset selection. A usage sketch follows this list.
  • deepmd/utils/argcheck.py: added the new string options "max:N" and "filter:N" for batch_size in training_data_args.
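
For context, a user opts into these rules through the batch_size key of the training data section in the input script; a minimal illustration (system paths are placeholders):

"training": {
    "training_data": {
        "systems": ["path/to/system1", "path/to/system2"],
        "batch_size": "max:256"
    }
}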

Possibly related PRs

  • refactor: simplify dataset construction #4437: The changes in the main PR, which enhance the handling of the batch_size parameter in the construct_dataset function, are directly related to the modifications in the retrieved PR that also involve the dataloader.py file and the dataset construction process.

Suggested reviewers

  • njzjz
  • CaRoLZhangxy

📜 Recent review details


📥 Commits

Reviewing files that changed from the base of the PR and between aa07bf1 and bb41847.

📒 Files selected for processing (1)
  • source/tests/pt/test_dploaderset.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • source/tests/pt/test_dploaderset.py
⏰ Context from checks skipped due to timeout of 90000ms (29)


@codecov bot commented Mar 17, 2025

Codecov Report

Attention: Patch coverage is 50.00000% with 21 lines in your changes missing coverage. Please review.

Project coverage is 84.80%. Comparing base (4c8600a) to head (bb41847).
Report is 86 commits behind head on devel.

Files with missing lines:
  • deepmd/pd/utils/dataloader.py: patch coverage 0.00%, 21 lines missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##            devel    #4659      +/-   ##
==========================================
- Coverage   84.80%   84.80%   -0.01%     
==========================================
  Files         692      692              
  Lines       66345    66395      +50     
  Branches     3539     3538       -1     
==========================================
+ Hits        56265    56303      +38     
- Misses       8940     8951      +11     
- Partials     1140     1141       +1     

☔ View full report in Codecov by Sentry.

@njzjz (Member) commented Mar 17, 2025

I've run out of ideas for what the keywords should be, so please leave a comment if you have a better one.

Have you asked DeepSeek?

@caic99 (Member, Author) commented Mar 18, 2025

Have you asked DeepSeek?

It suggested replacing "cap:N" with "filter:N", which makes sense to me. @iProzd Any comments?

@wanghan-iapcm (Collaborator) left a comment


This PR is not covered by unit tests.

@caic99 caic99 marked this pull request as draft March 18, 2025 03:22
@coderabbitai bot (Contributor) left a comment


Actionable comments posted: 2

🧹 Nitpick comments (2)
source/tests/pt/test_dploaderset.py (2)

75-78: Remove debug print statement

The print(cm.output) statement on line 77 appears to be a debug statement that should be removed before merging.

        with self.assertLogs() as cm:
            self.assertRaises(AssertionError, self.get_batch_sizes, "filter:191")
-            print(cm.output)  # DEBUG
            self.assertIn("Remove 1 systems with more than 191 atoms", cm.output[-1])

46-84: Consider adding tests for multiple systems with the filter option

The current tests only verify the filter option with a single system. Since the PR mentions that the filter method excludes systems that contain more atoms than the specified limit, it would be valuable to add tests with multiple systems of varying atom counts to ensure the filtering works correctly across systems.

Consider adding a test case like this:

def test_filter_multiple_systems(self) -> None:
    # Mock multiple systems with different atom counts
    original_systems = self.systems
    try:
        # Set up test with multiple mock systems
        # Implementation depends on how systems are represented
        # This is just a conceptual example
        self.systems = [...] # Multiple systems with varying atom counts
        
        # Test filtering logic
        result = self.get_batch_sizes("filter:200")
        # Assert expected behavior
        
    finally:
        # Restore original systems
        self.systems = original_systems
📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between 10defcb and 199d9b7.

📒 Files selected for processing (1)
  • source/tests/pt/test_dploaderset.py (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (29)
🔇 Additional comments (1)
source/tests/pt/test_dploaderset.py (1)

1-87: The test coverage for the new batch size rules looks good

The test cases for the new max and filter batch size rules appear comprehensive and align well with the PR objectives. The test verifies that:

  1. The max method maintains a batch size of 1 when atoms exceed the threshold to prevent OOM issues
  2. The filter method correctly excludes systems that contain more atoms than the specified limit

caic99 and others added 2 commits March 20, 2025 17:07
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Signed-off-by: Chun Cai <amoycaic@gmail.com>
@coderabbitai bot (Contributor) left a comment


Actionable comments posted: 0

🧹 Nitpick comments (2)
source/tests/pt/test_dploaderset.py (2)

72-79: Clean up debugging print statement.

The test case for the "filter" batch size rule includes a debugging print statement that should be removed.

-            print(cm.output)  # DEBUG

1-88: Consider adding additional test cases for edge scenarios.

The current tests cover the basic functionality but could benefit from additional test cases:

  1. Testing with multiple systems with different atom counts
  2. Edge cases like very large atom thresholds
  3. Behavior when atom counts are exactly at the threshold boundary
📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between 199d9b7 and 5bf8fff.

📒 Files selected for processing (1)
  • source/tests/pt/test_dploaderset.py (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (29)
🔇 Additional comments (4)
source/tests/pt/test_dploaderset.py (4)

36-44: The function signature and implementation match the requirements.

The method correctly instantiates a DpLoaderSet and returns the first batch size as expected.


46-64: Good comprehensive testing of different batch size formats.

The test cases cover integer, list, and string-based "auto" batch sizes with various thresholds, verifying that the auto rule correctly applies a ceiling function for determining batch sizes based on the number of atoms.


65-71: Good verification of the new "max" batch size rule.

These tests confirm that the "max" rule correctly implements the behavior described in the PR objectives - not applying a ceiling to prevent potential OOM issues.


80-83: Good error handling validation for unknown batch size rule.

The test correctly validates that an unknown batch size rule raises a ValueError with the appropriate error message.

@caic99 caic99 requested review from Copilot and iProzd March 21, 2025 04:15
Copilot AI (Contributor) left a comment


Pull Request Overview

This PR adds new configuration options for defining batch sizes, introducing two new rules for handling large systems. The changes include updates to the batch size computation logic in the dataloader and new tests covering int, list, and string-based batch size inputs (including "auto", "max", and "filter" rules).

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

  • deepmd/pt/utils/dataloader.py: implements the new batch size rules and corresponding logic.
  • source/tests/pt/test_dploaderset.py: adds tests verifying the new batch size behavior for various inputs.
Comments suppressed due to low confidence (1)

deepmd/pt/utils/dataloader.py:130

  • The PR description refers to a 'cap' rule for batch sizes, but the implementation uses 'filter:'. Consider aligning the naming between the documentation and the code for clarity.
elif batch_size.startswith("filter:"):

@njzjz (Member) left a comment


Documentation needs to be updated

@caic99 caic99 requested a review from njzjz March 21, 2025 06:01
@github-actions github-actions bot added the Docs label Mar 21, 2025
@coderabbitai bot (Contributor) left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
doc/train/training-advanced.md (1)

109-111: Enhance Documentation Clarity and Consistency for New Batch Size Options

The descriptions for the new "auto:N", "max:N", and "filter:N" options accurately reflect the intended logic per the PR objectives:

  • "auto:N" correctly specifies that the batch size times the number of atoms is no less than N.
  • "max:N" clearly indicates that the product is no more than N with a safeguard of a minimum batch size of 1.
  • "filter:N" conveys that it functions like "max:N" but excludes systems exceeding the specified atom count and raises an error if no system remains.

Suggestions:

  • Consider revising the phrasing of the "max:N" and "filter:N" lines to ensure parallel structure and consistent punctuation—for example, starting each bullet point with "Automatically determines..." may improve readability.
  • Address the static analysis hints regarding loose punctuation around these lines by reviewing the sentence endings for clarity.
🧰 Tools
🪛 LanguageTool

[uncategorized] ~110-~110: Loose punctuation mark.
Context: ...m is no less than N. - "max:N": automatically determines the batch size...

(UNLIKELY_OPENING_PUNCTUATION)


[uncategorized] ~111-~111: Loose punctuation mark.
Context: ...inimum batch size is 1. - "filter:N": the same as "max:N" but removes the s...

(UNLIKELY_OPENING_PUNCTUATION)

📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between 5fa3582 and b43a4ab.

📒 Files selected for processing (1)
  • doc/train/training-advanced.md (1 hunks)

⏰ Context from checks skipped due to timeout of 90000ms (26)

@coderabbitai bot (Contributor) left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
doc/train/training-advanced.md (1)

109-111: Clarify Batch Size Option Descriptions

The descriptions for the "auto:N", "max:N", and "filter:N" options are clear and align well with the PR objectives. A couple of nitpicky suggestions to improve clarity and consistency:

  • For "max:N", consider adding a comma to separate the clause—for example:
    Before:
    "max:N": automatically determines the batch size so that ... is no more than N. The minimum batch size is 1.
    After:
    "max:N": automatically determines the batch size so that ... is no more than N, with a minimum batch size of 1.
  • For "filter:N", you might rephrase slightly for consistency. For example:
    Before:
    "filter:N": the same as "max:N" but removes the systems with the number of atoms larger than N from the data set. Throws an error if no system is left in a dataset.
    After:
    "filter:N": similar to "max:N", but it also removes systems whose atom count exceeds N from the dataset, throwing an error if no systems remain.

These adjustments address the minor loose punctuation issues flagged by static analysis.

🧰 Tools
🪛 LanguageTool

[uncategorized] ~110-~110: Loose punctuation mark.
Context: ...m is no less than N. - "max:N": automatically determines the batch size...

(UNLIKELY_OPENING_PUNCTUATION)


[uncategorized] ~111-~111: Loose punctuation mark.
Context: ...inimum batch size is 1. - "filter:N": the same as "max:N" but removes the s...

(UNLIKELY_OPENING_PUNCTUATION)

📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between 5fa3582 and b43a4ab.

📒 Files selected for processing (1)
  • doc/train/training-advanced.md (1 hunks)

⏰ Context from checks skipped due to timeout of 90000ms (25)

@njzjz (Member) left a comment


Please update here

doc_batch_size = f'This key can be \n\n\
- list: the length of which is the same as the {link_sys}. The batch size of each system is given by the elements of the list.\n\n\
- int: all {link_sys} use the same batch size.\n\n\
- string "auto": automatically determines the batch size so that the batch_size times the number of atoms in the system is no less than 32.\n\n\
- string "auto:N": automatically determines the batch size so that the batch_size times the number of atoms in the system is no less than N.\n\n\
- string "mixed:N": the batch data will be sampled from all systems and merged into a mixed system with the batch size N. Only support the se_atten descriptor for TensorFlow backend.\n\n\
If MPI is used, the value should be considered as the batch size per task.'
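
For illustration, the two new entries could be appended along these lines (wording adapted from the reviewed doc/train/training-advanced.md text; the merged docstring may differ):

- string "max:N": automatically determines the batch size so that the batch_size times the number of atoms in the system is no more than N. The minimum batch size is 1.\n\n\
- string "filter:N": the same as "max:N", but removes from the data set the systems with more than N atoms. Throws an error if no system is left.\n\n\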

@caic99 caic99 requested review from njzjz March 21, 2025 06:57
@coderabbitai bot (Contributor) left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
deepmd/pd/utils/dataloader.py (1)

127-164: Consider adding code comments explaining the ceiling behavior.

While the implementation of the ceiling variable and its usage is correct, adding a brief comment explaining the difference between ceiling=True (auto) and ceiling=False (max/filter) would improve code maintainability. This would help future developers understand the impact of this variable on batch size calculations.

 if batch_size == "auto":
     rule = 32
     ceiling = True
+    # When ceiling=True, we ensure rule/natoms rounds up to fit exactly rule atoms
 elif batch_size.startswith("auto:"):
     rule = int(batch_size.split(":")[1])
     ceiling = True
 elif batch_size.startswith("max:"):
     rule = int(batch_size.split(":")[1])
     ceiling = False
+    # When ceiling=False, we don't round up and use at most rule atoms per batch
📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between 8a5fc80 and 281959e.

📒 Files selected for processing (1)
  • deepmd/pd/utils/dataloader.py (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (30)
🔇 Additional comments (7)
deepmd/pd/utils/dataloader.py (7)

129-132: Implementation of ceiling variable looks good.

The code now explicitly sets ceiling = True for both auto and auto:N batch size rules, which helps clarify the behavior and sets the foundation for the new methods.


133-136: New max method implementation is well-designed.

The max: prefix implementation correctly sets ceiling = False to prevent applying a ceiling to the batch size, which aligns with the PR objective of not increasing batch sizes to prevent OOM issues.


136-153: Filter implementation provides good safeguards.

The implementation for the filter: method is thorough and includes:

  • Clear system filtering based on atom count
  • Appropriate warnings when systems are removed
  • Error handling when no systems remain after filtering
  • Setting ceiling = False consistent with the max: method

This will effectively prevent OOM issues by removing systems that exceed the atom count threshold.


154-154: Improved error message for unsupported batch size rules.

The error message now clearly indicates what type of batch size rule is unsupported, which will help users troubleshoot configuration issues.


158-164: Batch size calculation logic handles both ceiling modes correctly.

The conditional logic for calculating the batch size index (bsi) now properly handles both ceiling behaviors:

  • When ceiling = True: Increments bsi if there's a remainder (original behavior)
  • When ceiling = False: Ensures bsi is at least 1 to handle cases where rule < natoms

This implementation aligns with the intended behavior differences between auto and the new max/filter methods.
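
A hedged sketch of what that conditional might look like (variable names follow the review text; the exact deepmd-kit code may differ):

bsi = rule // natoms              # base batch size for this system
if ceiling and bsi * natoms < rule:
    bsi += 1                      # auto: round up so that bsi * natoms >= rule
bsi = max(bsi, 1)                 # max/filter: at least 1 even when natoms > rule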


136-153: ✅ Verification successful

Verify filter behavior with unit tests.

While the implementation for the filter method is robust, it would be valuable to ensure it's covered by unit tests that verify:

  1. Systems exceeding the atom count are properly filtered
  2. Warnings are generated when systems are removed
  3. Errors are raised when no systems remain

🏁 Script executed:

#!/bin/bash
# Check if unit tests exist for the new filter functionality

# Look for test files that might be testing the batch size functionality
fd -t f "test.*dataloader" -E "*.pyc"

# Search for tests specifically related to the new batch size methods
rg -i "test.*batch.*size|test.*filter|test.*max.*batch" --type py

Length of output: 1423


Filter functionality verified via existing tests—please confirm complete edge-case coverage.

The current unit tests in the repository (e.g., in
source/tests/tf/test_deepmd_data_sys.py (e.g., test_batch_size_5 and test_batch_size_raise),
source/tests/pt/test_dploaderset.py (where comments indicate testing batch_size values like "filter")
show that filtering behavior is exercised. These tests should verify that:

  • Systems with atom counts exceeding the specified threshold are properly removed.
  • A warning is logged when one or more systems are filtered out.
  • A ValueError is raised when all systems are removed through filtering.

Please confirm that each of these scenarios is covered by dedicated assertions—especially checking the warning logs when systems are removed. If any of these aspects aren’t thoroughly verified, additional test cases might be needed.
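
As a self-contained illustration of those assertions, here is a sketch against a local stub (filter_systems is hypothetical and stands in for the dataloader's filtering logic; the warning message mirrors the one quoted earlier in this review):

import logging
import unittest

def filter_systems(natoms: list[int], limit: int) -> list[int]:
    # Hypothetical stub mirroring the described "filter:N" behavior.
    kept = [na for na in natoms if na <= limit]
    removed = len(natoms) - len(kept)
    if removed:
        logging.warning("Remove %d systems with more than %d atoms", removed, limit)
    if not kept:
        raise ValueError("no system is left after filtering")
    return kept

class TestFilterRule(unittest.TestCase):
    def test_warning_logged_when_systems_removed(self) -> None:
        with self.assertLogs(level="WARNING") as cm:
            self.assertEqual(filter_systems([100, 300], 200), [100])
        self.assertIn("Remove 1 systems with more than 200 atoms", cm.output[-1])

    def test_error_when_no_system_left(self) -> None:
        with self.assertRaises(ValueError):
            filter_systems([300], 200)

if __name__ == "__main__":
    unittest.main()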


133-136: ❓ Verification inconclusive

Ensure compatibility with Paddle backend.

As mentioned in the PR objectives, there's a plan to migrate these changes to the Paddle backend. Since this file is already in the Paddle backend (deepmd/pd/utils/dataloader.py), verify that equivalent changes are made to any similar files in other backends for consistency.


🏁 Script executed:

#!/bin/bash
# Check if similar dataloader implementations exist that might need the same changes

# Look for other dataloader.py files
fd -t f "dataloader.py"

# Specifically check for similar batching logic in other backends
rg "batch_size.*startswith.*auto" --type py

Length of output: 287


Attention: Verify Cross-Backend Consistency for Batch Size Handling

The updated batching logic in deepmd/pd/utils/dataloader.py now processes "max:" and "filter:" prefixes as expected for the Paddle backend. However, our repository search shows that the PyTorch backend file (deepmd/pt/utils/dataloader.py) only contains logic for the "auto:" prefix. Please verify that:

  • If the modifications for "max:" and "filter:" are intended to be exclusive to the Paddle backend (in line with the migration objectives), then it’s acceptable that the PyTorch backend retains its current behavior.
  • If a unified behavior across backends is desired, similar changes should be applied to deepmd/pt/utils/dataloader.py to handle "max:" and "filter:" cases consistently.

Ensure that the final design decision and documentation are updated accordingly.

@coderabbitai bot (Contributor) left a comment


Actionable comments posted: 0

🧹 Nitpick comments (2)
doc/train/training-advanced.md (2)

110-110: Review "max:N" Documentation Clarity
The new "max:N" option is well-described: it specifies that the batch size should be determined so that the product with the number of atoms is no more than N, with a minimum batch size of 1 and clearly lists the supported backends. Consider revising the punctuation slightly (for example, inserting a comma or semicolon after the condition) to enhance readability.

🧰 Tools
🪛 LanguageTool

[uncategorized] ~110-~110: Loose punctuation mark.
Context: ...m is no less than N. - "max:N": automatically determines the batch size...

(UNLIKELY_OPENING_PUNCTUATION)


111-111: Review "filter:N" Documentation Details
The "filter:N" option is clearly explained—it behaves like "max:N" but also filters out systems whose atom counts exceed N, raising an error if no system remains. For added clarity, you might consider specifying which exception is raised or elaborating on the error behavior.

📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between 281959e and aa07bf1.

📒 Files selected for processing (1)
  • doc/train/training-advanced.md (1 hunks)

⏰ Context from checks skipped due to timeout of 90000ms (24)
🔇 Additional comments (1)
doc/train/training-advanced.md (1)

109-109: Clarify "auto:N" Documentation
The updated description for "auto:N" clearly states that the batch size is determined such that the product of the batch size and the number of atoms in a system is no less than N. This wording is consistent with previous documentation.

@caic99 caic99 marked this pull request as ready for review March 24, 2025 03:11
@iProzd iProzd enabled auto-merge March 26, 2025 04:30
@iProzd iProzd added this pull request to the merge queue Mar 26, 2025
Merged via the queue into deepmodeling:devel with commit 2e4da81 Mar 26, 2025
60 checks passed
@caic99 caic99 deleted the feat-bs-rules branch March 26, 2025 07:28
iProzd pushed a commit to iProzd/deepmd-kit that referenced this pull request Jun 30, 2025
iProzd pushed a commit to iProzd/deepmd-kit that referenced this pull request Jun 30, 2025