feat: add new batch size rules for large systems #4659
Conversation
Pull Request Overview
This PR introduces two new rules for defining batch sizes in the data loader to better handle systems with large numbers of atoms.
- Added support for "max:" and "cap:" keywords to modify batch size rules.
- For "cap:", systems exceeding the specified atom count are filtered out, and a warning is logged.
Comments suppressed due to low confidence (1)
deepmd/pt/utils/dataloader.py:145
- [nitpick] Consider renaming variable 'ii' to 'system' for improved readability.
for ii in self.systems:
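The batch-size arithmetic the overview describes can be sketched in a few lines of standalone Python. This is illustrative only: `parse_batch_size_rule` and `batch_size_for` are hypothetical names, not the PR's actual code, and the second rule is shown under the merged keyword `max` rather than the draft `cap`.

```python
import math

def parse_batch_size_rule(batch_size: str) -> tuple[int, bool]:
    """Hypothetical parser: return (atom budget N, whether to round up)."""
    if batch_size == "auto":
        return 32, True
    if batch_size.startswith("auto:"):
        return int(batch_size.split(":")[1]), True
    if batch_size.startswith("max:"):
        return int(batch_size.split(":")[1]), False
    raise ValueError(f"Unsupported batch size rule: {batch_size}")

def batch_size_for(natoms: int, rule: int, ceiling: bool) -> int:
    # "auto" rounds up so that batch_size * natoms >= rule;
    # "max" rounds down (minimum 1) so that batch_size * natoms <= rule.
    return math.ceil(rule / natoms) if ceiling else max(rule // natoms, 1)

rule, ceiling = parse_batch_size_rule("auto:256")
print(batch_size_for(255, rule, ceiling))  # -> 2

rule, ceiling = parse_batch_size_rule("max:256")
print(batch_size_for(255, rule, ceiling))  # -> 1
```

The 255-atom case shows the difference: `auto:256` rounds the batch size up to 2, while `max:256` keeps it at 1.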
Codecov Report
Attention: Patch coverage is
Additional details and impacted files:
@@ Coverage Diff @@
## devel #4659 +/- ##
==========================================
- Coverage 84.80% 84.80% -0.01%
==========================================
Files 692 692
Lines 66345 66395 +50
Branches 3539 3538 -1
==========================================
+ Hits 56265 56303 +38
- Misses 8940 8951 +11
- Partials 1140 1141 +1

☔ View full report in Codecov by Sentry.
Have you asked DeepSeek?

It suggests replacing "cap:N" with "filter:N", which makes sense to me. @iProzd Any comments?
wanghan-iapcm
left a comment
This PR is not covered by UT.
Actionable comments posted: 2
🧹 Nitpick comments (2)
source/tests/pt/test_dploaderset.py (2)
75-78: Remove debug print statement
The `print(cm.output)` statement on line 77 appears to be a debug statement that should be removed before merging.

```diff
 with self.assertLogs() as cm:
     self.assertRaises(AssertionError, self.get_batch_sizes, "filter:191")
-    print(cm.output)  # DEBUG
     self.assertIn("Remove 1 systems with more than 191 atoms", cm.output[-1])
```
46-84: Consider adding tests for multiple systems with the filter option
The current tests only verify the `filter` option with a single system. Since the PR mentions that the `filter` method excludes systems that contain more atoms than the specified limit, it would be valuable to add tests with multiple systems of varying atom counts to ensure the filtering works correctly across systems.

Consider adding a test case like this:

```python
def test_filter_multiple_systems(self) -> None:
    # Mock multiple systems with different atom counts
    original_systems = self.systems
    try:
        # Set up test with multiple mock systems
        # Implementation depends on how systems are represented
        # This is just a conceptual example
        self.systems = [...]  # Multiple systems with varying atom counts
        # Test filtering logic
        result = self.get_batch_sizes("filter:200")
        # Assert expected behavior
    finally:
        # Restore original systems
        self.systems = original_systems
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
source/tests/pt/test_dploaderset.py (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (29)
- GitHub Check: Test Python (6, 3.12)
- GitHub Check: Test Python (6, 3.9)
- GitHub Check: Test Python (5, 3.12)
- GitHub Check: Test Python (5, 3.9)
- GitHub Check: Test Python (4, 3.12)
- GitHub Check: Build wheels for cp310-manylinux_aarch64
- GitHub Check: Test Python (4, 3.9)
- GitHub Check: Build C++ (clang, clang)
- GitHub Check: Test Python (3, 3.12)
- GitHub Check: Build wheels for cp311-win_amd64
- GitHub Check: Test Python (3, 3.9)
- GitHub Check: Build wheels for cp311-macosx_arm64
- GitHub Check: Build C++ (rocm, rocm)
- GitHub Check: Test Python (2, 3.12)
- GitHub Check: Build wheels for cp311-macosx_x86_64
- GitHub Check: Build C++ (cuda120, cuda)
- GitHub Check: Build wheels for cp311-manylinux_x86_64
- GitHub Check: Test Python (2, 3.9)
- GitHub Check: Build C++ (cuda, cuda)
- GitHub Check: Analyze (python)
- GitHub Check: Build C++ (cpu, cpu)
- GitHub Check: Build C library (2.14, >=2.5.0rc0,<2.15, libdeepmd_c_cu11.tar.gz)
- GitHub Check: Build wheels for cp311-manylinux_x86_64
- GitHub Check: Test Python (1, 3.12)
- GitHub Check: Analyze (c-cpp)
- GitHub Check: Build C library (2.18, libdeepmd_c.tar.gz)
- GitHub Check: Test Python (1, 3.9)
- GitHub Check: Test C++ (false)
- GitHub Check: Test C++ (true)
🔇 Additional comments (1)
source/tests/pt/test_dploaderset.py (1)
1-87: The test coverage for the new batch size rules looks good
The test cases for the new `max` and `filter` batch size rules appear comprehensive and align well with the PR objectives. The test verifies that:
- The `max` method maintains a batch size of 1 when atoms exceed the threshold to prevent OOM issues
- The `filter` method correctly excludes systems that contain more atoms than the specified limit
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> Signed-off-by: Chun Cai <amoycaic@gmail.com>
Actionable comments posted: 0
🧹 Nitpick comments (2)
source/tests/pt/test_dploaderset.py (2)
72-79: Clean up debugging print statement.
The test case for the "filter" batch size rule includes a debugging print statement that should be removed.

```diff
- print(cm.output)  # DEBUG
```
1-88: Consider adding additional test cases for edge scenarios.
The current tests cover the basic functionality but could benefit from additional test cases:
- Testing with multiple systems with different atom counts
- Edge cases like very large atom thresholds
- Behavior when atom counts are exactly at the threshold boundary
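For the boundary case in particular, the expected arithmetic can be spelled out in a standalone sketch. It assumes, as described elsewhere in this thread, that `auto` rounds up and `max` rounds down with a floor of 1; it is not the repository's code.

```python
import math

N = 256  # threshold from a rule such as "auto:256" / "max:256"

for natoms in (255, 256, 257):
    auto_bs = math.ceil(N / natoms)   # "auto:N": product >= N
    max_bs = max(N // natoms, 1)      # "max:N": product <= N, at least 1
    print(natoms, auto_bs, max_bs)
# 255 2 1
# 256 1 1
# 257 1 1
```

Exactly at the threshold (256 atoms), both rules agree on a batch size of 1; the rules only diverge just below it.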
📜 Review details
📒 Files selected for processing (1)
source/tests/pt/test_dploaderset.py (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (29)
🔇 Additional comments (4)
source/tests/pt/test_dploaderset.py (4)
36-44: The function signature and implementation match the requirements.
The method correctly instantiates a `DpLoaderSet` and returns the first batch size as expected.

46-64: Good comprehensive testing of different batch size formats.
The test cases cover integer, list, and string-based "auto" batch sizes with various thresholds, verifying that the auto rule correctly applies a ceiling function for determining batch sizes based on the number of atoms.

65-71: Good verification of the new "max" batch size rule.
These tests confirm that the "max" rule correctly implements the behavior described in the PR objectives: not applying a ceiling to prevent potential OOM issues.

80-83: Good error handling validation for unknown batch size rule.
The test correctly validates that an unknown batch size rule raises a ValueError with the appropriate error message.
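The behaviors checked above can be mirrored by a tiny self-contained test. Everything here is illustrative: `compute_batch_size` re-implements the string rules for demonstration and is not the repository's code.

```python
import math
import unittest

def compute_batch_size(natoms: int, batch_size: str) -> int:
    # Illustrative re-implementation of the "auto:N" and "max:N" string rules.
    if batch_size.startswith("auto:"):
        return math.ceil(int(batch_size.split(":")[1]) / natoms)
    if batch_size.startswith("max:"):
        return max(int(batch_size.split(":")[1]) // natoms, 1)
    raise ValueError(f"Unsupported batch size type {batch_size!r}")

class TestBatchSizeRules(unittest.TestCase):
    def test_auto_applies_ceiling(self):
        # 255 atoms under "auto:256": ceil(256 / 255) == 2
        self.assertEqual(compute_batch_size(255, "auto:256"), 2)

    def test_max_never_rounds_up(self):
        # The same system under "max:256" stays at batch size 1
        self.assertEqual(compute_batch_size(255, "max:256"), 1)

    def test_unknown_rule_raises(self):
        with self.assertRaises(ValueError):
            compute_batch_size(255, "cap:256")
```

Run with `python -m unittest <module>` to exercise all three cases.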
Pull Request Overview
This PR adds new configuration options for defining batch sizes, introducing two new rules for handling large systems. The changes include updates to the batch size computation logic in the dataloader and new tests covering int, list, and string-based batch size inputs (including "auto", "max", and "filter" rules).
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| deepmd/pt/utils/dataloader.py | Implements new batch size rules and corresponding logic. |
| source/tests/pt/test_dploaderset.py | Adds tests verifying the new batch size behavior for various inputs. |
Comments suppressed due to low confidence (1)
deepmd/pt/utils/dataloader.py:130
- The PR description refers to a 'cap' rule for batch sizes, but the implementation uses 'filter:'. Consider aligning the naming between the documentation and the code for clarity.
elif batch_size.startswith("filter:"):
njzjz
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Documentation needs to be updated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
doc/train/training-advanced.md (1)
109-111: Enhance Documentation Clarity and Consistency for New Batch Size Options
The descriptions for the new `"auto:N"`, `"max:N"`, and `"filter:N"` options accurately reflect the intended logic per the PR objectives:

- `"auto:N"` correctly specifies that the batch size times the number of atoms is no less than `N`.
- `"max:N"` clearly indicates that the product is no more than `N`, with a safeguard of a minimum batch size of 1.
- `"filter:N"` conveys that it functions like `"max:N"` but excludes systems exceeding the specified atom count and raises an error if no system remains.

Suggestions:
- Consider revising the phrasing of the `"max:N"` and `"filter:N"` lines to ensure parallel structure and consistent punctuation; for example, starting each bullet point with "Automatically determines..." may improve readability.
- Address the static analysis hints regarding loose punctuation around these lines by reviewing the sentence endings for clarity.
🧰 Tools
🪛 LanguageTool
[uncategorized] ~110: Loose punctuation mark near `"max:N"` (UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~111: Loose punctuation mark near `"filter:N"` (UNLIKELY_OPENING_PUNCTUATION)
📜 Review details
📒 Files selected for processing (1)
doc/train/training-advanced.md (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (26)
Actionable comments posted: 0
🧹 Nitpick comments (1)
doc/train/training-advanced.md (1)
109-111: Clarify Batch Size Option Descriptions
The descriptions for the `"auto:N"`, `"max:N"`, and `"filter:N"` options are clear and align well with the PR objectives. A couple of nitpicky suggestions to improve clarity and consistency:

- For `"max:N"`, consider adding a comma to separate the clause. For example:
  Before: `"max:N"`: automatically determines the batch size so that ... is no more than `N`. The minimum batch size is 1.
  After: `"max:N"`: automatically determines the batch size so that ... is no more than `N`, with a minimum batch size of 1.
- For `"filter:N"`, you might rephrase slightly for consistency. For example:
  Before: `"filter:N"`: the same as `"max:N"` but removes the systems with the number of atoms larger than `N` from the data set. Throws an error if no system is left in a dataset.
  After: `"filter:N"`: similar to `"max:N"`, but it also removes systems whose atom count exceeds `N` from the dataset, throwing an error if no systems remain.

These adjustments address the minor loose punctuation issues flagged by static analysis.
📜 Review details
📒 Files selected for processing (1)
doc/train/training-advanced.md (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (25)
njzjz
left a comment
Please update here
deepmd-kit/deepmd/utils/argcheck.py
Lines 2828 to 2834 in bb06e4f

```python
doc_batch_size = f'This key can be \n\n\
- list: the length of which is the same as the {link_sys}. The batch size of each system is given by the elements of the list.\n\n\
- int: all {link_sys} use the same batch size.\n\n\
- string "auto": automatically determines the batch size so that the batch_size times the number of atoms in the system is no less than 32.\n\n\
- string "auto:N": automatically determines the batch size so that the batch_size times the number of atoms in the system is no less than N.\n\n\
- string "mixed:N": the batch data will be sampled from all systems and merged into a mixed system with the batch size N. Only support the se_atten descriptor for TensorFlow backend.\n\n\
If MPI is used, the value should be considered as the batch size per task.'
```
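One possible way to extend that docstring with the new rules, mirroring the style of the existing lines; this is only a sketch based on the option descriptions quoted elsewhere in this thread, and the exact wording is up to the author.

```python
# Hypothetical additional lines, following the style of the existing docstring:
doc_batch_size_additions = (
    '- string "max:N": automatically determines the batch size so that the batch_size times '
    "the number of atoms in the system is no more than N. The minimum batch size is 1.\n\n"
    '- string "filter:N": the same as "max:N" but removes the systems with the number of atoms '
    "larger than N from the data set.\n\n"
)
print(doc_batch_size_additions)
```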
Actionable comments posted: 0
🧹 Nitpick comments (1)
deepmd/pd/utils/dataloader.py (1)
127-164: Consider adding code comments explaining the ceiling behavior.
While the implementation of the `ceiling` variable and its usage is correct, adding a brief comment explaining the difference between `ceiling=True` (auto) and `ceiling=False` (max/filter) would improve code maintainability. This would help future developers understand the impact of this variable on batch size calculations.

```diff
 if batch_size == "auto":
     rule = 32
     ceiling = True
+    # When ceiling=True, we ensure rule/natoms rounds up to fit exactly rule atoms
 elif batch_size.startswith("auto:"):
     rule = int(batch_size.split(":")[1])
     ceiling = True
 elif batch_size.startswith("max:"):
     rule = int(batch_size.split(":")[1])
     ceiling = False
+    # When ceiling=False, we don't round up and use at most rule atoms per batch
```
📜 Review details
📒 Files selected for processing (1)
deepmd/pd/utils/dataloader.py (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (30)
🔇 Additional comments (7)
deepmd/pd/utils/dataloader.py (7)
129-132: Implementation of ceiling variable looks good.
The code now explicitly sets `ceiling = True` for both `auto` and `auto:N` batch size rules, which helps clarify the behavior and sets the foundation for the new methods.

133-136: New max method implementation is well-designed.
The `max:` prefix implementation correctly sets `ceiling = False` to prevent applying a ceiling to the batch size, which aligns with the PR objective of not increasing batch sizes to prevent OOM issues.

136-153: Filter implementation provides good safeguards.
The implementation for the `filter:` method is thorough and includes:
- Clear system filtering based on atom count
- Appropriate warnings when systems are removed
- Error handling when no systems remain after filtering
- Setting `ceiling = False` consistent with the `max:` method

This will effectively prevent OOM issues by removing systems that exceed the atom count threshold.

154-154: Improved error message for unsupported batch size rules.
The error message now clearly indicates what type of batch size rule is unsupported, which will help users troubleshoot configuration issues.

158-164: Batch size calculation logic handles both ceiling modes correctly.
The conditional logic for calculating the batch size index (`bsi`) now properly handles both ceiling behaviors:
- When `ceiling = True`: increments `bsi` if there's a remainder (original behavior)
- When `ceiling = False`: ensures `bsi` is at least 1 to handle cases where rule < natoms

This implementation aligns with the intended behavior differences between `auto` and the new `max`/`filter` methods.
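The safeguards discussed for the `filter:` rule (dropping oversized systems, warning on removal, erroring when nothing is left) can be sketched with a self-contained helper. This is a hypothetical illustration, not the PR's actual implementation.

```python
import logging

log = logging.getLogger(__name__)

def apply_filter_rule(system_natoms: list[int], limit: int) -> list[int]:
    """Sketch of the "filter:N" behavior: drop systems whose atom count
    exceeds the limit, warn about removals, and error if none remain."""
    kept = [n for n in system_natoms if n <= limit]
    removed = len(system_natoms) - len(kept)
    if removed:
        log.warning("Remove %d systems with more than %d atoms", removed, limit)
    if not kept:
        raise ValueError(f"No system is left after filtering with limit {limit}")
    return kept

print(apply_filter_rule([100, 150, 300], 200))  # -> [100, 150]
```

A system at exactly the limit is kept (`<=`), matching the threshold-boundary behavior discussed in the test reviews above.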
136-153: Details
✅ Verification successful
Verify filter behavior with unit tests.
While the implementation for the filter method is robust, it would be valuable to ensure it's covered by unit tests that verify:
- Systems exceeding the atom count are properly filtered
- Warnings are generated when systems are removed
- Errors are raised when no systems remain
🏁 Script executed:
```shell
#!/bin/bash
# Check if unit tests exist for the new filter functionality
# Look for test files that might be testing the batch size functionality
fd -t f "test.*dataloader" -E "*.pyc"
# Search for tests specifically related to the new batch size methods
rg -i "test.*batch.*size|test.*filter|test.*max.*batch" --type py
```

Length of output: 1423

Filter functionality verified via existing tests; please confirm complete edge-case coverage.
The current unit tests in the repository, e.g. in
- `source/tests/tf/test_deepmd_data_sys.py` (e.g. `test_batch_size_5` and `test_batch_size_raise`),
- `source/tests/pt/test_dploaderset.py` (where comments indicate testing `batch_size` values like `"filter"`),

show that filtering behavior is exercised. These tests should verify that:
- Systems with atom counts exceeding the specified threshold are properly removed.
- A warning is logged when one or more systems are filtered out.
- A `ValueError` is raised when all systems are removed through filtering.

Please confirm that each of these scenarios is covered by dedicated assertions, especially checking the warning logs when systems are removed. If any of these aspects aren't thoroughly verified, additional test cases might be needed.
133-136: Details
❓ Verification inconclusive
Ensure compatibility with Paddle backend.
As mentioned in the PR objectives, there's a plan to migrate these changes to the Paddle backend. Since this file is already in the Paddle backend (`deepmd/pd/utils/dataloader.py`), verify that equivalent changes are made to any similar files in other backends for consistency.

🏁 Script executed:

```shell
#!/bin/bash
# Check if similar dataloader implementations exist that might need the same changes
# Look for other dataloader.py files
fd -t f "dataloader.py"
# Specifically check for similar batching logic in other backends
rg "batch_size.*startswith.*auto" --type py
```

Length of output: 287

Attention: Verify Cross-Backend Consistency for Batch Size Handling
The updated batching logic in `deepmd/pd/utils/dataloader.py` now processes the `"max:"` and `"filter:"` prefixes as expected for the Paddle backend. However, our repository search shows that the PyTorch backend file (`deepmd/pt/utils/dataloader.py`) only contains logic for the `"auto:"` prefix. Please verify that:

- If the modifications for `"max:"` and `"filter:"` are intended to be exclusive to the Paddle backend (in line with the migration objectives), then it's acceptable that the PyTorch backend retains its current behavior.
- If unified behavior across backends is desired, similar changes should be applied to `deepmd/pt/utils/dataloader.py` to handle the `"max:"` and `"filter:"` cases consistently.

Ensure that the final design decision and documentation are updated accordingly.
Actionable comments posted: 0
🧹 Nitpick comments (2)
doc/train/training-advanced.md (2)
110-110: Review "max:N" Documentation Clarity
The new `"max:N"` option is well-described: it specifies that the batch size should be determined so that the product with the number of atoms is no more than `N`, with a minimum batch size of 1, and it clearly lists the supported backends. Consider revising the punctuation slightly (for example, inserting a comma or semicolon after the condition) to enhance readability.
111-111: Review "filter:N" Documentation Details
The `"filter:N"` option is clearly explained: it behaves like `"max:N"` but also filters out systems whose atom counts exceed `N`, raising an error if no system remains. For added clarity, you might consider specifying which exception is raised or elaborating on the error behavior.
📜 Review details
📒 Files selected for processing (1)
doc/train/training-advanced.md (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (24)
🔇 Additional comments (1)
doc/train/training-advanced.md (1)
109-109: Clarify "auto:N" Documentation
The updated description for `"auto:N"` clearly states that the batch size is determined such that the product of the batch size and the number of atoms in a system is no less than `N`. This wording is consistent with previous documentation.
This PR adds two types of defining batch sizes:

`max`: Unlike `auto`, `max` does not apply a ceiling to the batch size. Consider the case: batch size is set as `auto:256`, and you've got a system with 255 atoms. In this case, `auto` will infer a batch size of 2, while `max` remains 1 to avoid potential OOM.

`filter`: It removes any systems with more atoms than expected, and limits the total atom count in a batch like `max` does.

I've run out of ideas for what the keywords should be, so please leave a comment if there is a better one.

- [ ] Todo: migrate changes to Paddle backend

## Summary by CodeRabbit

- **New Features**
  - Enhanced batch size configuration options now allow the use of descriptive prefixes for more flexible control, including `"max:N"` and `"filter:N"`.
  - Clarified the existing `"auto:N"` option for batch size configuration.
- **Bug Fixes**
  - Improved validation ensures that batch sizes are always at least one.
  - Updated error feedback provides clear messaging for unsupported batch size formats.
- **Tests**
  - Introduced a new test suite to validate batch size behavior under various input scenarios, including assertions for different formats and conditions.

---------

Signed-off-by: Chun Cai <amoycaic@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>