Fix: support "max:N" and "filter:N" batch_size rules in DeepmdDataSystem #4876
## Walkthrough

The batch size parsing logic in `DeepmdDataSystem.__init__` is extended with two new string rules, `"max:N"` and `"filter:N"`, mirroring the semantics already implemented in the PyTorch `DpLoaderSet`.

### Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User
    participant DeepmdDataSystem
    User->>DeepmdDataSystem: Initialize with batch_size rule ("max" or "filter")
    DeepmdDataSystem->>DeepmdDataSystem: Parse batch_size string
    alt "max" rule
        DeepmdDataSystem->>DeepmdDataSystem: For each system, compute batch_size = max(1, rule_value // natoms)
    else "filter" rule
        DeepmdDataSystem->>DeepmdDataSystem: Remove systems with natoms > rule_value
        DeepmdDataSystem->>DeepmdDataSystem: If no systems left, raise error
        DeepmdDataSystem->>DeepmdDataSystem: For each remaining system, compute batch_size = max(1, rule_value // natoms)
    end
    DeepmdDataSystem-->>User: Data system initialized with new batch size logic
```
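The per-system arithmetic shown in the diagram can be sketched in plain Python. The helper name `compute_batch_sizes` and the bare list of atom counts are illustrative only, not part of the codebase:

```python
def compute_batch_sizes(natoms_list, rule):
    """For each system, batch_size = max(1, rule // natoms),
    so batch_size * natoms <= rule whenever natoms <= rule."""
    return [max(1, rule // n) for n in natoms_list]


# e.g. a "max:128" rule over systems of 16, 50, and 200 atoms:
print(compute_batch_sizes([16, 50, 200], 128))  # [8, 2, 1]
```

Note the floor at 1: a system larger than the rule value (200 atoms vs. 128 here) still gets batch size 1 under `"max:N"`; only the `"filter:N"` rule removes it outright.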
Actionable comments posted: 2
📜 Review details
Configuration used: CodeRabbit UI
📒 Files selected for processing (1)
- `deepmd/utils/data_system.py` (1 hunks)
🧰 Additional context used
🪛 Ruff (0.12.2)
deepmd/utils/data_system.py
- 155-155: Yoda condition detected; rewrite as `words[0] == "max"` (SIM300)
- 168-168: Yoda condition detected; rewrite as `words[0] == "filter"` (SIM300)
- 182-182: No explicit `stacklevel` keyword argument found; set `stacklevel=2` (B028)
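Taken together, the three Ruff findings amount to small mechanical rewrites. A sketch out of context (the `words` value stands in for the parsed result of a `"max:128"` batch-size string):

```python
import warnings

words = ["max", "128"]  # illustrative parse result of a "max:128" rule

# SIM300: put the variable on the left instead of a Yoda condition
if words[0] == "max":        # instead of: "max" == words[0]
    pass
elif words[0] == "filter":   # instead of: "filter" == words[0]
    pass

# B028: pass stacklevel so the warning is attributed to the caller's line
warnings.warn("Removed systems with more than 128 atoms", stacklevel=2)
```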
## Codecov Report

❌ Patch coverage is low: per the diff below, only 4 of the roughly 38 changed tracked lines are hit.

Additional details and impacted files:

```
@@            Coverage Diff             @@
##            devel    #4876      +/-   ##
==========================================
- Coverage   84.34%   84.30%   -0.04%
==========================================
  Files         702      702
  Lines       68585    68621      +36
  Branches     3573     3572       -1
==========================================
+ Hits        57847    57851       +4
- Misses       9597     9631      +34
+ Partials     1141     1139       -2
```

☔ View full report in Codecov by Sentry.
# Fix: support "max:N" and "filter:N" batch_size rules in DeepmdDataSystem

## Problem

- Using `batch_size: "max:..."` or `"filter:..."` in configs caused `RuntimeError: unknown batch_size rule max` during the PyTorch path (neighbor statistics).
- Docs mention these rules and PyTorch `DpLoaderSet` already supports them, so behavior was inconsistent across layers.

## Cause

- The common data layer `DeepmdDataSystem` only implemented `"auto"` and `"mixed"` for string `batch_size`, missing `"max"` and `"filter"`.
- PT training performs neighbor statistics via `DeepmdDataSystem` before real training, so it failed early when those rules were used.

## Fix

- Implement `"max:N"` and `"filter:N"` in `DeepmdDataSystem.__init__` to mirror `DpLoaderSet` semantics:
  - `max:N`: per-system `batch_size = max(1, N // natoms)` so `batch_size * natoms <= N`.
  - `filter:N`: drop systems with `natoms > N` (warn if any removed; error if none left), then set per-system `batch_size` as in `max:N`.
- After filtering, update `self.data_systems`, `self.system_dirs`, and `self.nsystems` before computing other metadata.

## Impact

- Aligns the common layer behavior with PyTorch `DpLoaderSet` and with the documentation.
- Prevents PT neighbor-stat crashes with configs using `"max"`/`"filter"`.

## Compatibility

- No change to numeric `batch_size` or existing `"auto"`/`"auto:N"`/`"mixed:N"` rules.
- TF/PT/PD paths now accept the same `batch_size` rules consistently in the common layer.

## Files Changed

- `deepmd/utils/data_system.py`: add parsing branches for `"max:N"` and `"filter:N"` in `DeepmdDataSystem.__init__`.
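For context, a minimal sketch of how these rules appear in a training input. The surrounding keys and paths here are made up for illustration; only the `batch_size` string forms are what the PR adds support for:

```python
# hypothetical training-data section of a deepmd-kit input, as a Python dict
training_data = {
    "systems": ["./data/sys1", "./data/sys2"],
    # cap total atoms per batch at 256: each system gets max(1, 256 // natoms)
    "batch_size": "max:256",
    # alternatively, drop systems larger than 128 atoms entirely:
    # "batch_size": "filter:128",
}

rule_name, rule_value = training_data["batch_size"].split(":")
print(rule_name, int(rule_value))  # max 256
```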
```python
elif words[0] == "max":
    # Determine batch size so that batch_size * natoms <= rule, at least 1
    if len(words) != 2:
        raise RuntimeError("batch size must be specified for max systems")
    rule = int(words[1])
    bs = []
    for ii in self.data_systems:
        ni = ii.get_natoms()
        bsi = rule // ni
        if bsi == 0:
            bsi = 1
        bs.append(bsi)
    self.batch_size = bs
elif words[0] == "filter":
    # Remove systems with natoms > rule, then set batch size like "max:rule"
    if len(words) != 2:
        raise RuntimeError("batch size must be specified for filter systems")
    rule = int(words[1])
    filtered_data_systems = []
    filtered_system_dirs = []
    for sys_dir, data_sys in zip(self.system_dirs, self.data_systems):
        if data_sys.get_natoms() <= rule:
            filtered_data_systems.append(data_sys)
            filtered_system_dirs.append(sys_dir)
    if len(filtered_data_systems) == 0:
        raise RuntimeError(
            f"No system left after removing systems with more than {rule} atoms"
        )
    if len(filtered_data_systems) != len(self.data_systems):
        warnings.warn(
            f"Remove {len(self.data_systems) - len(filtered_data_systems)} "
            f"systems with more than {rule} atoms",
            stacklevel=2,
        )
    self.data_systems = filtered_data_systems
    self.system_dirs = filtered_system_dirs
    self.nsystems = len(self.data_systems)
    bs = []
    for ii in self.data_systems:
        ni = ii.get_natoms()
        bsi = rule // ni
        if bsi == 0:
            bsi = 1
        bs.append(bsi)
    self.batch_size = bs
```

<!-- This is an auto-generated comment: release notes by coderabbit.ai -->

## Summary by CodeRabbit

* **New Features**
  * Added support for "max" and "filter" batch size rules, allowing more flexible control over batch sizing and filtering of data systems based on atom counts.
* **Bug Fixes**
  * Improved error handling for incorrect batch size string formats and cases where no systems remain after filtering.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
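The `"filter:N"` branch above can be exercised in isolation with a standalone sketch. The helper `apply_filter_rule`, the directory names, and the atom counts are all hypothetical, used only to demonstrate the drop-then-size behavior:

```python
def apply_filter_rule(system_dirs, natoms_list, rule):
    """Mimic the "filter:N" branch: drop systems with natoms > N,
    then assign each surviving system batch_size = max(1, N // natoms)."""
    kept = [(d, n) for d, n in zip(system_dirs, natoms_list) if n <= rule]
    if not kept:
        raise RuntimeError(
            f"No system left after removing systems with more than {rule} atoms"
        )
    dirs = [d for d, _ in kept]
    batch_sizes = [max(1, rule // n) for _, n in kept]
    return dirs, batch_sizes


# hypothetical systems: water (30 atoms), slab (96 atoms), bulk (250 atoms)
dirs, bs = apply_filter_rule(["water", "slab", "bulk"], [30, 96, 250], 100)
print(dirs, bs)  # ['water', 'slab'] [3, 1]
```

The `bulk` system exceeds the 100-atom threshold and is dropped, while the survivors get batch sizes `100 // 30 = 3` and `100 // 96 = 1`; if every system exceeded the threshold, the branch raises instead of silently training on nothing.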