fix: optimize atom type mapping #5043
Conversation
Pull Request Overview
This PR reverts a previous optimization that used vectorized NumPy operations for remapping atom types, replacing it with a simpler list-comprehension approach.
Key Changes:
- Replaces vectorized NumPy array indexing with a list-comprehension-based approach for remapping atom types when a global type_map is provided (a minimal sketch follows below)
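For orientation, here is a minimal standalone sketch of such a list-comprehension remap. The function name and signature are hypothetical and not taken from deepmd/utils/data.py; only the idea of looking each local type name up in the global type_map follows the review summary above.

```python
import numpy as np

def remap_atom_types_listcomp(atom_type, local_type_map, global_type_map):
    """Remap local type indices to global ones with a list comprehension.

    Hypothetical helper for illustration only -- not the deepmd-kit code.
    """
    # name -> global index lookup table
    global_type_dict = {name: idx for idx, name in enumerate(global_type_map)}
    # translate each atom's local index to its name, then to the global index
    return np.array(
        [global_type_dict[local_type_map[t]] for t in atom_type], dtype=np.int32
    )

# Example: the data system only knows ["O", "H"], while the global type_map is larger.
print(remap_atom_types_listcomp(np.array([0, 1, 1]), ["O", "H"], ["H", "C", "N", "O"]))
# [3 0 0]
```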
Note: Other AI code review bot(s) detected. CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

📝 Walkthrough

Replace loop-based index remapping in the non-mixed_type path with a vectorized name-based lookup: build a NumPy int32 array from the indices of the local type names in the global type_map, then remap self.atom_type by indexing into it.
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
Pre-merge checks and finishing touches: ✅ Passed checks (3 passed)
Actionable comments posted: 1
🧹 Nitpick comments (1)
deepmd/utils/data.py (1)
107-108: Consider vectorizing with NumPy for efficiency. The list comprehensions work correctly but could be replaced with NumPy vectorization for better performance, though this is a one-time initialization operation.
Example vectorized approach:
    atom_names = np.array(self.type_map)[self.atom_type]
    atom_type_ = np.array([global_type_dict[name] for name in atom_names])

Or fully vectorized:

    # Create a mapping array from local to global indices
    type_mapping = np.array([global_type_dict[name] for name in self.type_map])
    self.atom_type = type_mapping[self.atom_type].astype(np.int32)
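As a sanity check, here is a self-contained version of the two suggestions above, with hypothetical stand-in values replacing `self.type_map`, `self.atom_type`, and `global_type_dict`, confirming that both variants produce the same remapped indices:

```python
import numpy as np

# Stand-ins for the attributes referenced in the review comment (hypothetical values).
local_type_map = ["O", "H"]              # plays the role of self.type_map
atom_type = np.array([0, 1, 1, 0])       # plays the role of self.atom_type
global_type_dict = {"H": 0, "C": 1, "N": 2, "O": 3}

# Name-based variant: local index -> type name -> global index.
atom_names = np.array(local_type_map)[atom_type]
via_names = np.array([global_type_dict[name] for name in atom_names])

# Fully vectorized variant: build the local -> global mapping once, then index into it.
type_mapping = np.array([global_type_dict[name] for name in local_type_map])
via_mapping = type_mapping[atom_type].astype(np.int32)

assert np.array_equal(via_names, via_mapping)   # both yield [3, 0, 0, 3]
```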
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
deepmd/utils/data.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
Always run `ruff check .` and `ruff format .` before committing changes to Python code
Files:
deepmd/utils/data.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (30)
- GitHub Check: CodeQL analysis (python)
- GitHub Check: Test Python (6, 3.9)
- GitHub Check: Test Python (5, 3.12)
- GitHub Check: Test Python (5, 3.9)
- GitHub Check: Test Python (6, 3.12)
- GitHub Check: Test Python (3, 3.9)
- GitHub Check: Test Python (1, 3.12)
- GitHub Check: Test Python (4, 3.9)
- GitHub Check: Test Python (4, 3.12)
- GitHub Check: Test Python (2, 3.12)
- GitHub Check: Test Python (2, 3.9)
- GitHub Check: Test Python (1, 3.9)
- GitHub Check: Test Python (3, 3.12)
- GitHub Check: Build wheels for cp311-macosx_arm64
- GitHub Check: Build wheels for cp311-macosx_x86_64
- GitHub Check: Build wheels for cp310-manylinux_aarch64
- GitHub Check: Build wheels for cp311-win_amd64
- GitHub Check: Build wheels for cp311-manylinux_x86_64
- GitHub Check: Build wheels for cp311-manylinux_x86_64
- GitHub Check: Test C++ (true)
- GitHub Check: Test C++ (false)
- GitHub Check: Build C++ (clang, clang)
- GitHub Check: Build C++ (cuda120, cuda)
- GitHub Check: Build C++ (cuda, cuda)
- GitHub Check: Build C++ (cpu, cpu)
- GitHub Check: Build C++ (rocm, rocm)
- GitHub Check: Analyze (c-cpp)
- GitHub Check: Analyze (python)
- GitHub Check: Build C library (2.14, >=2.5.0,<2.15, libdeepmd_c_cu11.tar.gz)
- GitHub Check: Build C library (2.18, libdeepmd_c.tar.gz)
🔇 Additional comments (1)
deepmd/utils/data.py (1)
106-109: Logic change correctly addresses the PR objective. The dictionary-based name lookup approach properly handles the case where the training system has fewer types than the global type_map. The validation at lines 100-104 ensures all local type names exist in the global type_map, making this implementation safe.
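A rough sketch of the kind of guard this refers to, assuming the behavior described in the comment; the actual check lives around lines 100-104 of deepmd/utils/data.py and may be worded differently:

```python
# Hypothetical illustration -- not the deepmd-kit source.
local_type_map = ["O", "H"]               # types present in this data system
global_type_map = ["H", "C", "N", "O"]    # global type_map passed to the loader

missing = [name for name in local_type_map if name not in global_type_map]
if missing:
    raise ValueError(f"type names {missing} are missing from the global type_map")
```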
Codecov Report

✅ All modified and coverable lines are covered by tests.

Additional details and impacted files:

@@ Coverage Diff @@
## devel #5043 +/- ##
==========================================
- Coverage 84.19% 84.19% -0.01%
==========================================
Files 709 709
Lines 70216 70213 -3
Branches 3621 3620 -1
==========================================
- Hits 59116 59113 -3
Misses 9933 9933
Partials 1167 1167

☔ View full report in Codecov by Sentry.
Actionable comments posted: 0
🧹 Nitpick comments (1)
deepmd/utils/data.py (1)
106-109: Excellent performance improvement with vectorized remapping. The vectorized approach correctly builds a mapping array and applies it using NumPy indexing, which is significantly faster than the previous loop-based implementation for systems with many atoms. This addresses the performance concern raised in the earlier review.
Optional refinement: the `.astype(np.int32)` at the end of line 109 is redundant: indexing `old_to_new_type_idx` (which is already `dtype=np.int32`) returns an `int32` array, so the explicit cast is unnecessary.

Optional: Remove the redundant type cast:

      old_to_new_type_idx = np.array(
          [type_map.index(name) for name in self.type_map], dtype=np.int32
      )
    - self.atom_type = old_to_new_type_idx[self.atom_type].astype(np.int32)
    + self.atom_type = old_to_new_type_idx[self.atom_type]
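A quick standalone check (not from the repository) of why the cast is redundant: NumPy fancy indexing returns an array with the dtype of the array being indexed, not of the index array.

```python
import numpy as np

old_to_new_type_idx = np.array([3, 0], dtype=np.int32)  # local index -> global index
atom_type = np.array([0, 1, 1])                         # index array (default int64)

remapped = old_to_new_type_idx[atom_type]
print(remapped, remapped.dtype)  # [3 0 0] int32 -> no trailing .astype(np.int32) needed
```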
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
deepmd/utils/data.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
Always run `ruff check .` and `ruff format .` before committing changes to Python code
Files:
deepmd/utils/data.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (29)
- GitHub Check: Build wheels for cp310-manylinux_aarch64
- GitHub Check: Build wheels for cp311-macosx_arm64
- GitHub Check: Build wheels for cp311-win_amd64
- GitHub Check: Build wheels for cp311-manylinux_x86_64
- GitHub Check: Build wheels for cp311-macosx_x86_64
- GitHub Check: Build wheels for cp311-manylinux_x86_64
- GitHub Check: Build C++ (cuda120, cuda)
- GitHub Check: Analyze (c-cpp)
- GitHub Check: Build C++ (clang, clang)
- GitHub Check: Build C++ (cpu, cpu)
- GitHub Check: Analyze (python)
- GitHub Check: Build C++ (rocm, rocm)
- GitHub Check: Build C++ (cuda, cuda)
- GitHub Check: Test Python (1, 3.9)
- GitHub Check: Test Python (5, 3.9)
- GitHub Check: Test Python (6, 3.12)
- GitHub Check: Test Python (2, 3.12)
- GitHub Check: Test Python (5, 3.12)
- GitHub Check: Test C++ (false)
- GitHub Check: Test C++ (true)
- GitHub Check: Test Python (6, 3.9)
- GitHub Check: Test Python (4, 3.9)
- GitHub Check: Test Python (3, 3.9)
- GitHub Check: Test Python (4, 3.12)
- GitHub Check: Test Python (3, 3.12)
- GitHub Check: Test Python (2, 3.9)
- GitHub Check: Test Python (1, 3.12)
- GitHub Check: Build C library (2.14, >=2.5.0,<2.15, libdeepmd_c_cu11.tar.gz)
- GitHub Check: Build C library (2.18, libdeepmd_c.tar.gz)
Bug fix for the case where the number of types in the training system is less than in the global type map. Fixes issue deepmodeling#5042.

Summary by CodeRabbit

Refactor
- Optimized internal atom-type mapping to make data import and processing more efficient.
- Maintains identical behavior and output formats; no changes to public interfaces or exported declarations.
- Improves performance and observability during data handling without altering results or user-facing functionality.
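For illustration, a minimal standalone reproduction of the scenario this PR addresses: a data system containing fewer types than the global type_map, remapped with the mapping-array approach quoted in the review above. Variable names here are chosen for clarity and are not necessarily those used in deepmd/utils/data.py.

```python
import numpy as np

global_type_map = ["H", "C", "N", "O"]            # full type_map given at training time
local_type_map = ["O", "H"]                       # this data system only contains O and H
atom_type = np.array([0, 1, 1], dtype=np.int32)   # indices into local_type_map: O, H, H

# Build the local -> global index mapping once, then remap every atom by indexing.
old_to_new_type_idx = np.array(
    [global_type_map.index(name) for name in local_type_map], dtype=np.int32
)
atom_type = old_to_new_type_idx[atom_type]
print(atom_type)  # [3 0 0] -> the same atoms expressed in global type indices
```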