fix: optimize atom type mapping #5043

OutisLi · 2025-11-10T07:13:28Z

Bug fix when type num in training system is less than type map. Fix issue #5042 @

Summary by CodeRabbit

Refactor
- Optimized internal atom-type mapping to make data import and processing more efficient.
- Maintains identical behavior and output formats; no changes to public interfaces or exported declarations.
- Improves performance and observability during data handling without altering results or user-facing functionality.

Copilot

Pull Request Overview

This PR reverts a previous optimization that used vectorized NumPy operations for remapping atom types to a simpler list comprehension approach.

Key Changes:

Replaces vectorized NumPy array indexing with a list comprehension-based approach for remapping atom types when a global type_map is provided

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

deepmd/utils/data.py

coderabbitai · 2025-11-10T07:16:33Z

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

📝 Walkthrough

Walkthrough

Replace loop-based index remapping in non-mixed_type path with a vectorized name-based lookup: build a NumPy int32 array from [type_map.index(name) for name in self.type_map] and remap self.atom_type via that array. Behavior and self.atom_type dtype remain the same; mixed_type path unchanged.

Changes

Cohort / File(s)	Summary
Type mapping refactor `deepmd/utils/data.py`	Replace per-element index-based mapping with a vectorized name-based lookup: construct a NumPy `int32` array from `[type_map.index(name) for name in self.type_map]` and use it to remap `self.atom_type`. No changes to `mixed_type` branch or downstream control flow.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Single-file, localized refactor with functional equivalence.
Review attention:
- Verify the constructed NumPy array correctly reflects indices for all entries in self.type_map.
- Confirm self.atom_type dtype remains int32 and mapping preserves values for edge cases (e.g., duplicate names).

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'fix: optimize atom type mapping' accurately reflects the main change: a vectorized optimization of atom type mapping logic in deepmd/utils/data.py.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✨ Finishing touches

📝 Generate docstrings

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

deepmd/utils/data.py (1)
107-108: Consider vectorizing with NumPy for efficiency.

The list comprehensions work correctly but could be replaced with NumPy vectorization for better performance, though this is a one-time initialization operation.

Example vectorized approach:
atom_names = np.array(self.type_map)[self.atom_type]
atom_type_ = np.array([global_type_dict[name] for name in atom_names])
Or fully vectorized:
# Create a mapping array from local to global indices
type_mapping = np.array([global_type_dict[name] for name in self.type_map])
self.atom_type = type_mapping[self.atom_type].astype(np.int32)

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7f25e16 and 96da487.

📒 Files selected for processing (1)

deepmd/utils/data.py (1 hunks)

🧰 Additional context used

📓 Path-based instructions (1)

**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Always run ruff check . and ruff format . before committing changes to Python code

Files:

deepmd/utils/data.py

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (30)

GitHub Check: CodeQL analysis (python)
GitHub Check: Test Python (6, 3.9)
GitHub Check: Test Python (5, 3.12)
GitHub Check: Test Python (5, 3.9)
GitHub Check: Test Python (6, 3.12)
GitHub Check: Test Python (3, 3.9)
GitHub Check: Test Python (1, 3.12)
GitHub Check: Test Python (4, 3.9)
GitHub Check: Test Python (4, 3.12)
GitHub Check: Test Python (2, 3.12)
GitHub Check: Test Python (2, 3.9)
GitHub Check: Test Python (1, 3.9)
GitHub Check: Test Python (3, 3.12)
GitHub Check: Build wheels for cp311-macosx_arm64
GitHub Check: Build wheels for cp311-macosx_x86_64
GitHub Check: Build wheels for cp310-manylinux_aarch64
GitHub Check: Build wheels for cp311-win_amd64
GitHub Check: Build wheels for cp311-manylinux_x86_64
GitHub Check: Build wheels for cp311-manylinux_x86_64
GitHub Check: Test C++ (true)
GitHub Check: Test C++ (false)
GitHub Check: Build C++ (clang, clang)
GitHub Check: Build C++ (cuda120, cuda)
GitHub Check: Build C++ (cuda, cuda)
GitHub Check: Build C++ (cpu, cpu)
GitHub Check: Build C++ (rocm, rocm)
GitHub Check: Analyze (c-cpp)
GitHub Check: Analyze (python)
GitHub Check: Build C library (2.14, >=2.5.0,<2.15, libdeepmd_c_cu11.tar.gz)
GitHub Check: Build C library (2.18, libdeepmd_c.tar.gz)

🔇 Additional comments (1)

deepmd/utils/data.py (1)

106-109: Logic change correctly addresses the PR objective.

The dictionary-based name lookup approach properly handles the case where the training system has fewer types than the global type_map. The validation at lines 100-104 ensures all local type names exist in the global type_map, making this implementation safe.

deepmd/utils/data.py

codecov · 2025-11-10T08:17:51Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 84.19%. Comparing base (7f25e16) to head (ff8aee9).
⚠️ Report is 1 commits behind head on devel.

Additional details and impacted files

@@            Coverage Diff             @@
##            devel    #5043      +/-   ##
==========================================
- Coverage   84.19%   84.19%   -0.01%     
==========================================
  Files         709      709              
  Lines       70216    70213       -3     
  Branches     3621     3620       -1     
==========================================
- Hits        59116    59113       -3     
  Misses       9933     9933              
  Partials     1167     1167

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (1)

deepmd/utils/data.py (1)
106-109: Excellent performance improvement with vectorized remapping.

The vectorized approach correctly builds a mapping array and applies it using NumPy indexing, which is significantly faster than the previous loop-based implementation for systems with many atoms. This addresses the performance concern raised in the earlier review.

Optional refinement: The .astype(np.int32) at the end of line 109 is redundant—indexing old_to_new_type_idx (which is already dtype=np.int32) returns an int32 array, so the explicit cast is unnecessary.
Optional: Remove redundant type cast
                old_to_new_type_idx = np.array(
                    [type_map.index(name) for name in self.type_map], dtype=np.int32
                )
-                self.atom_type = old_to_new_type_idx[self.atom_type].astype(np.int32)
+                self.atom_type = old_to_new_type_idx[self.atom_type]

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 96da487 and ff8aee9.

📒 Files selected for processing (1)

deepmd/utils/data.py (1 hunks)

🧰 Additional context used

📓 Path-based instructions (1)

**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Always run ruff check . and ruff format . before committing changes to Python code

Files:

deepmd/utils/data.py

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (29)

GitHub Check: Build wheels for cp310-manylinux_aarch64
GitHub Check: Build wheels for cp311-macosx_arm64
GitHub Check: Build wheels for cp311-win_amd64
GitHub Check: Build wheels for cp311-manylinux_x86_64
GitHub Check: Build wheels for cp311-macosx_x86_64
GitHub Check: Build wheels for cp311-manylinux_x86_64
GitHub Check: Build C++ (cuda120, cuda)
GitHub Check: Analyze (c-cpp)
GitHub Check: Build C++ (clang, clang)
GitHub Check: Build C++ (cpu, cpu)
GitHub Check: Analyze (python)
GitHub Check: Build C++ (rocm, rocm)
GitHub Check: Build C++ (cuda, cuda)
GitHub Check: Test Python (1, 3.9)
GitHub Check: Test Python (5, 3.9)
GitHub Check: Test Python (6, 3.12)
GitHub Check: Test Python (2, 3.12)
GitHub Check: Test Python (5, 3.12)
GitHub Check: Test C++ (false)
GitHub Check: Test C++ (true)
GitHub Check: Test Python (6, 3.9)
GitHub Check: Test Python (4, 3.9)
GitHub Check: Test Python (3, 3.9)
GitHub Check: Test Python (4, 3.12)
GitHub Check: Test Python (3, 3.12)
GitHub Check: Test Python (2, 3.9)
GitHub Check: Test Python (1, 3.12)
GitHub Check: Build C library (2.14, >=2.5.0,<2.15, libdeepmd_c_cu11.tar.gz)
GitHub Check: Build C library (2.18, libdeepmd_c.tar.gz)

Bug fix when type num in training system is less than type map. Fix issue deepmodeling#5042 @  ## Summary by CodeRabbit * **Refactor** * Optimized internal atom-type mapping to make data import and processing more efficient. * Maintains identical behavior and output formats; no changes to public interfaces or exported declarations. * Improves performance and observability during data handling without altering results or user-facing functionality.

fix: optimize atom type mapping

96da487

Copilot AI review requested due to automatic review settings November 10, 2025 07:13

github-actions bot added the Python label Nov 10, 2025

Copilot AI reviewed Nov 10, 2025

View reviewed changes

deepmd/utils/data.py Outdated Show resolved Hide resolved

coderabbitai bot reviewed Nov 10, 2025

View reviewed changes

deepmd/utils/data.py Outdated Show resolved Hide resolved

vectorize

ff8aee9

coderabbitai bot reviewed Nov 10, 2025

View reviewed changes

OutisLi mentioned this pull request Nov 10, 2025

[BUG] Dataset loading IndexError raised in new devel branch #5042

Closed

iProzd approved these changes Nov 11, 2025

View reviewed changes

njzjz linked an issue Nov 11, 2025 that may be closed by this pull request

[BUG] Dataset loading IndexError raised in new devel branch #5042

Closed

njzjz added this pull request to the merge queue Nov 11, 2025

Merged via the queue into deepmodeling:devel with commit 4ec0437 Nov 11, 2025
60 checks passed

OutisLi deleted the pr/loadfix branch November 13, 2025 02:07

coderabbitai bot mentioned this pull request Nov 13, 2025

test(common): add regression for atom type remap #5050

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix: optimize atom type mapping #5043

fix: optimize atom type mapping #5043

Uh oh!

OutisLi commented Nov 10, 2025 •

edited by coderabbitai bot

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

coderabbitai bot commented Nov 10, 2025 •

edited

Loading

Other AI code review bot(s) detected

Walkthrough

Changes

Estimated code review effort

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

codecov bot commented Nov 10, 2025 •

edited

Loading

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

fix: optimize atom type mapping #5043

fix: optimize atom type mapping #5043

Uh oh!

Conversation

OutisLi commented Nov 10, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull Request Overview

Uh oh!

Uh oh!

coderabbitai bot commented Nov 10, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Other AI code review bot(s) detected

Walkthrough

Changes

Estimated code review effort

Pre-merge checks and finishing touches

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

codecov bot commented Nov 10, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

OutisLi commented Nov 10, 2025 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Nov 10, 2025 •

edited

Loading

codecov bot commented Nov 10, 2025 •

edited

Loading