-
Notifications
You must be signed in to change notification settings - Fork 584
feat(array-api): env mat stat #4729
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Signed-off-by: Jinzhe Zeng <jinzhe.zeng@ustc.edu.cn>
| return CommonTest.skip_pt | ||
|
|
||
| @property | ||
| def skip_dp(self) -> bool: |
Check notice
Code scanning / CodeQL
Unused local variable Note test
|
|
||
| @property | ||
| def skip_dp(self) -> bool: | ||
| ( |
Check notice
Code scanning / CodeQL
Unused local variable Note test
| @property | ||
| def skip_dp(self) -> bool: | ||
| ( | ||
| resnet_dt, |
Check notice
Code scanning / CodeQL
Unused local variable Note test
| def skip_dp(self) -> bool: | ||
| ( | ||
| resnet_dt, | ||
| type_one_side, |
Check notice
Code scanning / CodeQL
Unused local variable Note test
| ) = self.param | ||
| return not INSTALLED_PD | ||
|
|
||
| @property |
Check notice
Code scanning / CodeQL
Unused local variable Note test
| return not INSTALLED_PD | ||
|
|
||
| @property | ||
| def skip_array_api_strict(self) -> bool: |
Check notice
Code scanning / CodeQL
Unused local variable Note test
|
|
||
| @property | ||
| def skip_array_api_strict(self) -> bool: | ||
| ( |
Check notice
Code scanning / CodeQL
Unused local variable Note test
| @property | ||
| def skip_array_api_strict(self) -> bool: | ||
| ( | ||
| resnet_dt, |
Check notice
Code scanning / CodeQL
Unused local variable Note test
| def skip_array_api_strict(self) -> bool: | ||
| ( | ||
| resnet_dt, | ||
| type_one_side, |
Check notice
Code scanning / CodeQL
Unused local variable Note test
📝 WalkthroughWalkthroughPreviously unimplemented methods for computing and retrieving descriptor input statistics ( Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant Descriptor
participant EnvMatStatSe
User->>Descriptor: compute_input_stats(merged, path)
alt path is directory with cached stats
Descriptor->>EnvMatStatSe: load stats from path
else
alt merged is callable
Descriptor->>User: merged()
User-->>Descriptor: list[dict]
end
Descriptor->>EnvMatStatSe: compute stats from data
end
EnvMatStatSe-->>Descriptor: stats (mean, std)
Descriptor->>Descriptor: set davg, dstd, stats
User->>Descriptor: get_stats()
Descriptor-->>User: stats dict
Possibly related PRs
Suggested labels
Suggested reviewers
Tip ⚡️ Faster reviews with caching
Enjoy the performance boost—your workflow just got faster. 📜 Recent review detailsConfiguration used: CodeRabbit UI 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
⏰ Context from checks skipped due to timeout of 90000ms (29)
✨ Finishing Touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Nitpick comments (10)
deepmd/dpmodel/descriptor/dpa3.py (1)
446-450: Implemented compute_input_stats method for descriptor statistics.The implementation now properly delegates to the internal
repflowsdescriptor component to compute input statistics, allowing normalization of descriptors. This matches the pattern used in other descriptor classes likeDescrptDPA1andDescrptDPA2.Consider renaming the unused loop variable
iito_or_iito indicate it's not being used within the loop body:- for ii, descrpt in enumerate(descrpt_list): + for _, descrpt in enumerate(descrpt_list):🧰 Tools
🪛 Ruff (0.8.2)
449-449: Loop control variable
iinot used within loop bodyRename unused
iito_ii(B007)
deepmd/dpmodel/descriptor/dpa2.py (1)
740-765: Implemented compute_input_stats method to calculate input statistics.The implementation correctly handles both direct data and lazy loading via callable, with proper path handling for caching. The method efficiently delegates statistics computation to each of the descriptor components.
However, there's one minor improvement to consider:
- for ii, descrpt in enumerate(descrpt_list): + for _ii, descrpt in enumerate(descrpt_list):or
- for ii, descrpt in enumerate(descrpt_list): + for _, descrpt in enumerate(descrpt_list):Since the index variable is not used in the loop body, using an underscore prefix or just underscore is more idiomatic.
🧰 Tools
🪛 Ruff (0.8.2)
764-764: Loop control variable
iinot used within loop bodyRename unused
iito_ii(B007)
deepmd/dpmodel/descriptor/repflows.py (1)
353-392: Implemented compute_input_stats method for descriptor statistics.The implementation properly computes or loads statistical data for the descriptor inputs. The code correctly handles the path for caching, lazy loading of data, and properly updates the descriptor's mean and standard deviation values.
A minor style improvement could be made:
- if callable(merged): - # only get data for once - sampled = merged() - else: - sampled = merged + sampled = merged() if callable(merged) else mergedThis makes the code more concise while maintaining readability.
🧰 Tools
🪛 Ruff (0.8.2)
378-382: Use ternary operator
sampled = merged() if callable(merged) else mergedinstead ofif-else-block(SIM108)
deepmd/dpmodel/descriptor/se_r.py (1)
270-308: Implemented compute_input_stats method for radial descriptor statistics.The implementation properly handles statistics computation for the SE_R descriptor, with correct path handling for caching, support for lazy data loading, and appropriate updating of mean and standard deviation values.
A minor style improvement could be made:
- if callable(merged): - # only get data for once - sampled = merged() - else: - sampled = merged + sampled = merged() if callable(merged) else mergedThis makes the code more concise while maintaining readability.
🧰 Tools
🪛 Ruff (0.8.2)
295-299: Use ternary operator
sampled = merged() if callable(merged) else mergedinstead ofif-else-block(SIM108)
deepmd/dpmodel/descriptor/se_t.py (1)
250-288: Implemented compute_input_stats method for angular descriptor statistics.The implementation properly handles statistics computation for the SE_T descriptor, with correct path handling for caching, support for lazy data loading, and appropriate updating of mean and standard deviation values. The implementation is consistent with other descriptor classes, maintaining good code uniformity.
A minor style improvement could be made:
- if callable(merged): - # only get data for once - sampled = merged() - else: - sampled = merged + sampled = merged() if callable(merged) else mergedThis makes the code more concise while maintaining readability.
🧰 Tools
🪛 Ruff (0.8.2)
275-279: Use ternary operator
sampled = merged() if callable(merged) else mergedinstead ofif-else-block(SIM108)
deepmd/dpmodel/descriptor/repformers.py (1)
374-413: Well-implemented statistics computation with proper caching support.The
compute_input_statsmethod follows a good pattern, handling both lazy data sampling and statistics caching. It correctly manages path-based caching and updating descriptor statistics with appropriate type conversion.Consider using a ternary operator for better readability:
- if callable(merged): - # only get data for once - sampled = merged() - else: - sampled = merged + sampled = merged() if callable(merged) else merged🧰 Tools
🪛 Ruff (0.8.2)
399-403: Use ternary operator
sampled = merged() if callable(merged) else mergedinstead ofif-else-block(SIM108)
deepmd/dpmodel/descriptor/se_e2_a.py (1)
311-349: Well-implemented statistics computation.The
compute_input_statsmethod is well-structured, supporting both lazy data loading and file-based caching via theEnvMatStatSeutility.Consider using a ternary operator for better readability:
- if callable(merged): - # only get data for once - sampled = merged() - else: - sampled = merged + sampled = merged() if callable(merged) else merged🧰 Tools
🪛 Ruff (0.8.2)
336-340: Use ternary operator
sampled = merged() if callable(merged) else mergedinstead ofif-else-block(SIM108)
deepmd/dpmodel/descriptor/dpa1.py (1)
864-903: Well-implemented statistics computation in DescrptBlockSeAtten.The statistics computation is implemented with proper support for lazy data loading and caching, consistent with other descriptor implementations.
Consider using a ternary operator for better readability:
- if callable(merged): - # only get data for once - sampled = merged() - else: - sampled = merged + sampled = merged() if callable(merged) else merged🧰 Tools
🪛 Ruff (0.8.2)
889-893: Use ternary operator
sampled = merged() if callable(merged) else mergedinstead ofif-else-block(SIM108)
deepmd/dpmodel/utils/env_mat_stat.py (1)
122-129: Remove unused localnatomsto avoid dead‐code noise
natomsis extracted fromsystembut never referenced afterwards. Keeping unused bindings clutters the scope and may mislead future readers.- coord, atype, box, natoms = ( - system["coord"], - system["atype"], - system["box"], - system["natoms"], - ) + coord, atype, box = ( + system["coord"], + system["atype"], + system["box"], + )🧰 Tools
🪛 Ruff (0.8.2)
122-122: Local variable
natomsis assigned to but never usedRemove assignment to unused variable
natoms(F841)
deepmd/dpmodel/descriptor/se_t_tebd.py (1)
675-679: Tiny style-only simplification opportunityThe
if callable(merged)block can be collapsed using a ternary expression, as the Ruff hint suggests, improving brevity without hurting clarity:- if callable(merged): - # only get data for once - sampled = merged() - else: - sampled = merged + sampled = merged() if callable(merged) else merged🧰 Tools
🪛 Ruff (0.8.2)
675-679: Use ternary operator
sampled = merged() if callable(merged) else mergedinstead ofif-else-block(SIM108)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (14)
deepmd/dpmodel/descriptor/dpa1.py(4 hunks)deepmd/dpmodel/descriptor/dpa2.py(2 hunks)deepmd/dpmodel/descriptor/dpa3.py(1 hunks)deepmd/dpmodel/descriptor/repflows.py(3 hunks)deepmd/dpmodel/descriptor/repformers.py(3 hunks)deepmd/dpmodel/descriptor/se_e2_a.py(4 hunks)deepmd/dpmodel/descriptor/se_r.py(4 hunks)deepmd/dpmodel/descriptor/se_t.py(4 hunks)deepmd/dpmodel/descriptor/se_t_tebd.py(4 hunks)deepmd/dpmodel/utils/env_mat_stat.py(1 hunks)deepmd/dpmodel/utils/nlist.py(2 hunks)deepmd/dpmodel/utils/region.py(1 hunks)source/tests/consistent/common.py(2 hunks)source/tests/consistent/descriptor/test_se_e2_a.py(3 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (6)
deepmd/dpmodel/utils/nlist.py (1)
deepmd/dpmodel/utils/region.py (1)
normalize_coord(53-75)
deepmd/dpmodel/descriptor/dpa3.py (4)
deepmd/dpmodel/descriptor/dpa2.py (1)
compute_input_stats(740-765)deepmd/dpmodel/descriptor/repflows.py (1)
compute_input_stats(353-391)deepmd/pt/model/descriptor/descriptor.py (1)
compute_input_stats(102-123)deepmd/utils/path.py (1)
DPPath(27-157)
deepmd/dpmodel/descriptor/dpa2.py (3)
deepmd/utils/path.py (1)
DPPath(27-157)deepmd/pt/model/descriptor/descriptor.py (1)
compute_input_stats(102-123)deepmd/pt/model/descriptor/dpa2.py (1)
compute_input_stats(482-507)
deepmd/dpmodel/descriptor/dpa1.py (8)
deepmd/dpmodel/utils/env_mat_stat.py (2)
EnvMatStatSe(65-253)get_hash(200-219)deepmd/utils/env_mat_stat.py (2)
StatItem(26-91)load_or_compute_stats(162-181)deepmd/dpmodel/descriptor/se_r.py (1)
compute_input_stats(270-308)deepmd/dpmodel/descriptor/se_t.py (1)
compute_input_stats(250-288)deepmd/dpmodel/descriptor/repformers.py (2)
compute_input_stats(374-412)get_stats(414-420)deepmd/pt/model/descriptor/dpa1.py (1)
compute_input_stats(412-433)deepmd/pt/model/descriptor/se_atten.py (2)
compute_input_stats(361-402)get_stats(404-410)deepmd/dpmodel/descriptor/repflows.py (1)
get_stats(393-399)
deepmd/dpmodel/descriptor/se_r.py (2)
deepmd/dpmodel/utils/env_mat_stat.py (2)
EnvMatStatSe(65-253)get_hash(200-219)deepmd/utils/env_mat_stat.py (1)
load_or_compute_stats(162-181)
source/tests/consistent/descriptor/test_se_e2_a.py (2)
source/tests/consistent/descriptor/common.py (7)
DescriptorTest(57-198)build_tf_descriptor(60-83)eval_dp_descriptor(85-102)eval_pt_descriptor(104-124)eval_jax_descriptor(126-146)eval_pd_descriptor(148-168)eval_array_api_strict_descriptor(170-198)deepmd/utils/argcheck.py (1)
descrpt_se_a_args(271-335)
🪛 Ruff (0.8.2)
deepmd/dpmodel/descriptor/repflows.py
378-382: Use ternary operator sampled = merged() if callable(merged) else merged instead of if-else-block
(SIM108)
deepmd/dpmodel/descriptor/dpa3.py
449-449: Loop control variable ii not used within loop body
Rename unused ii to _ii
(B007)
deepmd/dpmodel/descriptor/repformers.py
399-403: Use ternary operator sampled = merged() if callable(merged) else merged instead of if-else-block
(SIM108)
deepmd/dpmodel/descriptor/se_t.py
275-279: Use ternary operator sampled = merged() if callable(merged) else merged instead of if-else-block
(SIM108)
deepmd/dpmodel/descriptor/dpa2.py
764-764: Loop control variable ii not used within loop body
Rename unused ii to _ii
(B007)
deepmd/dpmodel/descriptor/se_e2_a.py
336-340: Use ternary operator sampled = merged() if callable(merged) else merged instead of if-else-block
(SIM108)
deepmd/dpmodel/descriptor/dpa1.py
889-893: Use ternary operator sampled = merged() if callable(merged) else merged instead of if-else-block
(SIM108)
deepmd/dpmodel/descriptor/se_r.py
295-299: Use ternary operator sampled = merged() if callable(merged) else merged instead of if-else-block
(SIM108)
deepmd/dpmodel/utils/env_mat_stat.py
122-122: Local variable natoms is assigned to but never used
Remove assignment to unused variable natoms
(F841)
deepmd/dpmodel/descriptor/se_t_tebd.py
675-679: Use ternary operator sampled = merged() if callable(merged) else merged instead of if-else-block
(SIM108)
⏰ Context from checks skipped due to timeout of 90000ms (29)
- GitHub Check: Test Python (6, 3.12)
- GitHub Check: Test Python (6, 3.9)
- GitHub Check: Test Python (5, 3.12)
- GitHub Check: Test Python (5, 3.9)
- GitHub Check: Test Python (4, 3.12)
- GitHub Check: Test Python (4, 3.9)
- GitHub Check: Build wheels for cp310-manylinux_aarch64
- GitHub Check: Test Python (3, 3.12)
- GitHub Check: Build wheels for cp311-win_amd64
- GitHub Check: Test Python (3, 3.9)
- GitHub Check: Build wheels for cp311-macosx_arm64
- GitHub Check: Build C++ (clang, clang)
- GitHub Check: Test Python (2, 3.12)
- GitHub Check: Build wheels for cp311-macosx_x86_64
- GitHub Check: Build C++ (rocm, rocm)
- GitHub Check: Test Python (2, 3.9)
- GitHub Check: Build wheels for cp311-manylinux_x86_64
- GitHub Check: Build C++ (cuda120, cuda)
- GitHub Check: Analyze (python)
- GitHub Check: Build C library (2.14, >=2.5.0rc0,<2.15, libdeepmd_c_cu11.tar.gz)
- GitHub Check: Test Python (1, 3.12)
- GitHub Check: Test C++ (false)
- GitHub Check: Build wheels for cp311-manylinux_x86_64
- GitHub Check: Build C++ (cuda, cuda)
- GitHub Check: Build C library (2.18, libdeepmd_c.tar.gz)
- GitHub Check: Test Python (1, 3.9)
- GitHub Check: Build C++ (cpu, cpu)
- GitHub Check: Test C++ (true)
- GitHub Check: Analyze (c-cpp)
🔇 Additional comments (21)
deepmd/dpmodel/utils/region.py (1)
74-74: Improved array API compatibility for remainder operation.The change from
1.0toxp.asarray(1.0)ensures proper type consistency when using array API compatible libraries. This is important because when operations are performed between arrays and scalar literals, some array API implementations might throw errors if the types aren't from the same namespace.deepmd/dpmodel/utils/nlist.py (3)
28-28: Good addition for array API compatibility.Adding the array namespace extraction at the beginning of the function ensures consistent array operations throughout the function body, allowing the code to work with different array backends.
32-34: Properly using array namespace for reshaping operations.Converting the NumPy-specific reshaping to use the array API compatible
xp.reshapeensures consistency with the chosen array backend. This change is crucial for supporting multiple backends like JAX, PyTorch, or NumPy.
48-48: Consistent use of array namespace for reshaping.Using
xp.reshapeinstead of direct reshaping maintains backend consistency throughout the function. This is a necessary change to ensure the function works correctly across different array libraries.source/tests/consistent/common.py (2)
438-440: Smart test enhancement to ignore variable statistics.Removing
@variablesfrom the serialized data comparison is a good approach as these statistics (like mean and standard deviation) might legitimately differ between backends while still producing functionally equivalent models. This change prevents false test failures when only the statistics differ.
475-477: Consistent handling of variables in JAX backend tests.Similar to the PyTorch test modifications, excluding
@variablesfrom the serialized data comparison in JAX tests ensures consistent test behavior across different backend implementations.deepmd/dpmodel/descriptor/dpa2.py (1)
3-3: AddedCallableto imports for compute_input_stats implementation.Good addition of the required typing import for the newly implemented
compute_input_statsmethod.deepmd/dpmodel/descriptor/repflows.py (3)
25-31: Added EnvMatStatSe import for statistics computation.Good addition of the required import for environment matrix statistics calculation.
38-40: Added StatItem import for statistics data typing.Appropriate import for handling typed statistical data.
393-399: Added get_stats method for retrieving computed statistics.Good implementation of the
get_statsmethod with proper error handling when statistics haven't been computed yet.deepmd/dpmodel/descriptor/se_r.py (3)
3-4: AddedCallableto imports for compute_input_stats implementation.Good addition of the required typing import for the new method.
29-31: Added EnvMatStatSe import for statistics computation.Appropriate import for the environment matrix statistics functionality.
181-181: Added ndescrpt attribute to track descriptor dimension.Good addition of the
ndescrptattribute which properly matches the descriptor dimension for se_r.deepmd/dpmodel/descriptor/se_t.py (3)
4-4: AddedCallableto imports for compute_input_stats implementation.Good addition of the required typing import for the new method.
29-31: Added EnvMatStatSe import for statistics computation.Appropriate import for the environment matrix statistics functionality.
161-161: Added ndescrpt attribute with correct dimension calculation.Good addition of the
ndescrptattribute which properly sets the descriptor dimension tonnei * 4for the se_t descriptor, accounting for the 4 components per neighbor.deepmd/dpmodel/descriptor/repformers.py (1)
414-420: Good error handling for stats retrieval.The
get_statsmethod appropriately checks if statistics have been computed and raises a clear error message if not.deepmd/dpmodel/descriptor/se_e2_a.py (1)
222-222: Good addition of descriptor dimension.The
ndescrptproperty is properly initialized as the product of neighbors and dimensions per neighbor, which is needed for statistics calculation.deepmd/dpmodel/descriptor/dpa1.py (2)
416-437: Good delegation to underlying components.The
compute_input_statsmethod inDescrptDPA1properly delegates to the internalse_attencomponent, maintaining clear separation of concerns.
904-910: Good error handling for stats retrieval.The
get_statsmethod properly checks if statistics have been computed and raises an appropriate error when needed.source/tests/consistent/descriptor/test_se_e2_a.py (1)
466-478: Redundant boilerplate in test setup – consider a helper to DRYEvery backend repeats an identical
compute_input_statsdictionary literal differing only in array/tensor constructors. Extracting a small helper (e.g._build_sample_dict(backend)) would slash ~40 lines and make future edits safer.[ suggest_optional_refactor ]
Signed-off-by: Jinzhe Zeng <jinzhe.zeng@ustc.edu.cn>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🔭 Outside diff range comments (1)
deepmd/dpmodel/descriptor/dpa1.py (1)
779-783:⚠️ Potential issueAdd
ndescrpt& initialisestatsto avoid runtimeAttributeError.
EnvMatStatSeexpects the descriptor to expose andescrptattribute, andget_stats()later assumesself.statsalready exists.
Both are currently missing, which will raise anAttributeErrorthe first time statistics are computed or read.@@ self.stddev = np.ones(wanted_shape, dtype=PRECISION_DICT[self.precision]) + # --- extra metadata needed by EnvMatStatSe ------------------------ + # total descriptor length (per–atom output size) + self.ndescrpt = self.filter_neuron[-1] * self.axis_neuron + # placeholder – will be filled by `compute_input_stats` + self.stats: dict[str, StatItem] | None = None self.orig_sel = self.selWithout this patch
compute_input_stats()will fail on
self.descriptor.ndescrptinsideEnvMatStatSe.__init__.
🧹 Nitpick comments (6)
deepmd/dpmodel/descriptor/repflows.py (1)
353-392: Well-implemented statistics computation methodThe
compute_input_statsmethod is well implemented with support for both direct data input and lazy loading via callable, along with optional statistics caching through the path parameter. This provides an efficient way to compute descriptor input statistics.Consider using a ternary operator for conciseness:
- if callable(merged): - # only get data for once - sampled = merged() - else: - sampled = merged + # only get data for once + sampled = merged() if callable(merged) else merged🧰 Tools
🪛 Ruff (0.8.2)
378-382: Use ternary operator
sampled = merged() if callable(merged) else mergedinstead ofif-else-block(SIM108)
deepmd/dpmodel/descriptor/repformers.py (1)
374-413: Consistent implementation of the statistics computation methodThe
compute_input_statsmethod is implemented in a way that's consistent with other descriptor files, with good support for lazy data loading and optional caching. This consistency across descriptor modules is valuable for maintainability.Consider using a ternary operator for conciseness:
- if callable(merged): - # only get data for once - sampled = merged() - else: - sampled = merged + # only get data for once + sampled = merged() if callable(merged) else merged🧰 Tools
🪛 Ruff (0.8.2)
399-403: Use ternary operator
sampled = merged() if callable(merged) else mergedinstead ofif-else-block(SIM108)
deepmd/dpmodel/descriptor/se_t_tebd.py (1)
650-689: Consistent implementation of statistics computationThe
compute_input_statsmethod inDescrptBlockSeTTebdfollows the same pattern as in other descriptor modules, providing a consistent interface across the codebase. The implementation correctly handles the mean and stddev updates.Consider using a ternary operator for conciseness:
- if callable(merged): - # only get data for once - sampled = merged() - else: - sampled = merged + # only get data for once + sampled = merged() if callable(merged) else merged🧰 Tools
🪛 Ruff (0.8.2)
675-679: Use ternary operator
sampled = merged() if callable(merged) else mergedinstead ofif-else-block(SIM108)
deepmd/dpmodel/descriptor/dpa1.py (1)
889-893: Minor: collapse trivialif-elsewith a ternary operator.Ruff flags this as
SIM108; it’s purely stylistic but does shorten the
function.- if callable(merged): - # only get data for once - sampled = merged() - else: - sampled = merged + sampled = merged() if callable(merged) else merged🧰 Tools
🪛 Ruff (0.8.2)
889-893: Use ternary operator
sampled = merged() if callable(merged) else mergedinstead ofif-else-block(SIM108)
deepmd/dpmodel/utils/env_mat_stat.py (1)
122-127: Remove unused local variablenatoms.
natomsis assigned but never referenced, triggeringruff(F841) and
wasting a tiny bit of memory.- coord, atype, box, natoms = ( + coord, atype, box, _ = (🧰 Tools
🪛 Ruff (0.8.2)
122-122: Local variable
natomsis assigned to but never usedRemove assignment to unused variable
natoms(F841)
source/tests/consistent/descriptor/test_se_e2_a.py (1)
466-541: Reduce test duplication with a small helper.The four
eval_*methods differ only by the backend specific
array/tensorconversion. Duplicating the 15-line
compute_input_statsblock for every backend hurts readability and
makes future changes error-prone.Consider extracting a tiny helper:
def _prep_sample(self, xp): return [{ "r0": None, "coord": xp.asarray(self.coords).reshape(-1, self.natoms[0], 3), "atype": xp.asarray(self.atype).reshape(1, -1), "box": xp.asarray(self.box).reshape(1, 3, 3), "natoms": self.natoms[0], }]Each
eval_*then becomes:sample = self._prep_sample(torch if INSTALLED_PT else np) pt_obj.compute_input_stats(sample) return self.eval_pt_descriptor(...)This keeps the intent clear and avoids four blocks of copy-pasted
code.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
deepmd/dpmodel/descriptor/dpa1.py(4 hunks)deepmd/dpmodel/descriptor/repflows.py(3 hunks)deepmd/dpmodel/descriptor/repformers.py(3 hunks)deepmd/dpmodel/descriptor/se_t_tebd.py(4 hunks)deepmd/dpmodel/utils/env_mat_stat.py(1 hunks)source/tests/consistent/descriptor/test_se_e2_a.py(3 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
deepmd/dpmodel/descriptor/repformers.py (2)
deepmd/dpmodel/utils/env_mat_stat.py (2)
EnvMatStatSe(65-253)get_hash(200-219)deepmd/utils/env_mat_stat.py (2)
StatItem(26-91)load_or_compute_stats(162-181)
🪛 Ruff (0.8.2)
deepmd/dpmodel/descriptor/dpa1.py
889-893: Use ternary operator sampled = merged() if callable(merged) else merged instead of if-else-block
(SIM108)
deepmd/dpmodel/descriptor/repflows.py
378-382: Use ternary operator sampled = merged() if callable(merged) else merged instead of if-else-block
(SIM108)
deepmd/dpmodel/descriptor/repformers.py
399-403: Use ternary operator sampled = merged() if callable(merged) else merged instead of if-else-block
(SIM108)
deepmd/dpmodel/descriptor/se_t_tebd.py
675-679: Use ternary operator sampled = merged() if callable(merged) else merged instead of if-else-block
(SIM108)
deepmd/dpmodel/utils/env_mat_stat.py
122-122: Local variable natoms is assigned to but never used
Remove assignment to unused variable natoms
(F841)
⏰ Context from checks skipped due to timeout of 90000ms (29)
- GitHub Check: Test Python (6, 3.12)
- GitHub Check: Test Python (6, 3.9)
- GitHub Check: Test Python (5, 3.12)
- GitHub Check: Test Python (5, 3.9)
- GitHub Check: Test Python (4, 3.12)
- GitHub Check: Build wheels for cp310-manylinux_aarch64
- GitHub Check: Test Python (4, 3.9)
- GitHub Check: Test Python (3, 3.12)
- GitHub Check: Build wheels for cp311-win_amd64
- GitHub Check: Build C++ (clang, clang)
- GitHub Check: Test Python (3, 3.9)
- GitHub Check: Build wheels for cp311-macosx_arm64
- GitHub Check: Test Python (2, 3.12)
- GitHub Check: Build C++ (rocm, rocm)
- GitHub Check: Build wheels for cp311-macosx_x86_64
- GitHub Check: Build C++ (cuda120, cuda)
- GitHub Check: Test Python (2, 3.9)
- GitHub Check: Build wheels for cp311-manylinux_x86_64
- GitHub Check: Test Python (1, 3.12)
- GitHub Check: Build C++ (cuda, cuda)
- GitHub Check: Build wheels for cp311-manylinux_x86_64
- GitHub Check: Analyze (python)
- GitHub Check: Test C++ (false)
- GitHub Check: Build C library (2.14, >=2.5.0rc0,<2.15, libdeepmd_c_cu11.tar.gz)
- GitHub Check: Build C++ (cpu, cpu)
- GitHub Check: Test Python (1, 3.9)
- GitHub Check: Test C++ (true)
- GitHub Check: Analyze (c-cpp)
- GitHub Check: Build C library (2.18, libdeepmd_c.tar.gz)
🔇 Additional comments (4)
deepmd/dpmodel/descriptor/repflows.py (1)
393-399: Good implementation of get_stats with appropriate error handlingThe
get_statsmethod appropriately checks if statistics have been computed before returning them, raising a helpful runtime error otherwise.deepmd/dpmodel/descriptor/repformers.py (1)
414-420: Well-implemented get_stats method with appropriate error checkingThe
get_statsmethod correctly checks if statistics have been computed and provides a clear error message if they haven't.deepmd/dpmodel/descriptor/se_t_tebd.py (2)
254-275: Clean delegation of statistics computation to the internal descriptor blockThe
compute_input_statsmethod inDescrptSeTTebdis well-designed, correctly delegating to the internalse_ttebdblock. This simplifies the implementation while maintaining the full functionality.
690-696: Well-implemented get_stats method with proper error handlingThe
get_statsmethod correctly handles the case when statistics haven't been computed yet by raising a clear error message.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (6)
deepmd/dpmodel/descriptor/repflows.py (1)
353-392: Well-implemented statistics computation method with proper documentation.The
compute_input_statsmethod provides an efficient implementation for computing descriptor input statistics with support for:
- Lazy data loading through callables
- File-based caching
- Proper handling of the
set_davg_zeroflagA minor style improvement could be made by using a ternary operator:
- if callable(merged): - # only get data for once - sampled = merged() - else: - sampled = merged + # only get data for once + sampled = merged() if callable(merged) else merged🧰 Tools
🪛 Ruff (0.8.2)
378-382: Use ternary operator
sampled = merged() if callable(merged) else mergedinstead ofif-else-block(SIM108)
deepmd/dpmodel/descriptor/repformers.py (1)
374-413: Well-implemented statistics computation method with proper documentation.The
compute_input_statsmethod provides an efficient implementation for computing descriptor input statistics with support for:
- Lazy data loading through callables
- File-based caching
- Proper handling of the
set_davg_zeroflagA minor style improvement could be made by using a ternary operator:
- if callable(merged): - # only get data for once - sampled = merged() - else: - sampled = merged + # only get data for once + sampled = merged() if callable(merged) else merged🧰 Tools
🪛 Ruff (0.8.2)
399-403: Use ternary operator
sampled = merged() if callable(merged) else mergedinstead ofif-else-block(SIM108)
deepmd/dpmodel/descriptor/dpa1.py (1)
889-893: Consider using a ternary operator for more concise code.The conditional assignment can be simplified using a ternary operator:
- if path is None or not path.is_dir(): - if callable(merged): - # only get data for once - sampled = merged() - else: - sampled = merged + if path is None or not path.is_dir(): + # only get data for once + sampled = merged() if callable(merged) else merged🧰 Tools
🪛 Ruff (0.8.2)
889-893: Use ternary operator
sampled = merged() if callable(merged) else mergedinstead ofif-else-block(SIM108)
deepmd/dpmodel/descriptor/se_t_tebd.py (1)
675-679: Consider using a ternary operator for more concise code.The conditional assignment can be simplified using a ternary operator:
- if path is None or not path.is_dir(): - if callable(merged): - # only get data for once - sampled = merged() - else: - sampled = merged + if path is None or not path.is_dir(): + # only get data for once + sampled = merged() if callable(merged) else merged🧰 Tools
🪛 Ruff (0.8.2)
675-679: Use ternary operator
sampled = merged() if callable(merged) else mergedinstead ofif-else-block(SIM108)
deepmd/dpmodel/utils/env_mat_stat.py (2)
122-127: Remove unusednatomsvariable.The variable
natomsis extracted from the system data but never used in the function. Consider removing it or document why it's extracted.- coord, atype, box, natoms = ( + coord, atype, box, _ = ( system["coord"], system["atype"], system["box"], system["natoms"], )🧰 Tools
🪛 Ruff (0.8.2)
122-122: Local variable
natomsis assigned to but never usedRemove assignment to unused variable
natoms(F841)
118-120: Fix typo in error message.There's a typo in the error message: "raial-only" should be "radial-only".
- "last_dim should be 1 for raial-only or 4 for full descriptor." + "last_dim should be 1 for radial-only or 4 for full descriptor."
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
deepmd/dpmodel/descriptor/dpa1.py(4 hunks)deepmd/dpmodel/descriptor/repflows.py(3 hunks)deepmd/dpmodel/descriptor/repformers.py(3 hunks)deepmd/dpmodel/descriptor/se_t_tebd.py(4 hunks)deepmd/dpmodel/utils/env_mat_stat.py(1 hunks)source/tests/consistent/descriptor/test_se_e2_a.py(3 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (3)
deepmd/dpmodel/utils/env_mat_stat.py (11)
deepmd/dpmodel/common.py (1)
get_xp_precision(59-83)deepmd/dpmodel/utils/env_mat.py (1)
EnvMat(78-161)deepmd/dpmodel/utils/nlist.py (1)
extend_input_and_build_neighbor_list(20-49)deepmd/utils/env_mat_stat.py (1)
StatItem(26-91)deepmd/tf/descriptor/descriptor.py (1)
Descriptor(33-543)deepmd/dpmodel/descriptor/repflows.py (9)
get_ntypes(290-292)get_nsel(282-284)get_rcut(274-276)get_sel(286-288)mixed_types(322-332)get_rcut_smth(278-280)get_env_protection(334-336)call(408-518)call(923-1225)deepmd/dpmodel/descriptor/dpa1.py (19)
get_ntypes(358-360)get_ntypes(801-803)get_nsel(350-352)get_nsel(793-795)get_rcut(342-344)get_rcut(785-787)get_sel(354-356)get_sel(797-799)mixed_types(376-386)mixed_types(833-843)get_rcut_smth(346-348)get_rcut_smth(789-791)get_env_protection(396-398)get_env_protection(845-847)call(480-541)call(942-1086)call(1222-1232)call(1348-1359)call(1458-1517)deepmd/dpmodel/descriptor/se_r.py (7)
get_ntypes(262-264)get_rcut(212-214)get_sel(220-222)mixed_types(224-228)get_rcut_smth(216-218)get_env_protection(238-240)call(336-399)deepmd/dpmodel/descriptor/dpa3.py (7)
get_ntypes(357-359)get_nsel(349-351)get_rcut(341-343)get_sel(353-355)mixed_types(376-386)get_rcut_smth(345-347)get_env_protection(396-398)deepmd/dpmodel/descriptor/se_e2_a.py (6)
get_ntypes(303-305)get_rcut(253-255)get_sel(261-263)mixed_types(265-269)get_rcut_smth(257-259)get_env_protection(279-281)deepmd/dpmodel/descriptor/dpa2.py (7)
get_ntypes(623-625)get_nsel(615-617)get_rcut(607-609)get_sel(619-621)mixed_types(642-652)get_rcut_smth(611-613)get_env_protection(664-666)
deepmd/dpmodel/descriptor/dpa1.py (10)
deepmd/dpmodel/utils/env_mat_stat.py (2)
EnvMatStatSe(65-253)get_hash(200-219)deepmd/utils/env_mat_stat.py (2)
StatItem(26-91)load_or_compute_stats(162-181)deepmd/utils/path.py (1)
DPPath(27-157)deepmd/dpmodel/descriptor/se_t_tebd.py (3)
compute_input_stats(254-275)compute_input_stats(650-688)get_stats(690-696)deepmd/dpmodel/descriptor/se_r.py (1)
compute_input_stats(270-308)deepmd/dpmodel/descriptor/se_t.py (1)
compute_input_stats(250-288)deepmd/dpmodel/descriptor/repformers.py (2)
compute_input_stats(374-412)get_stats(414-420)deepmd/pt/model/descriptor/dpa1.py (1)
compute_input_stats(412-433)deepmd/pt/model/descriptor/se_atten.py (2)
compute_input_stats(361-402)get_stats(404-410)deepmd/dpmodel/descriptor/repflows.py (1)
get_stats(393-399)
deepmd/dpmodel/descriptor/se_t_tebd.py (4)
deepmd/dpmodel/utils/env_mat_stat.py (2)
EnvMatStatSe(65-253)get_hash(200-219)deepmd/utils/env_mat_stat.py (2)
StatItem(26-91)load_or_compute_stats(162-181)deepmd/utils/path.py (1)
DPPath(27-157)deepmd/dpmodel/descriptor/descriptor.py (1)
get_stats(106-108)
🪛 GitHub Check: CodeQL
source/tests/consistent/descriptor/test_se_e2_a.py
[notice] 347-347: Unused local variable
Variable type_one_side is not used.
[notice] 348-348: Unused local variable
Variable excluded_types is not used.
[notice] 349-349: Unused local variable
Variable precision is not used.
[notice] 350-350: Unused local variable
Variable env_protection is not used.
[notice] 357-357: Unused local variable
Variable resnet_dt is not used.
[notice] 358-358: Unused local variable
Variable type_one_side is not used.
[notice] 359-359: Unused local variable
Variable excluded_types is not used.
[notice] 360-360: Unused local variable
Variable precision is not used.
[notice] 361-361: Unused local variable
Variable env_protection is not used.
[notice] 383-383: Unused local variable
Variable resnet_dt is not used.
[notice] 384-384: Unused local variable
Variable type_one_side is not used.
[notice] 385-385: Unused local variable
Variable excluded_types is not used.
[notice] 386-386: Unused local variable
Variable precision is not used.
[notice] 387-387: Unused local variable
Variable env_protection is not used.
🪛 Ruff (0.8.2)
deepmd/dpmodel/utils/env_mat_stat.py
122-122: Local variable natoms is assigned to but never used
Remove assignment to unused variable natoms
(F841)
deepmd/dpmodel/descriptor/dpa1.py
889-893: Use ternary operator sampled = merged() if callable(merged) else merged instead of if-else-block
(SIM108)
deepmd/dpmodel/descriptor/repflows.py
378-382: Use ternary operator sampled = merged() if callable(merged) else merged instead of if-else-block
(SIM108)
deepmd/dpmodel/descriptor/repformers.py
399-403: Use ternary operator sampled = merged() if callable(merged) else merged instead of if-else-block
(SIM108)
deepmd/dpmodel/descriptor/se_t_tebd.py
675-679: Use ternary operator sampled = merged() if callable(merged) else merged instead of if-else-block
(SIM108)
⏰ Context from checks skipped due to timeout of 90000ms (29)
- GitHub Check: Build wheels for cp310-manylinux_aarch64
- GitHub Check: Build wheels for cp311-win_amd64
- GitHub Check: Build wheels for cp311-macosx_arm64
- GitHub Check: Build wheels for cp311-macosx_x86_64
- GitHub Check: Build wheels for cp311-manylinux_x86_64
- GitHub Check: Build wheels for cp311-manylinux_x86_64
- GitHub Check: Test Python (6, 3.12)
- GitHub Check: Test Python (6, 3.9)
- GitHub Check: Test Python (5, 3.12)
- GitHub Check: Test Python (5, 3.9)
- GitHub Check: Test Python (4, 3.12)
- GitHub Check: Test Python (4, 3.9)
- GitHub Check: Test Python (3, 3.12)
- GitHub Check: Test Python (3, 3.9)
- GitHub Check: Test Python (2, 3.12)
- GitHub Check: Test Python (2, 3.9)
- GitHub Check: Test Python (1, 3.12)
- GitHub Check: Test Python (1, 3.9)
- GitHub Check: Build C++ (clang, clang)
- GitHub Check: Build C++ (rocm, rocm)
- GitHub Check: Build C++ (cuda120, cuda)
- GitHub Check: Build C++ (cuda, cuda)
- GitHub Check: Build C library (2.14, >=2.5.0rc0,<2.15, libdeepmd_c_cu11.tar.gz)
- GitHub Check: Build C++ (cpu, cpu)
- GitHub Check: Test C++ (false)
- GitHub Check: Build C library (2.18, libdeepmd_c.tar.gz)
- GitHub Check: Test C++ (true)
- GitHub Check: Analyze (python)
- GitHub Check: Analyze (c-cpp)
🔇 Additional comments (15)
deepmd/dpmodel/descriptor/repflows.py (2)
25-27: Good addition of necessary imports for statistics functionality.The added imports for
EnvMatStatSeandStatItemsupport the new statistics computation functionality being implemented.Also applies to: 38-40
393-399: Added proper statistics retrieval with error handling.The
get_statsmethod correctly returns the stored statistics and raises an appropriate error when statistics haven't been computed yet.deepmd/dpmodel/descriptor/repformers.py (2)
25-27: Good addition of necessary imports for statistics functionality.The added imports for
EnvMatStatSeandStatItemsupport the new statistics computation functionality being implemented.Also applies to: 39-41
414-420: Added proper statistics retrieval with error handling.The
get_statsmethod correctly returns the stored statistics and raises an appropriate error when statistics haven't been computed yet.deepmd/dpmodel/descriptor/dpa1.py (3)
865-902: LGTM! Well-implemented statistics computation for descriptors.The implementation of
compute_input_statscorrectly leverages the newEnvMatStatSeutility to compute or load cached statistics for the descriptor. The pattern matches other descriptor implementations, ensuring consistency across the codebase.🧰 Tools
🪛 Ruff (0.8.2)
889-893: Use ternary operator
sampled = merged() if callable(merged) else mergedinstead ofif-else-block(SIM108)
904-910: LGTM! Properly implemented stats retrieval.The
get_statsmethod correctly checks if statistics have been computed before returning them, with an appropriate error message if not available.
416-437: LGTM! Effective delegation to internal descriptor block.The implementation correctly delegates to the internal
se_attenblock'scompute_input_statsmethod, maintaining a clean separation of concerns.deepmd/dpmodel/descriptor/se_t_tebd.py (3)
651-688: LGTM! Proper implementation of statistics computation.The implementation correctly uses
EnvMatStatSeto compute or load cached statistics for the descriptor. The statistics are properly stored and the mean/stddev arrays are correctly updated with appropriate type consistency.🧰 Tools
🪛 Ruff (0.8.2)
675-679: Use ternary operator
sampled = merged() if callable(merged) else mergedinstead ofif-else-block(SIM108)
690-696: LGTM! Properly implemented stats retrieval.The
get_statsmethod correctly checks if statistics have been computed before returning them, with an appropriate error message if not available.
254-275: LGTM! Good delegation pattern.The implementation correctly delegates to the internal
se_ttebdblock'scompute_input_statsmethod, maintaining a clean separation of concerns.source/tests/consistent/descriptor/test_se_e2_a.py (1)
312-608: LGTM! Well-designed test for statistics computation.This test class effectively exercises the new
compute_input_statsfunctionality across multiple backends (DP, PyTorch, JAX, Paddle, Array API Strict). The test:
- Sets up descriptors with appropriate parameters
- Correctly formats input data for each backend
- Calls
compute_input_statswith the formatted data- Verifies that descriptors work correctly after statistics computation
The static analysis warnings about unused variables can be safely ignored, as they're part of a consistent parameter unpacking pattern used throughout the test classes.
🧰 Tools
🪛 GitHub Check: CodeQL
[notice] 347-347: Unused local variable
Variable type_one_side is not used.
[notice] 348-348: Unused local variable
Variable excluded_types is not used.
[notice] 349-349: Unused local variable
Variable precision is not used.
[notice] 350-350: Unused local variable
Variable env_protection is not used.
[notice] 357-357: Unused local variable
Variable resnet_dt is not used.
[notice] 358-358: Unused local variable
Variable type_one_side is not used.
[notice] 359-359: Unused local variable
Variable excluded_types is not used.
[notice] 360-360: Unused local variable
Variable precision is not used.
[notice] 361-361: Unused local variable
Variable env_protection is not used.
[notice] 383-383: Unused local variable
Variable resnet_dt is not used.
[notice] 384-384: Unused local variable
Variable type_one_side is not used.
[notice] 385-385: Unused local variable
Variable excluded_types is not used.
[notice] 386-386: Unused local variable
Variable precision is not used.
[notice] 387-387: Unused local variable
Variable env_protection is not used.deepmd/dpmodel/utils/env_mat_stat.py (4)
40-62: LGTM! Well-implemented statistics computation base class.The
EnvMatStatbase class effectively computes statistics (count, sum, squared sum) from environment matrices, with proper array API compatibility for multi-backend support.
65-199: LGTM! Thorough implementation of environment matrix statistics.The
EnvMatStatSeclass provides a complete implementation for processing descriptor data:
- Correctly identifies radial-only vs. full descriptors
- Properly handles coordinates, atom types, and neighbor lists
- Applies appropriate exclusion masks
- Efficiently processes statistics per atom type
- Supports array API compatibility for multiple backends
The implementation integrates well with the descriptor interface, using standard methods like
get_rcut(),get_sel(), etc.🧰 Tools
🪛 Ruff (0.8.2)
122-122: Local variable
natomsis assigned to but never usedRemove assignment to unused variable
natoms(F841)
200-219: LGTM! Robust hashing for statistics caching.The
get_hashmethod creates a deterministic hash based on key descriptor properties, ensuring consistent cache handling across runs.
221-253: LGTM! Well-formatted statistics output.The
__call__method correctly transforms raw statistics into the format expected by descriptors, handling both radial and angular components appropriately.
Signed-off-by: Jinzhe Zeng <jinzhe.zeng@ustc.edu.cn>
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## devel #4729 +/- ##
==========================================
- Coverage 84.80% 84.69% -0.12%
==========================================
Files 696 697 +1
Lines 67269 67472 +203
Branches 3541 3540 -1
==========================================
+ Hits 57050 57145 +95
- Misses 9087 9195 +108
Partials 1132 1132 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
wanghan-iapcm
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel we should refactorize all the comput_input_stat, as almost all the classes implement the method in exactly the same way.
Shall we consider firstly refactorize the pt backend and then implement the python backend?
Summary by CodeRabbit
New Features
Bug Fixes
Chores