feat(array-api): env mat stat #4729

njzjz · 2025-05-08T18:36:19Z

Summary by CodeRabbit

New Features
- Added comprehensive input statistics computation and retrieval methods for multiple descriptor classes, supporting mean and standard deviation calculations with lazy data loading and file caching.
- Introduced utilities for environment matrix statistics to enhance descriptor analysis and caching efficiency.
- Added new test coverage for descriptor input statistics across multiple backends, incorporating preprocessing of input statistics before evaluation.
Bug Fixes
- Improved array namespace compatibility in utility functions for consistent array operations.
- Updated test logic to exclude specific keys from data comparisons, improving test reliability.
Chores
- Removed unused imports for cleaner code maintenance.

Signed-off-by: Jinzhe Zeng <jinzhe.zeng@ustc.edu.cn>

deepmd/dpmodel/utils/env_mat_stat.py

source/tests/consistent/descriptor/test_se_e2_a.py

+        return CommonTest.skip_pt
+
+    @property
+    def skip_dp(self) -> bool:


source/tests/consistent/descriptor/test_se_e2_a.py

+
+    @property
+    def skip_dp(self) -> bool:
+        (


source/tests/consistent/descriptor/test_se_e2_a.py

+    @property
+    def skip_dp(self) -> bool:
+        (
+            resnet_dt,


source/tests/consistent/descriptor/test_se_e2_a.py

+    def skip_dp(self) -> bool:
+        (
+            resnet_dt,
+            type_one_side,


source/tests/consistent/descriptor/test_se_e2_a.py

+        ) = self.param
+        return not INSTALLED_PD
+
+    @property


source/tests/consistent/descriptor/test_se_e2_a.py

+        return not INSTALLED_PD
+
+    @property
+    def skip_array_api_strict(self) -> bool:


source/tests/consistent/descriptor/test_se_e2_a.py

+
+    @property
+    def skip_array_api_strict(self) -> bool:
+        (


source/tests/consistent/descriptor/test_se_e2_a.py

+    @property
+    def skip_array_api_strict(self) -> bool:
+        (
+            resnet_dt,


source/tests/consistent/descriptor/test_se_e2_a.py

+    def skip_array_api_strict(self) -> bool:
+        (
+            resnet_dt,
+            type_one_side,


coderabbitai · 2025-05-08T18:39:27Z

📝 Walkthrough

Walkthrough

Previously unimplemented methods for computing and retrieving descriptor input statistics (compute_input_stats, get_stats) were implemented across several descriptor classes and blocks. A new utility module for environment matrix statistics was added. Related test coverage was expanded, and array API compatibility was improved in utility functions. Minor test logic was updated for serialization comparison.

Changes

File(s)	Change Summary
`deepmd/dpmodel/descriptor/dpa1.py`, `deepmd/dpmodel/descriptor/dpa2.py`, `deepmd/dpmodel/descriptor/dpa3.py`, `deepmd/dpmodel/descriptor/repflows.py`, `deepmd/dpmodel/descriptor/repformers.py`, `deepmd/dpmodel/descriptor/se_e2_a.py`, `deepmd/dpmodel/descriptor/se_r.py`, `deepmd/dpmodel/descriptor/se_t.py`, `deepmd/dpmodel/descriptor/se_t_tebd.py`	Implemented `compute_input_stats` and `get_stats` methods in descriptor and block classes, replacing stubs, and updating signatures to support lazy data sampling and file-based caching. Some classes also added `ndescrpt` attributes.
`deepmd/dpmodel/utils/env_mat_stat.py`	New module providing `EnvMatStat` and `EnvMatStatSe` classes for computing environment matrix statistics for descriptors.
`deepmd/dpmodel/utils/nlist.py`, `deepmd/dpmodel/utils/region.py`	Updated array operations to use array API compatibility (`xp`) for reshaping and type consistency.
`source/tests/consistent/common.py`	Adjusted test comparison logic to exclude the `@variables` key from equality checks.
`source/tests/consistent/descriptor/test_se_e2_a.py`	Added `TestSeAStat` class to test statistical input computation for descriptors across multiple backends and precisions.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Descriptor
    participant EnvMatStatSe

    User->>Descriptor: compute_input_stats(merged, path)
    alt path is directory with cached stats
        Descriptor->>EnvMatStatSe: load stats from path
    else
        alt merged is callable
            Descriptor->>User: merged()
            User-->>Descriptor: list[dict]
        end
        Descriptor->>EnvMatStatSe: compute stats from data
    end
    EnvMatStatSe-->>Descriptor: stats (mean, std)
    Descriptor->>Descriptor: set davg, dstd, stats
    User->>Descriptor: get_stats()
    Descriptor-->>User: stats dict

Possibly related PRs

feat(dp/pt): refactor se_e3 descriptor #3813: Implements and refactors compute_input_stats and get_stats in DescrptSeT and related classes, directly overlapping with the methods and files changed here.

Suggested labels

Docs

Suggested reviewers

wanghan-iapcm

Tip

⚡️ Faster reviews with caching

CodeRabbit now supports caching for code and dependencies, helping speed up reviews. This means quicker feedback, reduced wait times, and a smoother review experience overall. Cached data is encrypted and stored securely. This feature will be automatically enabled for all accounts on May 16th. To opt out, configure Review - Disable Cache at either the organization or repository level. If you prefer to disable all data retention across your organization, simply turn off the Data Retention setting under your Organization Settings.

Enjoy the performance boost—your workflow just got faster.

📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 959324a and abe5a39.

📒 Files selected for processing (1)

source/tests/consistent/common.py (3 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)

source/tests/consistent/common.py

⏰ Context from checks skipped due to timeout of 90000ms (29)

GitHub Check: Test Python (6, 3.12)
GitHub Check: Test Python (6, 3.9)
GitHub Check: Test Python (5, 3.12)
GitHub Check: Test Python (5, 3.9)
GitHub Check: Test Python (4, 3.12)
GitHub Check: Test Python (4, 3.9)
GitHub Check: Build wheels for cp310-manylinux_aarch64
GitHub Check: Test Python (3, 3.12)
GitHub Check: Build wheels for cp311-win_amd64
GitHub Check: Test Python (3, 3.9)
GitHub Check: Build wheels for cp311-macosx_arm64
GitHub Check: Build C++ (clang, clang)
GitHub Check: Build C++ (rocm, rocm)
GitHub Check: Build wheels for cp311-macosx_x86_64
GitHub Check: Test Python (2, 3.12)
GitHub Check: Build C++ (cuda120, cuda)
GitHub Check: Build wheels for cp311-manylinux_x86_64
GitHub Check: Test Python (2, 3.9)
GitHub Check: Analyze (python)
GitHub Check: Build C++ (cuda, cuda)
GitHub Check: Build wheels for cp311-manylinux_x86_64
GitHub Check: Test C++ (false)
GitHub Check: Test Python (1, 3.12)
GitHub Check: Build C++ (cpu, cpu)
GitHub Check: Build C library (2.14, >=2.5.0rc0,<2.15, libdeepmd_c_cu11.tar.gz)
GitHub Check: Test C++ (true)
GitHub Check: Test Python (1, 3.9)
GitHub Check: Analyze (c-cpp)
GitHub Check: Build C library (2.18, libdeepmd_c.tar.gz)

✨ Finishing Touches

📝 Generate Docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

Review comments: Directly reply to a review comment made by CodeRabbit. Example:
- I pushed a fix in commit <commit_id>, please review it.
- Generate unit testing code for this file.
- Open a follow-up GitHub issue for this discussion.
Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
- @coderabbitai generate unit testing code for this file.
- @coderabbitai modularize this function.
PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
- @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
- @coderabbitai read src/utils.ts and generate unit testing code.
- @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
- @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

@coderabbitai pause to pause the reviews on a PR.
@coderabbitai resume to resume the paused reviews.
@coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
@coderabbitai full review to do a full review from scratch and review all the files again.
@coderabbitai summary to regenerate the summary of the PR.
@coderabbitai generate docstrings to generate docstrings for this PR.
@coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
@coderabbitai resolve resolve all the CodeRabbit review comments.
@coderabbitai configuration to show the current CodeRabbit configuration for the repository.
@coderabbitai help to get help.

Other keywords and placeholders

Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (`.coderabbit.yaml`)

You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
Please see the configuration documentation for more information.
If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

Visit our Documentation for detailed information on how to use CodeRabbit.
Join our Discord Community to get help, request features, and share feedback.
Follow us on X/Twitter for updates and announcements.

coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (10)

deepmd/dpmodel/descriptor/dpa3.py (1)
446-450: Implemented compute_input_stats method for descriptor statistics.

The implementation now properly delegates to the internal repflows descriptor component to compute input statistics, allowing normalization of descriptors. This matches the pattern used in other descriptor classes like DescrptDPA1 and DescrptDPA2.

Consider renaming the unused loop variable ii to _ or _ii to indicate it's not being used within the loop body:
-        for ii, descrpt in enumerate(descrpt_list):
+        for _, descrpt in enumerate(descrpt_list):
🧰 Tools

🪛 Ruff (0.8.2)

449-449: Loop control variable ii not used within loop body

Rename unused ii to _ii

(B007)
deepmd/dpmodel/descriptor/dpa2.py (1)
740-765: Implemented compute_input_stats method to calculate input statistics.

The implementation correctly handles both direct data and lazy loading via callable, with proper path handling for caching. The method efficiently delegates statistics computation to each of the descriptor components.

However, there's one minor improvement to consider:
- for ii, descrpt in enumerate(descrpt_list):
+ for _ii, descrpt in enumerate(descrpt_list):
or
- for ii, descrpt in enumerate(descrpt_list):
+ for _, descrpt in enumerate(descrpt_list):
Since the index variable is not used in the loop body, using an underscore prefix or just underscore is more idiomatic.

🧰 Tools

🪛 Ruff (0.8.2)

764-764: Loop control variable ii not used within loop body

Rename unused ii to _ii

(B007)
deepmd/dpmodel/descriptor/repflows.py (1)
353-392: Implemented compute_input_stats method for descriptor statistics.

The implementation properly computes or loads statistical data for the descriptor inputs. The code correctly handles the path for caching, lazy loading of data, and properly updates the descriptor's mean and standard deviation values.

A minor style improvement could be made:
- if callable(merged):
-     # only get data for once
-     sampled = merged()
- else:
-     sampled = merged
+ sampled = merged() if callable(merged) else merged
This makes the code more concise while maintaining readability.

🧰 Tools

🪛 Ruff (0.8.2)

378-382: Use ternary operator sampled = merged() if callable(merged) else merged instead of if-else-block

(SIM108)
deepmd/dpmodel/descriptor/se_r.py (1)
270-308: Implemented compute_input_stats method for radial descriptor statistics.

The implementation properly handles statistics computation for the SE_R descriptor, with correct path handling for caching, support for lazy data loading, and appropriate updating of mean and standard deviation values.

A minor style improvement could be made:
- if callable(merged):
-     # only get data for once
-     sampled = merged()
- else:
-     sampled = merged
+ sampled = merged() if callable(merged) else merged
This makes the code more concise while maintaining readability.

🧰 Tools

🪛 Ruff (0.8.2)

295-299: Use ternary operator sampled = merged() if callable(merged) else merged instead of if-else-block

(SIM108)
deepmd/dpmodel/descriptor/se_t.py (1)
250-288: Implemented compute_input_stats method for angular descriptor statistics.

The implementation properly handles statistics computation for the SE_T descriptor, with correct path handling for caching, support for lazy data loading, and appropriate updating of mean and standard deviation values. The implementation is consistent with other descriptor classes, maintaining good code uniformity.

A minor style improvement could be made:
- if callable(merged):
-     # only get data for once
-     sampled = merged()
- else:
-     sampled = merged
+ sampled = merged() if callable(merged) else merged
This makes the code more concise while maintaining readability.

🧰 Tools

🪛 Ruff (0.8.2)

275-279: Use ternary operator sampled = merged() if callable(merged) else merged instead of if-else-block

(SIM108)
deepmd/dpmodel/descriptor/repformers.py (1)
374-413: Well-implemented statistics computation with proper caching support.

The compute_input_stats method follows a good pattern, handling both lazy data sampling and statistics caching. It correctly manages path-based caching and updating descriptor statistics with appropriate type conversion.

Consider using a ternary operator for better readability:
-            if callable(merged):
-                # only get data for once
-                sampled = merged()
-            else:
-                sampled = merged
+            sampled = merged() if callable(merged) else merged
🧰 Tools

🪛 Ruff (0.8.2)

399-403: Use ternary operator sampled = merged() if callable(merged) else merged instead of if-else-block

(SIM108)
deepmd/dpmodel/descriptor/se_e2_a.py (1)
311-349: Well-implemented statistics computation.

The compute_input_stats method is well-structured, supporting both lazy data loading and file-based caching via the EnvMatStatSe utility.

Consider using a ternary operator for better readability:
-            if callable(merged):
-                # only get data for once
-                sampled = merged()
-            else:
-                sampled = merged
+            sampled = merged() if callable(merged) else merged
🧰 Tools

🪛 Ruff (0.8.2)

336-340: Use ternary operator sampled = merged() if callable(merged) else merged instead of if-else-block

(SIM108)
deepmd/dpmodel/descriptor/dpa1.py (1)
864-903: Well-implemented statistics computation in DescrptBlockSeAtten.

The statistics computation is implemented with proper support for lazy data loading and caching, consistent with other descriptor implementations.

Consider using a ternary operator for better readability:
-            if callable(merged):
-                # only get data for once
-                sampled = merged()
-            else:
-                sampled = merged
+            sampled = merged() if callable(merged) else merged
🧰 Tools

🪛 Ruff (0.8.2)

889-893: Use ternary operator sampled = merged() if callable(merged) else merged instead of if-else-block

(SIM108)
deepmd/dpmodel/utils/env_mat_stat.py (1)
122-129: Remove unused local natoms to avoid dead‐code noise

natoms is extracted from system but never referenced afterwards. Keeping unused bindings clutters the scope and may mislead future readers.
-            coord, atype, box, natoms = (
-                system["coord"],
-                system["atype"],
-                system["box"],
-                system["natoms"],
-            )
+            coord, atype, box = (
+                system["coord"],
+                system["atype"],
+                system["box"],
+            )
🧰 Tools

🪛 Ruff (0.8.2)

122-122: Local variable natoms is assigned to but never used

Remove assignment to unused variable natoms

(F841)
deepmd/dpmodel/descriptor/se_t_tebd.py (1)
675-679: Tiny style-only simplification opportunity

The if callable(merged) block can be collapsed using a ternary expression, as the Ruff hint suggests, improving brevity without hurting clarity:
-            if callable(merged):
-                # only get data for once
-                sampled = merged()
-            else:
-                sampled = merged
+            sampled = merged() if callable(merged) else merged
🧰 Tools

🪛 Ruff (0.8.2)

675-679: Use ternary operator sampled = merged() if callable(merged) else merged instead of if-else-block

(SIM108)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 17da9d2 and f02db7f.

📒 Files selected for processing (14)

deepmd/dpmodel/descriptor/dpa1.py (4 hunks)
deepmd/dpmodel/descriptor/dpa2.py (2 hunks)
deepmd/dpmodel/descriptor/dpa3.py (1 hunks)
deepmd/dpmodel/descriptor/repflows.py (3 hunks)
deepmd/dpmodel/descriptor/repformers.py (3 hunks)
deepmd/dpmodel/descriptor/se_e2_a.py (4 hunks)
deepmd/dpmodel/descriptor/se_r.py (4 hunks)
deepmd/dpmodel/descriptor/se_t.py (4 hunks)
deepmd/dpmodel/descriptor/se_t_tebd.py (4 hunks)
deepmd/dpmodel/utils/env_mat_stat.py (1 hunks)
deepmd/dpmodel/utils/nlist.py (2 hunks)
deepmd/dpmodel/utils/region.py (1 hunks)
source/tests/consistent/common.py (2 hunks)
source/tests/consistent/descriptor/test_se_e2_a.py (3 hunks)

🧰 Additional context used

🧬 Code Graph Analysis (6)

deepmd/dpmodel/utils/nlist.py (1)

deepmd/dpmodel/utils/region.py (1)

normalize_coord (53-75)

deepmd/dpmodel/descriptor/dpa3.py (4)

deepmd/dpmodel/descriptor/dpa2.py (1)

compute_input_stats (740-765)

deepmd/dpmodel/descriptor/repflows.py (1)

compute_input_stats (353-391)

deepmd/pt/model/descriptor/descriptor.py (1)

compute_input_stats (102-123)

deepmd/utils/path.py (1)

DPPath (27-157)

deepmd/dpmodel/descriptor/dpa2.py (3)

deepmd/utils/path.py (1)

DPPath (27-157)

deepmd/pt/model/descriptor/descriptor.py (1)

compute_input_stats (102-123)

deepmd/pt/model/descriptor/dpa2.py (1)

compute_input_stats (482-507)

deepmd/dpmodel/descriptor/dpa1.py (8)

deepmd/dpmodel/utils/env_mat_stat.py (2)

EnvMatStatSe (65-253)

get_hash (200-219)

deepmd/utils/env_mat_stat.py (2)

StatItem (26-91)

load_or_compute_stats (162-181)

deepmd/dpmodel/descriptor/se_r.py (1)

compute_input_stats (270-308)

deepmd/dpmodel/descriptor/se_t.py (1)

compute_input_stats (250-288)

deepmd/dpmodel/descriptor/repformers.py (2)

compute_input_stats (374-412)

get_stats (414-420)

deepmd/pt/model/descriptor/dpa1.py (1)

compute_input_stats (412-433)

deepmd/pt/model/descriptor/se_atten.py (2)

compute_input_stats (361-402)

get_stats (404-410)

deepmd/dpmodel/descriptor/repflows.py (1)

get_stats (393-399)

deepmd/dpmodel/descriptor/se_r.py (2)

deepmd/dpmodel/utils/env_mat_stat.py (2)

EnvMatStatSe (65-253)

get_hash (200-219)

deepmd/utils/env_mat_stat.py (1)

load_or_compute_stats (162-181)

source/tests/consistent/descriptor/test_se_e2_a.py (2)

source/tests/consistent/descriptor/common.py (7)

DescriptorTest (57-198)

build_tf_descriptor (60-83)

eval_dp_descriptor (85-102)

eval_pt_descriptor (104-124)

eval_jax_descriptor (126-146)

eval_pd_descriptor (148-168)

eval_array_api_strict_descriptor (170-198)

deepmd/utils/argcheck.py (1)

descrpt_se_a_args (271-335)

🪛 Ruff (0.8.2)

deepmd/dpmodel/descriptor/repflows.py

378-382: Use ternary operator sampled = merged() if callable(merged) else merged instead of if-else-block

(SIM108)

deepmd/dpmodel/descriptor/dpa3.py

449-449: Loop control variable ii not used within loop body

Rename unused ii to _ii

(B007)

deepmd/dpmodel/descriptor/repformers.py

399-403: Use ternary operator sampled = merged() if callable(merged) else merged instead of if-else-block

(SIM108)

deepmd/dpmodel/descriptor/se_t.py

275-279: Use ternary operator sampled = merged() if callable(merged) else merged instead of if-else-block

(SIM108)

deepmd/dpmodel/descriptor/dpa2.py

764-764: Loop control variable ii not used within loop body

Rename unused ii to _ii

(B007)

deepmd/dpmodel/descriptor/se_e2_a.py

336-340: Use ternary operator sampled = merged() if callable(merged) else merged instead of if-else-block

(SIM108)

deepmd/dpmodel/descriptor/dpa1.py

889-893: Use ternary operator sampled = merged() if callable(merged) else merged instead of if-else-block

(SIM108)

deepmd/dpmodel/descriptor/se_r.py

295-299: Use ternary operator sampled = merged() if callable(merged) else merged instead of if-else-block

(SIM108)

deepmd/dpmodel/utils/env_mat_stat.py

122-122: Local variable natoms is assigned to but never used

Remove assignment to unused variable natoms

(F841)

deepmd/dpmodel/descriptor/se_t_tebd.py

675-679: Use ternary operator sampled = merged() if callable(merged) else merged instead of if-else-block

(SIM108)

⏰ Context from checks skipped due to timeout of 90000ms (29)

GitHub Check: Test Python (6, 3.12)
GitHub Check: Test Python (6, 3.9)
GitHub Check: Test Python (5, 3.12)
GitHub Check: Test Python (5, 3.9)
GitHub Check: Test Python (4, 3.12)
GitHub Check: Test Python (4, 3.9)
GitHub Check: Build wheels for cp310-manylinux_aarch64
GitHub Check: Test Python (3, 3.12)
GitHub Check: Build wheels for cp311-win_amd64
GitHub Check: Test Python (3, 3.9)
GitHub Check: Build wheels for cp311-macosx_arm64
GitHub Check: Build C++ (clang, clang)
GitHub Check: Test Python (2, 3.12)
GitHub Check: Build wheels for cp311-macosx_x86_64
GitHub Check: Build C++ (rocm, rocm)
GitHub Check: Test Python (2, 3.9)
GitHub Check: Build wheels for cp311-manylinux_x86_64
GitHub Check: Build C++ (cuda120, cuda)
GitHub Check: Analyze (python)
GitHub Check: Build C library (2.14, >=2.5.0rc0,<2.15, libdeepmd_c_cu11.tar.gz)
GitHub Check: Test Python (1, 3.12)
GitHub Check: Test C++ (false)
GitHub Check: Build wheels for cp311-manylinux_x86_64
GitHub Check: Build C++ (cuda, cuda)
GitHub Check: Build C library (2.18, libdeepmd_c.tar.gz)
GitHub Check: Test Python (1, 3.9)
GitHub Check: Build C++ (cpu, cpu)
GitHub Check: Test C++ (true)
GitHub Check: Analyze (c-cpp)

🔇 Additional comments (21)

deepmd/dpmodel/utils/region.py (1)

74-74: Improved array API compatibility for remainder operation.

The change from 1.0 to xp.asarray(1.0) ensures proper type consistency when using array API compatible libraries. This is important because when operations are performed between arrays and scalar literals, some array API implementations might throw errors if the types aren't from the same namespace.

deepmd/dpmodel/utils/nlist.py (3)

28-28: Good addition for array API compatibility.

Adding the array namespace extraction at the beginning of the function ensures consistent array operations throughout the function body, allowing the code to work with different array backends.

32-34: Properly using array namespace for reshaping operations.

Converting the NumPy-specific reshaping to use the array API compatible xp.reshape ensures consistency with the chosen array backend. This change is crucial for supporting multiple backends like JAX, PyTorch, or NumPy.

48-48: Consistent use of array namespace for reshaping.

Using xp.reshape instead of direct reshaping maintains backend consistency throughout the function. This is a necessary change to ensure the function works correctly across different array libraries.

source/tests/consistent/common.py (2)

438-440: Smart test enhancement to ignore variable statistics.

Removing @variables from the serialized data comparison is a good approach as these statistics (like mean and standard deviation) might legitimately differ between backends while still producing functionally equivalent models. This change prevents false test failures when only the statistics differ.

475-477: Consistent handling of variables in JAX backend tests.

Similar to the PyTorch test modifications, excluding @variables from the serialized data comparison in JAX tests ensures consistent test behavior across different backend implementations.

deepmd/dpmodel/descriptor/dpa2.py (1)

3-3: Added Callable to imports for compute_input_stats implementation.

Good addition of the required typing import for the newly implemented compute_input_stats method.

deepmd/dpmodel/descriptor/repflows.py (3)

25-31: Added EnvMatStatSe import for statistics computation.

Good addition of the required import for environment matrix statistics calculation.

38-40: Added StatItem import for statistics data typing.

Appropriate import for handling typed statistical data.

393-399: Added get_stats method for retrieving computed statistics.

Good implementation of the get_stats method with proper error handling when statistics haven't been computed yet.

deepmd/dpmodel/descriptor/se_r.py (3)

3-4: Added Callable to imports for compute_input_stats implementation.

Good addition of the required typing import for the new method.

29-31: Added EnvMatStatSe import for statistics computation.

Appropriate import for the environment matrix statistics functionality.

181-181: Added ndescrpt attribute to track descriptor dimension.

Good addition of the ndescrpt attribute which properly matches the descriptor dimension for se_r.

deepmd/dpmodel/descriptor/se_t.py (3)

4-4: Added Callable to imports for compute_input_stats implementation.

Good addition of the required typing import for the new method.

29-31: Added EnvMatStatSe import for statistics computation.

Appropriate import for the environment matrix statistics functionality.

161-161: Added ndescrpt attribute with correct dimension calculation.

Good addition of the ndescrpt attribute which properly sets the descriptor dimension to nnei * 4 for the se_t descriptor, accounting for the 4 components per neighbor.

deepmd/dpmodel/descriptor/repformers.py (1)

414-420: Good error handling for stats retrieval.

The get_stats method appropriately checks if statistics have been computed and raises a clear error message if not.

deepmd/dpmodel/descriptor/se_e2_a.py (1)

222-222: Good addition of descriptor dimension.

The ndescrpt property is properly initialized as the product of neighbors and dimensions per neighbor, which is needed for statistics calculation.

deepmd/dpmodel/descriptor/dpa1.py (2)

416-437: Good delegation to underlying components.

The compute_input_stats method in DescrptDPA1 properly delegates to the internal se_atten component, maintaining clear separation of concerns.

904-910: Good error handling for stats retrieval.

The get_stats method properly checks if statistics have been computed and raises an appropriate error when needed.

source/tests/consistent/descriptor/test_se_e2_a.py (1)

466-478: Redundant boilerplate in test setup – consider a helper to DRY

Every backend repeats an identical compute_input_stats dictionary literal differing only in array/tensor constructors. Extracting a small helper (e.g. _build_sample_dict(backend)) would slash ~40 lines and make future edits safer.

[ suggest_optional_refactor ]

deepmd/dpmodel/utils/env_mat_stat.py

deepmd/dpmodel/descriptor/se_t_tebd.py

Signed-off-by: Jinzhe Zeng <jinzhe.zeng@ustc.edu.cn>

coderabbitai

Actionable comments posted: 1

🔭 Outside diff range comments (1)

deepmd/dpmodel/descriptor/dpa1.py (1)
779-783: ⚠️ Potential issue

Add ndescrpt & initialise stats to avoid runtime AttributeError.

EnvMatStatSe expects the descriptor to expose a ndescrpt attribute, and get_stats() later assumes self.stats already exists.
Both are currently missing, which will raise an AttributeError the first time statistics are computed or read.
@@
 self.stddev = np.ones(wanted_shape, dtype=PRECISION_DICT[self.precision])
+        # --- extra metadata needed by EnvMatStatSe ------------------------
+        # total descriptor length (per–atom output size)
+        self.ndescrpt = self.filter_neuron[-1] * self.axis_neuron
+        # placeholder – will be filled by `compute_input_stats`
+        self.stats: dict[str, StatItem] | None = None
         self.orig_sel = self.sel
Without this patch compute_input_stats() will fail on
self.descriptor.ndescrpt inside EnvMatStatSe.__init__.

🧹 Nitpick comments (6)

deepmd/dpmodel/descriptor/repflows.py (1)
353-392: Well-implemented statistics computation method

The compute_input_stats method is well implemented with support for both direct data input and lazy loading via callable, along with optional statistics caching through the path parameter. This provides an efficient way to compute descriptor input statistics.

Consider using a ternary operator for conciseness:
-            if callable(merged):
-                # only get data for once
-                sampled = merged()
-            else:
-                sampled = merged
+            # only get data for once
+            sampled = merged() if callable(merged) else merged
🧰 Tools

🪛 Ruff (0.8.2)

378-382: Use ternary operator sampled = merged() if callable(merged) else merged instead of if-else-block

(SIM108)
deepmd/dpmodel/descriptor/repformers.py (1)
374-413: Consistent implementation of the statistics computation method

The compute_input_stats method is implemented in a way that's consistent with other descriptor files, with good support for lazy data loading and optional caching. This consistency across descriptor modules is valuable for maintainability.

Consider using a ternary operator for conciseness:
-            if callable(merged):
-                # only get data for once
-                sampled = merged()
-            else:
-                sampled = merged
+            # only get data for once
+            sampled = merged() if callable(merged) else merged
🧰 Tools

🪛 Ruff (0.8.2)

399-403: Use ternary operator sampled = merged() if callable(merged) else merged instead of if-else-block

(SIM108)
deepmd/dpmodel/descriptor/se_t_tebd.py (1)
650-689: Consistent implementation of statistics computation

The compute_input_stats method in DescrptBlockSeTTebd follows the same pattern as in other descriptor modules, providing a consistent interface across the codebase. The implementation correctly handles the mean and stddev updates.

Consider using a ternary operator for conciseness:
-            if callable(merged):
-                # only get data for once
-                sampled = merged()
-            else:
-                sampled = merged
+            # only get data for once
+            sampled = merged() if callable(merged) else merged
🧰 Tools

🪛 Ruff (0.8.2)

675-679: Use ternary operator sampled = merged() if callable(merged) else merged instead of if-else-block

(SIM108)
deepmd/dpmodel/descriptor/dpa1.py (1)
889-893: Minor: collapse trivial if-else with a ternary operator.

Ruff flags this as SIM108; it’s purely stylistic but does shorten the
function.
-            if callable(merged):
-                # only get data for once
-                sampled = merged()
-            else:
-                sampled = merged
+            sampled = merged() if callable(merged) else merged
🧰 Tools

🪛 Ruff (0.8.2)

889-893: Use ternary operator sampled = merged() if callable(merged) else merged instead of if-else-block

(SIM108)
deepmd/dpmodel/utils/env_mat_stat.py (1)
122-127: Remove unused local variable natoms.

natoms is assigned but never referenced, triggering ruff (F841) and
wasting a tiny bit of memory.
-            coord, atype, box, natoms = (
+            coord, atype, box, _ = (
🧰 Tools

🪛 Ruff (0.8.2)

122-122: Local variable natoms is assigned to but never used

Remove assignment to unused variable natoms

(F841)
source/tests/consistent/descriptor/test_se_e2_a.py (1)
466-541: Reduce test duplication with a small helper.

The four eval_* methods differ only by the backend specific
array/tensor conversion. Duplicating the 15-line
compute_input_stats block for every backend hurts readability and
makes future changes error-prone.

Consider extracting a tiny helper:
def _prep_sample(self, xp):
    return [{
        "r0": None,
        "coord": xp.asarray(self.coords).reshape(-1, self.natoms[0], 3),
        "atype": xp.asarray(self.atype).reshape(1, -1),
        "box":   xp.asarray(self.box).reshape(1, 3, 3),
        "natoms": self.natoms[0],
    }]
Each eval_* then becomes:
sample = self._prep_sample(torch if INSTALLED_PT else np)
pt_obj.compute_input_stats(sample)
return self.eval_pt_descriptor(...)
This keeps the intent clear and avoids four blocks of copy-pasted
code.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f02db7f and f9d18d6.

📒 Files selected for processing (6)

deepmd/dpmodel/descriptor/dpa1.py (4 hunks)
deepmd/dpmodel/descriptor/repflows.py (3 hunks)
deepmd/dpmodel/descriptor/repformers.py (3 hunks)
deepmd/dpmodel/descriptor/se_t_tebd.py (4 hunks)
deepmd/dpmodel/utils/env_mat_stat.py (1 hunks)
source/tests/consistent/descriptor/test_se_e2_a.py (3 hunks)

🧰 Additional context used

🧬 Code Graph Analysis (1)

deepmd/dpmodel/descriptor/repformers.py (2)

deepmd/dpmodel/utils/env_mat_stat.py (2)

EnvMatStatSe (65-253)

get_hash (200-219)

deepmd/utils/env_mat_stat.py (2)

StatItem (26-91)

load_or_compute_stats (162-181)

🪛 Ruff (0.8.2)

deepmd/dpmodel/descriptor/dpa1.py

889-893: Use ternary operator sampled = merged() if callable(merged) else merged instead of if-else-block

(SIM108)

deepmd/dpmodel/descriptor/repflows.py

378-382: Use ternary operator sampled = merged() if callable(merged) else merged instead of if-else-block

(SIM108)

deepmd/dpmodel/descriptor/repformers.py

399-403: Use ternary operator sampled = merged() if callable(merged) else merged instead of if-else-block

(SIM108)

deepmd/dpmodel/descriptor/se_t_tebd.py

675-679: Use ternary operator sampled = merged() if callable(merged) else merged instead of if-else-block

(SIM108)

deepmd/dpmodel/utils/env_mat_stat.py

122-122: Local variable natoms is assigned to but never used

Remove assignment to unused variable natoms

(F841)

⏰ Context from checks skipped due to timeout of 90000ms (29)

GitHub Check: Test Python (6, 3.12)
GitHub Check: Test Python (6, 3.9)
GitHub Check: Test Python (5, 3.12)
GitHub Check: Test Python (5, 3.9)
GitHub Check: Test Python (4, 3.12)
GitHub Check: Build wheels for cp310-manylinux_aarch64
GitHub Check: Test Python (4, 3.9)
GitHub Check: Test Python (3, 3.12)
GitHub Check: Build wheels for cp311-win_amd64
GitHub Check: Build C++ (clang, clang)
GitHub Check: Test Python (3, 3.9)
GitHub Check: Build wheels for cp311-macosx_arm64
GitHub Check: Test Python (2, 3.12)
GitHub Check: Build C++ (rocm, rocm)
GitHub Check: Build wheels for cp311-macosx_x86_64
GitHub Check: Build C++ (cuda120, cuda)
GitHub Check: Test Python (2, 3.9)
GitHub Check: Build wheels for cp311-manylinux_x86_64
GitHub Check: Test Python (1, 3.12)
GitHub Check: Build C++ (cuda, cuda)
GitHub Check: Build wheels for cp311-manylinux_x86_64
GitHub Check: Analyze (python)
GitHub Check: Test C++ (false)
GitHub Check: Build C library (2.14, >=2.5.0rc0,<2.15, libdeepmd_c_cu11.tar.gz)
GitHub Check: Build C++ (cpu, cpu)
GitHub Check: Test Python (1, 3.9)
GitHub Check: Test C++ (true)
GitHub Check: Analyze (c-cpp)
GitHub Check: Build C library (2.18, libdeepmd_c.tar.gz)

🔇 Additional comments (4)

deepmd/dpmodel/descriptor/repflows.py (1)

393-399: Good implementation of get_stats with appropriate error handling

The get_stats method appropriately checks if statistics have been computed before returning them, raising a helpful runtime error otherwise.

deepmd/dpmodel/descriptor/repformers.py (1)

414-420: Well-implemented get_stats method with appropriate error checking

The get_stats method correctly checks if statistics have been computed and provides a clear error message if they haven't.

deepmd/dpmodel/descriptor/se_t_tebd.py (2)

254-275: Clean delegation of statistics computation to the internal descriptor block

The compute_input_stats method in DescrptSeTTebd is well-designed, correctly delegating to the internal se_ttebd block. This simplifies the implementation while maintaining the full functionality.

690-696: Well-implemented get_stats method with proper error handling

The get_stats method correctly handles the case when statistics haven't been computed yet by raising a clear error message.

deepmd/dpmodel/descriptor/dpa1.py

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (6)

deepmd/dpmodel/descriptor/repflows.py (1)
353-392: Well-implemented statistics computation method with proper documentation.

The compute_input_stats method provides an efficient implementation for computing descriptor input statistics with support for:

Lazy data loading through callables

File-based caching

Proper handling of the set_davg_zero flag

A minor style improvement could be made by using a ternary operator:
- if callable(merged):
-     # only get data for once
-     sampled = merged()
- else:
-     sampled = merged
+ # only get data for once
+ sampled = merged() if callable(merged) else merged
🧰 Tools

🪛 Ruff (0.8.2)

378-382: Use ternary operator sampled = merged() if callable(merged) else merged instead of if-else-block

(SIM108)
deepmd/dpmodel/descriptor/repformers.py (1)
374-413: Well-implemented statistics computation method with proper documentation.

The compute_input_stats method provides an efficient implementation for computing descriptor input statistics with support for:

Lazy data loading through callables

File-based caching

Proper handling of the set_davg_zero flag

A minor style improvement could be made by using a ternary operator:
- if callable(merged):
-     # only get data for once
-     sampled = merged()
- else:
-     sampled = merged
+ # only get data for once
+ sampled = merged() if callable(merged) else merged
🧰 Tools

🪛 Ruff (0.8.2)

399-403: Use ternary operator sampled = merged() if callable(merged) else merged instead of if-else-block

(SIM108)
deepmd/dpmodel/descriptor/dpa1.py (1)
889-893: Consider using a ternary operator for more concise code.

The conditional assignment can be simplified using a ternary operator:
-        if path is None or not path.is_dir():
-            if callable(merged):
-                # only get data for once
-                sampled = merged()
-            else:
-                sampled = merged
+        if path is None or not path.is_dir():
+            # only get data for once
+            sampled = merged() if callable(merged) else merged
🧰 Tools

🪛 Ruff (0.8.2)

889-893: Use ternary operator sampled = merged() if callable(merged) else merged instead of if-else-block

(SIM108)
deepmd/dpmodel/descriptor/se_t_tebd.py (1)
675-679: Consider using a ternary operator for more concise code.

The conditional assignment can be simplified using a ternary operator:
-        if path is None or not path.is_dir():
-            if callable(merged):
-                # only get data for once
-                sampled = merged()
-            else:
-                sampled = merged
+        if path is None or not path.is_dir():
+            # only get data for once
+            sampled = merged() if callable(merged) else merged
🧰 Tools

🪛 Ruff (0.8.2)

675-679: Use ternary operator sampled = merged() if callable(merged) else merged instead of if-else-block

(SIM108)
deepmd/dpmodel/utils/env_mat_stat.py (2)
122-127: Remove unused natoms variable.

The variable natoms is extracted from the system data but never used in the function. Consider removing it or document why it's extracted.
-            coord, atype, box, natoms = (
+            coord, atype, box, _ = (
                 system["coord"],
                 system["atype"],
                 system["box"],
                 system["natoms"],
             )
🧰 Tools

🪛 Ruff (0.8.2)

122-122: Local variable natoms is assigned to but never used

Remove assignment to unused variable natoms

(F841)

118-120: Fix typo in error message.

There's a typo in the error message: "raial-only" should be "radial-only".
-                "last_dim should be 1 for raial-only or 4 for full descriptor."
+                "last_dim should be 1 for radial-only or 4 for full descriptor."

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f02db7f and 959324a.

📒 Files selected for processing (6)

deepmd/dpmodel/descriptor/dpa1.py (4 hunks)
deepmd/dpmodel/descriptor/repflows.py (3 hunks)
deepmd/dpmodel/descriptor/repformers.py (3 hunks)
deepmd/dpmodel/descriptor/se_t_tebd.py (4 hunks)
deepmd/dpmodel/utils/env_mat_stat.py (1 hunks)
source/tests/consistent/descriptor/test_se_e2_a.py (3 hunks)

🧰 Additional context used

🧬 Code Graph Analysis (3)

deepmd/dpmodel/utils/env_mat_stat.py (11)

deepmd/dpmodel/common.py (1)

get_xp_precision (59-83)

deepmd/dpmodel/utils/env_mat.py (1)

EnvMat (78-161)

deepmd/dpmodel/utils/nlist.py (1)

extend_input_and_build_neighbor_list (20-49)

deepmd/utils/env_mat_stat.py (1)

StatItem (26-91)

deepmd/tf/descriptor/descriptor.py (1)

Descriptor (33-543)

deepmd/dpmodel/descriptor/repflows.py (9)

get_ntypes (290-292)

get_nsel (282-284)

get_rcut (274-276)

get_sel (286-288)

mixed_types (322-332)

get_rcut_smth (278-280)

get_env_protection (334-336)

call (408-518)

call (923-1225)

deepmd/dpmodel/descriptor/dpa1.py (19)

get_ntypes (358-360)

get_ntypes (801-803)

get_nsel (350-352)

get_nsel (793-795)

get_rcut (342-344)

get_rcut (785-787)

get_sel (354-356)

get_sel (797-799)

mixed_types (376-386)

mixed_types (833-843)

get_rcut_smth (346-348)

get_rcut_smth (789-791)

get_env_protection (396-398)

get_env_protection (845-847)

call (480-541)

call (942-1086)

call (1222-1232)

call (1348-1359)

call (1458-1517)

deepmd/dpmodel/descriptor/se_r.py (7)

get_ntypes (262-264)

get_rcut (212-214)

get_sel (220-222)

mixed_types (224-228)

get_rcut_smth (216-218)

get_env_protection (238-240)

call (336-399)

deepmd/dpmodel/descriptor/dpa3.py (7)

get_ntypes (357-359)

get_nsel (349-351)

get_rcut (341-343)

get_sel (353-355)

mixed_types (376-386)

get_rcut_smth (345-347)

get_env_protection (396-398)

deepmd/dpmodel/descriptor/se_e2_a.py (6)

get_ntypes (303-305)

get_rcut (253-255)

get_sel (261-263)

mixed_types (265-269)

get_rcut_smth (257-259)

get_env_protection (279-281)

deepmd/dpmodel/descriptor/dpa2.py (7)

get_ntypes (623-625)

get_nsel (615-617)

get_rcut (607-609)

get_sel (619-621)

mixed_types (642-652)

get_rcut_smth (611-613)

get_env_protection (664-666)

deepmd/dpmodel/descriptor/dpa1.py (10)

deepmd/dpmodel/utils/env_mat_stat.py (2)

EnvMatStatSe (65-253)

get_hash (200-219)

deepmd/utils/env_mat_stat.py (2)

StatItem (26-91)

load_or_compute_stats (162-181)

deepmd/utils/path.py (1)

DPPath (27-157)

deepmd/dpmodel/descriptor/se_t_tebd.py (3)

compute_input_stats (254-275)

compute_input_stats (650-688)

get_stats (690-696)

deepmd/dpmodel/descriptor/se_r.py (1)

compute_input_stats (270-308)

deepmd/dpmodel/descriptor/se_t.py (1)

compute_input_stats (250-288)

deepmd/dpmodel/descriptor/repformers.py (2)

compute_input_stats (374-412)

get_stats (414-420)

deepmd/pt/model/descriptor/dpa1.py (1)

compute_input_stats (412-433)

deepmd/pt/model/descriptor/se_atten.py (2)

compute_input_stats (361-402)

get_stats (404-410)

deepmd/dpmodel/descriptor/repflows.py (1)

get_stats (393-399)

deepmd/dpmodel/descriptor/se_t_tebd.py (4)

deepmd/dpmodel/utils/env_mat_stat.py (2)

EnvMatStatSe (65-253)

get_hash (200-219)

deepmd/utils/env_mat_stat.py (2)

StatItem (26-91)

load_or_compute_stats (162-181)

deepmd/utils/path.py (1)

DPPath (27-157)

deepmd/dpmodel/descriptor/descriptor.py (1)

get_stats (106-108)

🪛 GitHub Check: CodeQL

source/tests/consistent/descriptor/test_se_e2_a.py

[notice] 347-347: Unused local variable
Variable type_one_side is not used.

[notice] 348-348: Unused local variable
Variable excluded_types is not used.

[notice] 349-349: Unused local variable
Variable precision is not used.

[notice] 350-350: Unused local variable
Variable env_protection is not used.

[notice] 357-357: Unused local variable
Variable resnet_dt is not used.

[notice] 358-358: Unused local variable
Variable type_one_side is not used.

[notice] 359-359: Unused local variable
Variable excluded_types is not used.

[notice] 360-360: Unused local variable
Variable precision is not used.

[notice] 361-361: Unused local variable
Variable env_protection is not used.

[notice] 383-383: Unused local variable
Variable resnet_dt is not used.

[notice] 384-384: Unused local variable
Variable type_one_side is not used.

[notice] 385-385: Unused local variable
Variable excluded_types is not used.

[notice] 386-386: Unused local variable
Variable precision is not used.

[notice] 387-387: Unused local variable
Variable env_protection is not used.

🪛 Ruff (0.8.2)

deepmd/dpmodel/utils/env_mat_stat.py

122-122: Local variable natoms is assigned to but never used

Remove assignment to unused variable natoms

(F841)

deepmd/dpmodel/descriptor/dpa1.py

889-893: Use ternary operator sampled = merged() if callable(merged) else merged instead of if-else-block

(SIM108)

deepmd/dpmodel/descriptor/repflows.py

378-382: Use ternary operator sampled = merged() if callable(merged) else merged instead of if-else-block

(SIM108)

deepmd/dpmodel/descriptor/repformers.py

399-403: Use ternary operator sampled = merged() if callable(merged) else merged instead of if-else-block

(SIM108)

deepmd/dpmodel/descriptor/se_t_tebd.py

675-679: Use ternary operator sampled = merged() if callable(merged) else merged instead of if-else-block

(SIM108)

⏰ Context from checks skipped due to timeout of 90000ms (29)

GitHub Check: Build wheels for cp310-manylinux_aarch64
GitHub Check: Build wheels for cp311-win_amd64
GitHub Check: Build wheels for cp311-macosx_arm64
GitHub Check: Build wheels for cp311-macosx_x86_64
GitHub Check: Build wheels for cp311-manylinux_x86_64
GitHub Check: Build wheels for cp311-manylinux_x86_64
GitHub Check: Test Python (6, 3.12)
GitHub Check: Test Python (6, 3.9)
GitHub Check: Test Python (5, 3.12)
GitHub Check: Test Python (5, 3.9)
GitHub Check: Test Python (4, 3.12)
GitHub Check: Test Python (4, 3.9)
GitHub Check: Test Python (3, 3.12)
GitHub Check: Test Python (3, 3.9)
GitHub Check: Test Python (2, 3.12)
GitHub Check: Test Python (2, 3.9)
GitHub Check: Test Python (1, 3.12)
GitHub Check: Test Python (1, 3.9)
GitHub Check: Build C++ (clang, clang)
GitHub Check: Build C++ (rocm, rocm)
GitHub Check: Build C++ (cuda120, cuda)
GitHub Check: Build C++ (cuda, cuda)
GitHub Check: Build C library (2.14, >=2.5.0rc0,<2.15, libdeepmd_c_cu11.tar.gz)
GitHub Check: Build C++ (cpu, cpu)
GitHub Check: Test C++ (false)
GitHub Check: Build C library (2.18, libdeepmd_c.tar.gz)
GitHub Check: Test C++ (true)
GitHub Check: Analyze (python)
GitHub Check: Analyze (c-cpp)

🔇 Additional comments (15)

deepmd/dpmodel/descriptor/repflows.py (2)

25-27: Good addition of necessary imports for statistics functionality.

The added imports for EnvMatStatSe and StatItem support the new statistics computation functionality being implemented.

Also applies to: 38-40

393-399: Added proper statistics retrieval with error handling.

The get_stats method correctly returns the stored statistics and raises an appropriate error when statistics haven't been computed yet.

deepmd/dpmodel/descriptor/repformers.py (2)

25-27: Good addition of necessary imports for statistics functionality.

The added imports for EnvMatStatSe and StatItem support the new statistics computation functionality being implemented.

Also applies to: 39-41

414-420: Added proper statistics retrieval with error handling.

The get_stats method correctly returns the stored statistics and raises an appropriate error when statistics haven't been computed yet.

deepmd/dpmodel/descriptor/dpa1.py (3)

865-902: LGTM! Well-implemented statistics computation for descriptors.

The implementation of compute_input_stats correctly leverages the new EnvMatStatSe utility to compute or load cached statistics for the descriptor. The pattern matches other descriptor implementations, ensuring consistency across the codebase.

🧰 Tools

🪛 Ruff (0.8.2)

889-893: Use ternary operator sampled = merged() if callable(merged) else merged instead of if-else-block

(SIM108)

904-910: LGTM! Properly implemented stats retrieval.

The get_stats method correctly checks if statistics have been computed before returning them, with an appropriate error message if not available.

416-437: LGTM! Effective delegation to internal descriptor block.

The implementation correctly delegates to the internal se_atten block's compute_input_stats method, maintaining a clean separation of concerns.

deepmd/dpmodel/descriptor/se_t_tebd.py (3)

651-688: LGTM! Proper implementation of statistics computation.

The implementation correctly uses EnvMatStatSe to compute or load cached statistics for the descriptor. The statistics are properly stored and the mean/stddev arrays are correctly updated with appropriate type consistency.

🧰 Tools

🪛 Ruff (0.8.2)

675-679: Use ternary operator sampled = merged() if callable(merged) else merged instead of if-else-block

(SIM108)

690-696: LGTM! Properly implemented stats retrieval.

The get_stats method correctly checks if statistics have been computed before returning them, with an appropriate error message if not available.

254-275: LGTM! Good delegation pattern.

The implementation correctly delegates to the internal se_ttebd block's compute_input_stats method, maintaining a clean separation of concerns.

source/tests/consistent/descriptor/test_se_e2_a.py (1)

312-608: LGTM! Well-designed test for statistics computation.

This test class effectively exercises the new compute_input_stats functionality across multiple backends (DP, PyTorch, JAX, Paddle, Array API Strict). The test:

Sets up descriptors with appropriate parameters

Correctly formats input data for each backend

Calls compute_input_stats with the formatted data

Verifies that descriptors work correctly after statistics computation

The static analysis warnings about unused variables can be safely ignored, as they're part of a consistent parameter unpacking pattern used throughout the test classes.

🧰 Tools

🪛 GitHub Check: CodeQL

[notice] 347-347: Unused local variable
Variable type_one_side is not used.

[notice] 348-348: Unused local variable
Variable excluded_types is not used.

[notice] 349-349: Unused local variable
Variable precision is not used.

[notice] 350-350: Unused local variable
Variable env_protection is not used.

[notice] 357-357: Unused local variable
Variable resnet_dt is not used.

[notice] 358-358: Unused local variable
Variable type_one_side is not used.

[notice] 359-359: Unused local variable
Variable excluded_types is not used.

[notice] 360-360: Unused local variable
Variable precision is not used.

[notice] 361-361: Unused local variable
Variable env_protection is not used.

[notice] 383-383: Unused local variable
Variable resnet_dt is not used.

[notice] 384-384: Unused local variable
Variable type_one_side is not used.

[notice] 385-385: Unused local variable
Variable excluded_types is not used.

[notice] 386-386: Unused local variable
Variable precision is not used.

[notice] 387-387: Unused local variable
Variable env_protection is not used.

deepmd/dpmodel/utils/env_mat_stat.py (4)

40-62: LGTM! Well-implemented statistics computation base class.

The EnvMatStat base class effectively computes statistics (count, sum, squared sum) from environment matrices, with proper array API compatibility for multi-backend support.

65-199: LGTM! Thorough implementation of environment matrix statistics.

The EnvMatStatSe class provides a complete implementation for processing descriptor data:

Correctly identifies radial-only vs. full descriptors

Properly handles coordinates, atom types, and neighbor lists

Applies appropriate exclusion masks

Efficiently processes statistics per atom type

Supports array API compatibility for multiple backends

The implementation integrates well with the descriptor interface, using standard methods like get_rcut(), get_sel(), etc.

🧰 Tools

🪛 Ruff (0.8.2)

122-122: Local variable natoms is assigned to but never used

Remove assignment to unused variable natoms

(F841)

200-219: LGTM! Robust hashing for statistics caching.

The get_hash method creates a deterministic hash based on key descriptor properties, ensuring consistent cache handling across runs.

221-253: LGTM! Well-formatted statistics output.

The __call__ method correctly transforms raw statistics into the format expected by descriptors, handling both radial and angular components appropriately.

Signed-off-by: Jinzhe Zeng <jinzhe.zeng@ustc.edu.cn>

codecov · 2025-05-09T08:03:39Z

Codecov Report

❌ Patch coverage is 44.24779% with 126 lines in your changes missing coverage. Please review.
✅ Project coverage is 84.69%. Comparing base (17da9d2) to head (abe5a39).
⚠️ Report is 78 commits behind head on devel.

Files with missing lines	Patch %	Lines
deepmd/dpmodel/descriptor/dpa1.py	13.63%	19 Missing ⚠️
deepmd/dpmodel/descriptor/se_t_tebd.py	13.63%	19 Missing ⚠️
deepmd/dpmodel/descriptor/repflows.py	14.28%	18 Missing ⚠️
deepmd/dpmodel/descriptor/repformers.py	14.28%	18 Missing ⚠️
deepmd/dpmodel/descriptor/se_r.py	11.76%	15 Missing ⚠️
deepmd/dpmodel/descriptor/se_t.py	11.76%	15 Missing ⚠️
deepmd/dpmodel/utils/env_mat_stat.py	85.71%	11 Missing ⚠️
deepmd/dpmodel/descriptor/dpa2.py	0.00%	5 Missing ⚠️
deepmd/dpmodel/descriptor/dpa3.py	25.00%	3 Missing ⚠️
deepmd/dpmodel/descriptor/se_e2_a.py	82.35%	3 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##            devel    #4729      +/-   ##
==========================================
- Coverage   84.80%   84.69%   -0.12%     
==========================================
  Files         696      697       +1     
  Lines       67269    67472     +203     
  Branches     3541     3540       -1     
==========================================
+ Hits        57050    57145      +95     
- Misses       9087     9195     +108     
  Partials     1132     1132

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

wanghan-iapcm

I feel we should refactorize all the comput_input_stat, as almost all the classes implement the method in exactly the same way.

Shall we consider firstly refactorize the pt backend and then implement the python backend?

njzjz · 2025-05-12T03:05:33Z

Shall we consider firstly refactorize the pt backend and then implement the python backend?

My current task is to implement #4428 (#4430) as soon as possible.

feat(array-api): env mat stat

f02db7f

Signed-off-by: Jinzhe Zeng <jinzhe.zeng@ustc.edu.cn>

njzjz requested review from Copilot, iProzd and wanghan-iapcm and removed request for Copilot May 8, 2025 18:36

github-actions bot added the Python label May 8, 2025

github-advanced-security bot found potential problems May 8, 2025

View reviewed changes

coderabbitai bot reviewed May 8, 2025

View reviewed changes

deepmd/dpmodel/utils/env_mat_stat.py Show resolved Hide resolved

deepmd/dpmodel/descriptor/se_t_tebd.py Outdated Show resolved Hide resolved

njzjz added 2 commits May 9, 2025 04:06

fix typos

f9d18d6

Signed-off-by: Jinzhe Zeng <jinzhe.zeng@ustc.edu.cn>

fix codeql

959324a

Signed-off-by: Jinzhe Zeng <jinzhe.zeng@ustc.edu.cn>

coderabbitai bot reviewed May 8, 2025

View reviewed changes

deepmd/dpmodel/descriptor/dpa1.py Show resolved Hide resolved

coderabbitai bot reviewed May 8, 2025

View reviewed changes

drop @variables

abe5a39

Signed-off-by: Jinzhe Zeng <jinzhe.zeng@ustc.edu.cn>

wanghan-iapcm reviewed May 12, 2025

View reviewed changes

wanghan-iapcm approved these changes May 12, 2025

View reviewed changes

iProzd approved these changes May 12, 2025

View reviewed changes

njzjz added this pull request to the merge queue May 12, 2025

njzjz mentioned this pull request May 12, 2025

refactorize compute_input_stat #4732

Open

Merged via the queue into deepmodeling:devel with commit a31b886 May 12, 2025
60 checks passed

njzjz deleted the dpmodel_env_mat_stat branch May 12, 2025 08:30

Copilot AI mentioned this pull request Aug 28, 2025

feat: refactor compute_input_stats across all backends to eliminate code duplication #4932

Draft

This was referenced Sep 24, 2025

pd(feat): support python inference with DP class #4987

Merged

feat(pt): add compression support for se_e3_tebd #4992

Merged

coderabbitai bot mentioned this pull request Jan 12, 2026

style: enable B905 to prevent issues with zip #5136

Merged

feat(array-api): env mat stat #4729

feat(array-api): env mat stat #4729

Uh oh!

Conversation

njzjz commented May 8, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

Uh oh!

Check notice

Check notice

Check notice

Check notice

Check notice

Check notice

Check notice

Check notice

Check notice

coderabbitai bot commented May 8, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Possibly related PRs

Suggested labels

Suggested reviewers

Chat

Support

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

CodeRabbit Configuration File (.coderabbit.yaml)

Documentation and Community

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

codecov bot commented May 9, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

wanghan-iapcm left a comment

Choose a reason for hiding this comment

Uh oh!

njzjz commented May 12, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

njzjz commented May 8, 2025 •

edited by coderabbitai bot

Loading

coderabbitai bot commented May 8, 2025 •

edited

Loading

CodeRabbit Configuration File (`.coderabbit.yaml`)

codecov bot commented May 9, 2025 •

edited

Loading