
Conversation

@caic99
Member

@caic99 caic99 commented Aug 15, 2025

Summary by CodeRabbit

  • New Features

    • Added per-atom weighting for force evaluation: computes and reports weighted MAE/RMSE alongside unweighted metrics, includes weighted metrics in system-average summaries, logs weighted force metrics, and safely handles zero-weight cases. Also propagates the per-atom weight field into reporting.
  • Tests

    • Added end-to-end tests validating weighted vs unweighted force MAE/RMSE and verifying evaluator outputs when using per-atom weight masks.

@caic99 caic99 requested a review from anyangml August 15, 2025 08:17
@codecov

codecov bot commented Aug 15, 2025

Codecov Report

❌ Patch coverage is 90.47619% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 84.30%. Comparing base (accc331) to head (7e1f36d).
⚠️ Report is 86 commits behind head on devel.

Files with missing lines Patch % Lines
deepmd/entrypoints/test.py 90.47% 2 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##            devel    #4893   +/-   ##
=======================================
  Coverage   84.29%   84.30%           
=======================================
  Files         702      703    +1     
  Lines       68664    68746   +82     
  Branches     3572     3573    +1     
=======================================
+ Hits        57883    57953   +70     
- Misses       9641     9653   +12     
  Partials     1140     1140           


@caic99 caic99 marked this pull request as ready for review August 18, 2025 02:26
Copilot AI review requested due to automatic review settings August 18, 2025 02:26


@coderabbitai
Contributor

coderabbitai bot commented Aug 18, 2025


📝 Walkthrough

Walkthrough

Adds per-atom weighting support for force error metrics in deepmd/entrypoints/test.py via an atom_pref field and find_atom_pref flag, returns weighted MAE/RMSE and weight sums alongside unweighted metrics, and introduces an end-to-end PyTorch test TestDPTestForceWeight validating weighted vs unweighted force errors.

Changes

Cohort / File(s) Summary of Changes
Weighted force metrics in dp_test_ener
deepmd/entrypoints/test.py
Declares atom_pref data field, reads find_atom_pref from test data, computes weighted force residuals when enabled (mae_fw, rmse_fw) with weight_sum as effective sample size, records unweighted metrics using size_f, updates logging and averaged printing to include weighted metrics when present, and adds weighted metrics into returned dict.
Force-weight end-to-end test
source/tests/pt/test_dp_test.py
Adds TestDPTestForceWeight test class that prepares a temporary weighted dataset (adds atom_pref mask and modifies forces), trains/scripts a model, runs evaluation via dp_test_ener/DeepEval, independently computes weighted and unweighted MAE/RMSE, asserts equality with reported mae_f/rmse_f and mae_fw/rmse_fw, and cleans up artifacts.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant Entrypoint as dp_test_ener
  participant TestData as test_data
  participant Metrics as MetricCalc
  Note over Entrypoint,TestData: Read forces and optional atom_pref
  Entrypoint->>TestData: get forces, n_frames, find_atom_pref, atom_pref
  Entrypoint->>Metrics: compute diff_f = pred_f - test_f
  alt find_atom_pref == 1
    Metrics->>Metrics: apply per-atom weights (atom_pref)
    Metrics-->>Entrypoint: mae_fw, rmse_fw, weight_sum
    Metrics-->>Entrypoint: mae_f, rmse_f, size_f
  else
    Metrics-->>Entrypoint: mae_f, rmse_f, size_f (unweighted)
  end
  Entrypoint-->>Entrypoint: populate result dict (include weighted fields if present)
sequenceDiagram
  autonumber
  participant TestCase as TestDPTestForceWeight
  participant Prep as _prepare_weighted_system
  participant Trainer as Trainer/Script
  participant Eval as DeepEval / dp_test_ener
  participant Loader as DeepmdData

  TestCase->>Prep: copy dataset, modify forces, add `atom_pref` mask
  TestCase->>Trainer: build/train & script model -> save .pth
  TestCase->>Loader: load prepared test dataset
  TestCase->>Eval: run scripted model and dp_test_ener
  Eval-->>TestCase: returns mae_f, rmse_f, mae_fw, rmse_fw (and sizes)
  TestCase->>TestCase: compute independent weighted/unweighted metrics
  TestCase-->>TestCase: assert reported == computed metrics

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Suggested reviewers

  • iProzd
  • njzjz


Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (3)
deepmd/entrypoints/test.py (1)

421-435: Weighted force MAE/RMSE look correct; add a shape guard for robustness

The weighting logic (sum of abs residuals / sum of weights and sqrt(weighted MSE)) is sound, and size_f=sum(weights) makes the system-level aggregation consistent. To tolerate atom_pref provided as (nframes, natoms) without depending on repeat=3 at load time, add a defensive broadcast/validation.

-    if find_atom_pref == 1:
-        atom_pref = test_data["atom_pref"][:numb_test]
-        diff_f = diff_f * atom_pref
+    if find_atom_pref == 1:
+        atom_pref = test_data["atom_pref"][:numb_test]
+        # Accept both (nframes, natoms) and (nframes, natoms*3)
+        if atom_pref.shape != diff_f.shape:
+            if atom_pref.shape[1] * 3 == diff_f.shape[1]:
+                atom_pref = np.repeat(atom_pref, 3, axis=1)
+            else:
+                raise ValueError(
+                    f"atom_pref shape {atom_pref.shape} is incompatible with force shape {diff_f.shape}"
+                )
+        diff_f = diff_f * atom_pref
         size_f = np.sum(atom_pref)
         if size_f > 0:
             mae_f = np.sum(np.abs(diff_f)) / size_f
             rmse_f = np.sqrt(np.sum(diff_f * diff_f) / size_f)
         else:
             mae_f = 0.0
             rmse_f = 0.0

If desired, we can also factor this into small helpers (weighted_mae/weighted_rmse) to keep this block concise.

source/tests/pt/test_dp_test.py (2)

168-181: Mask preparation is correct; tiny robustness note about copytree target

Copying the sample system into a mkdtemp dir and writing atom_pref.npy as (nframes, natoms) is consistent with repeat=3 in the loader. Setting the last atom’s weight to 0 validates masking well. One minor robustness nit: shutil.copytree(..., dirs_exist_ok=True) is fine here, but if src ever contains symlinks or permissions need preserving, consider copy_function/copy2. Not required for this test data.


182-230: Good end-to-end assertions; add tolerances and verify the reported sample size

The comparison path is sound: run dp_test_ener, independently compute the masked MAE/RMSE from dp.eval outputs, and compare. To avoid flaky failures from float32 arithmetic and to verify the new size_f behavior, consider:

  • Use explicit tolerances in assert_allclose.
  • Assert that the returned sizes equal the mask sum.
-        np.testing.assert_allclose(err["mae_f"][0], mae_expected)
-        np.testing.assert_allclose(err["rmse_f"][0], rmse_expected)
+        np.testing.assert_allclose(err["mae_f"][0], mae_expected, rtol=1e-6, atol=1e-8)
+        np.testing.assert_allclose(err["rmse_f"][0], rmse_expected, rtol=1e-6, atol=1e-8)
+        # Also verify the effective sample size matches the mask sum
+        np.testing.assert_equal(err["mae_f"][1], denom)
+        np.testing.assert_equal(err["rmse_f"][1], denom)
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between dc99ba2 and 13d43e2.

📒 Files selected for processing (2)
  • deepmd/entrypoints/test.py (4 hunks)
  • source/tests/pt/test_dp_test.py (2 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
deepmd/entrypoints/test.py (4)
deepmd/utils/data_system.py (1)
  • add (342-395)
deepmd/utils/data.py (1)
  • add (136-189)
deepmd/pt/utils/stat.py (1)
  • rmse (525-526)
deepmd/dpmodel/output_def.py (1)
  • size (230-231)
🪛 Ruff (0.12.2)
source/tests/pt/test_dp_test.py

187-187: Use a context manager for opening files

(SIM115)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (29)
  • GitHub Check: Analyze (python)
  • GitHub Check: Build wheels for cp311-macosx_arm64
  • GitHub Check: Analyze (c-cpp)
  • GitHub Check: Build C library (2.14, >=2.5.0,<2.15, libdeepmd_c_cu11.tar.gz)
  • GitHub Check: Build wheels for cp311-win_amd64
  • GitHub Check: Build C library (2.18, libdeepmd_c.tar.gz)
  • GitHub Check: Build wheels for cp310-manylinux_aarch64
  • GitHub Check: Test C++ (false)
  • GitHub Check: Build wheels for cp311-macosx_x86_64
  • GitHub Check: Test C++ (true)
  • GitHub Check: Build wheels for cp311-manylinux_x86_64
  • GitHub Check: Build wheels for cp311-manylinux_x86_64
  • GitHub Check: Build C++ (rocm, rocm)
  • GitHub Check: Test Python (5, 3.12)
  • GitHub Check: Build C++ (clang, clang)
  • GitHub Check: Test Python (3, 3.9)
  • GitHub Check: Test Python (4, 3.12)
  • GitHub Check: Test Python (6, 3.12)
  • GitHub Check: Build C++ (cuda120, cuda)
  • GitHub Check: Build C++ (cuda, cuda)
  • GitHub Check: Test Python (2, 3.12)
  • GitHub Check: Build C++ (cpu, cpu)
  • GitHub Check: Test Python (1, 3.12)
  • GitHub Check: Test Python (5, 3.9)
  • GitHub Check: Test Python (6, 3.9)
  • GitHub Check: Test Python (2, 3.9)
  • GitHub Check: Test Python (1, 3.9)
  • GitHub Check: Test Python (3, 3.12)
  • GitHub Check: Test Python (4, 3.9)
🔇 Additional comments (6)
deepmd/entrypoints/test.py (3)

294-295: Correctly declares atom-level preference mask aligned with force components

Adding atom_pref as an atomic item with repeat=3 matches the flattened (natoms*3) force layout. Keeping must=False ensures backward compatibility (feature enabled only when the file is present).


317-317: Presence flag retrieval is consistent with existing pattern

Using test_data.get("find_atom_pref") mirrors the existing find_* flags and safely disables weighting when absent.


470-471: Return sizes reflect the effective weighted sample size

Reporting (value, size_f) for mae_f/rmse_f is the right choice for weighted aggregation across systems. No change needed.

source/tests/pt/test_dp_test.py (3)

18-21: Imports for dp_test_ener/DeepEval/DeepmdData are appropriate

These imports are minimal and scoped to the new test’s needs.

Also applies to: 28-30


150-167: Solid setup for masked-force test system

Bootstrapping a tiny training config and pointing both training/validation to the prepared system keeps the test self-contained. No issues spotted.


231-234: Clean teardown

Delegating to the base teardown and removing the temp system directory prevents test artifacts from leaking. Looks good.

@anyangml anyangml requested a review from iProzd August 18, 2025 03:00
Collaborator

@anyangml anyangml left a comment


LGTM

@caic99 caic99 requested a review from Copilot August 19, 2025 05:42
Contributor

Copilot AI left a comment


Pull Request Overview

This PR adds support for masked forces in testing by implementing per-atom weighting for force error metrics. When atom preferences (masks) are provided, the evaluation computes weighted MAE/RMSE that only consider selected atoms, with graceful handling of zero-weight cases.

Key changes:

  • Modified force error calculation to apply atom-level masking when atom_pref data is available
  • Added comprehensive test coverage for masked force evaluation scenarios
  • Updated error metric reporting to reflect the actual number of atoms considered in calculations

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
deepmd/entrypoints/test.py Implements masked force error calculation with weighted MAE/RMSE computation
source/tests/pt/test_dp_test.py Adds end-to-end test validating masked force metrics against expected calculations


Member

@njzjz njzjz left a comment


This PR assumes that atom_pref is a mask, but it may not be, since it is not limited to 0 and 1. One may set it to something like 1 1 1 1 5 1 1 1 to give some atoms larger weights.
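A toy numeric sketch of this point, with made-up error values (not from this PR): when atom_pref carries non-binary weights, the weighted and unweighted errors diverge even though no atom is masked out.

```python
import numpy as np

# Illustration of non-binary atom_pref: a general per-atom weight
# (e.g. 1 1 1 5), not just a 0/1 mask. Values are hypothetical.
abs_err = np.array([0.1, 0.2, 0.1, 0.4])  # per-atom |force error|, toy values
w = np.array([1.0, 1.0, 1.0, 5.0])        # last atom weighted 5x

mae_unweighted = abs_err.mean()               # 0.8 / 4 = 0.2
mae_weighted = np.sum(abs_err * w) / w.sum()  # 2.4 / 8 = 0.3
```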

@caic99 caic99 requested a review from njzjz August 22, 2025 03:04
@njzjz
Member

njzjz commented Aug 23, 2025

@caic99 In short, I suggest showing both the error with and without the atom_pref.

Collaborator

@wanghan-iapcm wanghan-iapcm left a comment


The atom force prefactor is not designed as a mask; it assigns weights to the atomic force loss terms.

@caic99 caic99 requested a review from anyangml August 26, 2025 06:16
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

♻️ Duplicate comments (1)
source/tests/pt/test_dp_test.py (1)

168-181: Define a named constant for the force offset instead of magic number

This improves readability and mirrors prior feedback.

Apply:

 class TestDPTestForceWeight(DPTest, unittest.TestCase):
     def setUp(self) -> None:
         self.detail_file = "test_dp_test_force_weight_detail"
@@
-    def _prepare_weighted_system(self) -> str:
+    def _prepare_weighted_system(self) -> str:
         src = Path(__file__).parent / "water/data/single"
         tmp_dir = tempfile.mkdtemp()
         shutil.copytree(src, tmp_dir, dirs_exist_ok=True)
         set_dir = Path(tmp_dir) / "set.000"
         forces = np.load(set_dir / "force.npy")
-        forces[0, -3:] += 10.0
+        # bump the last atom only
+        self.FORCE_OFFSET = getattr(self, "FORCE_OFFSET", 10.0)
+        forces[0, -3:] += self.FORCE_OFFSET
         np.save(set_dir / "force.npy", forces)
🧹 Nitpick comments (4)
deepmd/entrypoints/test.py (2)

294-295: Adding atom_pref looks good; consider avoiding 3x memory via compute-time broadcasting

Registering atom_pref as atomic with repeat=3 works, but it inflates this array 3x in memory and I/O. Since the per-atom weight is axis-invariant across x/y/z, you can keep atom_pref as shape [nframes, natoms] and broadcast it at compute time. This reduces memory and simplifies data loading while preserving semantics. See the follow-up diff in the weighting block for a safe broadcast that handles both shapes (with and without repeat), so you can optionally drop repeat=3 here.

Example (paired with the weighting-block change below):

-    data.add("atom_pref", 1, atomic=True, must=False, high_prec=False, repeat=3)
+    # Keep per-atom weights; broadcast to 3 components at compute-time
+    data.add("atom_pref", 1, atomic=True, must=False, high_prec=False)

620-623: Nice: system average now includes weighted force metrics

The conditional print of weighted force MAE/RMSE when present matches reviewer feedback (“show both with and without atom_pref”). Consider also printing the effective sample size (sum of weights) for transparency.

Example:

         if "rmse_fw" in avg:
             log.info(f"Force weighted MAE : {avg['mae_fw']:e} eV/A")
             log.info(f"Force weighted RMSE: {avg['rmse_fw']:e} eV/A")
+            if "mae_fw__size" in avg:
+                log.info(f"Force weighted effN: {avg['mae_fw__size']:g}")

Note: This needs weighted_average to optionally retain sizes in the aggregated output (e.g., by exposing them under a “__size” suffix).

source/tests/pt/test_dp_test.py (2)

217-219: Don’t hardcode mixed_type=False; derive from data to avoid surprises

Future tests may use mixed-type systems; pass the actual flag from DeepmdData.

Apply:

-            mixed_type=False,
+            mixed_type=data.mixed_type,

220-233: Strengthen test: also assert sizes (denominators) used in metrics

Verifies that dp_test_ener reports effective counts consistently for both weighted and unweighted metrics.

Apply:

         mae_unweighted = np.sum(np.abs(diff)) / diff.size
         rmse_unweighted = np.sqrt(np.sum(diff * diff) / diff.size)
         np.testing.assert_allclose(err["mae_f"][0], mae_unweighted)
         np.testing.assert_allclose(err["rmse_f"][0], rmse_unweighted)
+        # Also check the denominators
+        np.testing.assert_equal(err["mae_f"][1], diff.size)
+        np.testing.assert_equal(err["rmse_f"][1], diff.size)
         np.testing.assert_allclose(err["mae_fw"][0], mae_expected)
         np.testing.assert_allclose(err["rmse_fw"][0], rmse_expected)
+        np.testing.assert_allclose(err["mae_fw"][1], denom)
+        np.testing.assert_allclose(err["rmse_fw"][1], denom)
📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between 13d43e2 and 2a4dedc.

📒 Files selected for processing (2)
  • deepmd/entrypoints/test.py (5 hunks)
  • source/tests/pt/test_dp_test.py (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
source/tests/pt/test_dp_test.py (3)
deepmd/entrypoints/test.py (2)
  • test (58-201)
  • test_ener (257-602)
deepmd/infer/deep_eval.py (4)
  • DeepEval (345-687)
  • get_type_map (156-157)
  • get_type_map (411-413)
  • eval (100-145)
deepmd/utils/data.py (1)
  • DeepmdData (24-739)
deepmd/entrypoints/test.py (2)
deepmd/utils/data_system.py (1)
  • add (342-395)
deepmd/utils/data.py (1)
  • add (136-189)
🪛 Ruff (0.12.2)
source/tests/pt/test_dp_test.py

187-187: Use a context manager for opening files

(SIM115)


Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (2)
source/tests/pt/test_dp_test.py (2)

168-182: Replace magic numbers and make intent explicit in weighted system prep

Use the constants suggested above; avoids magic numbers and clarifies which atoms are emphasized or ignored. Also fine to keep dirs_exist_ok=True since mkdtemp() pre-creates the directory.

         forces = np.load(set_dir / "force.npy")
-        forces[0, :3] += 1.0
-        forces[0, -3:] += 10.0
+        forces[0, :3] += self.FORCE_OFFSET_FIRST
+        forces[0, -3:] += self.FORCE_OFFSET_LAST
         np.save(set_dir / "force.npy", forces)
         natoms = forces.shape[1] // 3
         atom_pref = np.ones((forces.shape[0], natoms), dtype=forces.dtype)
-        atom_pref[:, 0] = 2.0
-        atom_pref[:, -1] = 0.0
+        atom_pref[:, 0] = self.FIRST_ATOM_WEIGHT
+        atom_pref[:, -1] = self.LAST_ATOM_WEIGHT
         np.save(set_dir / "atom_pref.npy", atom_pref)

207-214: Slice coord to 1 frame before reshape; current code flattens all frames

Reshape without slicing can consume all frames. Slice [:1] to match box and numb_test=1.

-        coord = test_data["coord"].reshape([1, -1])
+        coord = test_data["coord"][:1].reshape([1, -1])
🧹 Nitpick comments (3)
source/tests/pt/test_dp_test.py (3)

150-167: Good test scaffolding; consider defining explicit constants for readability

The setup is clear and mirrors existing patterns. To improve clarity and future maintenance, define the force offsets and atom weights as class-level constants and reuse them in _prepare_weighted_system. This also addresses prior nitpicks about magic numbers.

 class TestDPTestForceWeight(DPTest, unittest.TestCase):
+    # Test constants for reproducibility and readability
+    FORCE_OFFSET_FIRST = 1.0
+    FORCE_OFFSET_LAST = 10.0
+    FIRST_ATOM_WEIGHT = 2.0
+    LAST_ATOM_WEIGHT = 0.0

189-191: Ensure temp model file is always closed and cleaned up, even on failures

Close the NamedTemporaryFile (SIM115) and move cleanup to tearDown so files aren’t leaked on assertion failures. Store the path on self.

-        tmp_model = tempfile.NamedTemporaryFile(delete=False, suffix=".pth")
-        torch.jit.save(model, tmp_model.name)
+        tmp_model = tempfile.NamedTemporaryFile(delete=False, suffix=".pth")
+        torch.jit.save(model, tmp_model.name)
+        tmp_model.close()  # ensure the handle is closed before other consumers use the path
+        self._tmp_model_path = tmp_model.name
@@
-        os.unlink(tmp_model.name)
+        # cleanup moved to tearDown
@@
     def tearDown(self) -> None:
         super().tearDown()
         shutil.rmtree(self.system_dir)
+        # Ensure model file is removed even if the test fails early
+        if hasattr(self, "_tmp_model_path") and os.path.exists(self._tmp_model_path):
+            os.unlink(self._tmp_model_path)

Also applies to: 237-237, 239-241


226-236: Strengthen assertions; verify denominators and guard against zero sum

Validate the effective sample sizes returned by dp_test_ener and add a guard to avoid division by zero if all weights are zero (future-proofing). This also aligns with the reviewer suggestion to “show both the error with and without atom_pref” by making both paths explicit in assertions.

         masked_diff = diff[mask]
         mae_unweighted = np.sum(np.abs(masked_diff)) / mask.sum()
         rmse_unweighted = np.sqrt(np.sum(masked_diff * masked_diff) / mask.sum())
         denom = weight.sum()
+        assert denom > 0, "Weight sum is zero; cannot compute weighted errors."
         mae_weighted = np.sum(np.abs(diff) * weight) / denom
         rmse_weighted = np.sqrt(np.sum(diff * diff * weight) / denom)
-        np.testing.assert_allclose(err["mae_f"][0], mae_unweighted)
-        np.testing.assert_allclose(err["rmse_f"][0], rmse_unweighted)
-        np.testing.assert_allclose(err["mae_fw"][0], mae_weighted)
-        np.testing.assert_allclose(err["rmse_fw"][0], rmse_weighted)
+        # values and denominators (effective sample sizes)
+        np.testing.assert_equal(err["mae_f"][1], mask.sum())
+        np.testing.assert_equal(err["rmse_f"][1], mask.sum())
+        np.testing.assert_allclose(err["mae_f"][0], mae_unweighted)
+        np.testing.assert_allclose(err["rmse_f"][0], rmse_unweighted)
+        np.testing.assert_equal(err["mae_fw"][1], weight.sum())
+        np.testing.assert_equal(err["rmse_fw"][1], weight.sum())
+        np.testing.assert_allclose(err["mae_fw"][0], mae_weighted)
+        np.testing.assert_allclose(err["rmse_fw"][0], rmse_weighted)
📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between 2a4dedc and e74b4bd.

📒 Files selected for processing (2)
  • deepmd/entrypoints/test.py (5 hunks)
  • source/tests/pt/test_dp_test.py (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • deepmd/entrypoints/test.py
🧰 Additional context used
🧬 Code graph analysis (1)
source/tests/pt/test_dp_test.py (3)
deepmd/entrypoints/test.py (2)
  • test (58-201)
  • test_ener (257-612)
deepmd/infer/deep_eval.py (4)
  • DeepEval (345-687)
  • get_type_map (156-157)
  • get_type_map (411-413)
  • eval (100-145)
deepmd/utils/data.py (1)
  • DeepmdData (24-739)
🪛 Ruff (0.12.2)
source/tests/pt/test_dp_test.py

189-189: Use a context manager for opening files

(SIM115)

🔇 Additional comments (2)
source/tests/pt/test_dp_test.py (2)

18-21: LGTM: New imports are correct and used

dp_test_ener and DeepEval are both used below. No issues.


28-30: LGTM: DeepmdData import

The import is necessary for the explicit data loading in the new test.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (2)
source/tests/pt/test_dp_test.py (2)

168-183: Replace magic numbers with named constants for clarity

Use named constants for the force offsets and atom weights to improve readability and intent.

Apply within this block:

-        forces[0, :3] += 1.0
-        forces[0, -3:] += 10.0
+        forces[0, :3] += self.FORCE_OFFSET_HEAD
+        forces[0, -3:] += self.FORCE_OFFSET_TAIL
-        atom_pref[:, 0] = 2.0
-        atom_pref[:, -1] = 0.0
+        atom_pref[:, 0] = self.HEAD_WEIGHT
+        atom_pref[:, -1] = self.TAIL_WEIGHT

Add these class attributes near the top of the class:

# class-level constants (add near class definition)
FORCE_OFFSET_HEAD = 1.0
FORCE_OFFSET_TAIL = 10.0
HEAD_WEIGHT = 2.0
TAIL_WEIGHT = 0.0
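Applied together, the suggestion reads roughly like the following standalone numpy sketch (array shapes and constant names are illustrative, mirroring the review diff rather than the actual test file):

```python
import numpy as np

# Illustrative named constants replacing the magic numbers from the test.
FORCE_OFFSET_HEAD = 1.0
FORCE_OFFSET_TAIL = 10.0
HEAD_WEIGHT = 2.0
TAIL_WEIGHT = 0.0

forces = np.zeros((1, 9))             # 1 frame, 3 atoms x 3 force components
forces[0, :3] += FORCE_OFFSET_HEAD    # perturb the first atom's force
forces[0, -3:] += FORCE_OFFSET_TAIL   # perturb the last atom's force

atom_pref = np.ones((1, 3))           # per-atom weights, one per atom
atom_pref[:, 0] = HEAD_WEIGHT         # up-weight the first atom
atom_pref[:, -1] = TAIL_WEIGHT        # mask out the last atom entirely
```

The named constants make the test's intent (one up-weighted atom, one masked atom) readable at the call site.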

207-222: Fix two bugs: frame slicing and dict access for eval output

  • coord must be sliced to the first frame before reshape; current code flattens all frames.
  • DeepEval.eval returns a dict; ret[1] will KeyError. Use ret["force"].
-        coord = test_data["coord"].reshape([1, -1])
+        coord = test_data["coord"][:1].reshape([1, -1])
@@
-        ret = dp.eval(
+        ret = dp.eval(
             coord,
             box,
             atype,
             fparam=None,
             aparam=None,
             atomic=False,
             efield=None,
             mixed_type=False,
             spin=None,
         )
-        force_pred = ret[1].reshape([1, -1])
+        force_pred = ret["force"].reshape([1, -1])
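Both bugs can be demonstrated with mock data (a sketch only — the shapes and the dict keys stand in for what `DeepEval.eval` would return; consult the actual API for the real output layout):

```python
import numpy as np

# Mock test data: 2 frames of 3 atoms (9 coordinate values per frame).
test_data = {"coord": np.arange(2 * 9, dtype=float).reshape(2, 9)}

# Slicing to the first frame BEFORE reshape keeps one frame of 9 values;
# reshaping all frames to [1, -1] would silently flatten both frames into
# a single 18-wide row.
coord = test_data["coord"][:1].reshape([1, -1])

# Stand-in for the eval output: keyed by property name, so positional
# indexing like ret[1] raises KeyError on a dict.
ret = {"energy": np.zeros((1, 1)), "force": np.zeros((1, 3, 3))}
force_pred = ret["force"].reshape([1, -1])
```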
🧹 Nitpick comments (3)
source/tests/pt/test_dp_test.py (3)

189-191: Use a context manager for the temp model file (ruff SIM115) and reuse the path

Prevents resource leaks and aligns with tooling suggestions.

-        tmp_model = tempfile.NamedTemporaryFile(delete=False, suffix=".pth")
-        torch.jit.save(model, tmp_model.name)
-        dp = DeepEval(tmp_model.name)
+        with tempfile.NamedTemporaryFile(delete=False, suffix=".pth") as tmp_model:
+            torch.jit.save(model, tmp_model.name)
+            tmp_model_path = tmp_model.name
+        dp = DeepEval(tmp_model_path)
@@
-        os.unlink(tmp_model.name)
+        os.unlink(tmp_model_path)

Also applies to: 235-235
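The `delete=False` + explicit unlink pattern matters for portability: on Windows an open `NamedTemporaryFile` cannot be reopened by a second handle, so the file must be closed before loading it. A minimal sketch (with a byte write standing in for `torch.jit.save`):

```python
import os
import tempfile

# Create the temp file via a context manager so the handle is closed
# before anything else opens the path (required on Windows).
with tempfile.NamedTemporaryFile(delete=False, suffix=".pth") as tmp_model:
    tmp_model.write(b"serialized model bytes")  # stand-in for torch.jit.save
    tmp_model_path = tmp_model.name

# The path outlives the handle and can now be reopened/loaded safely.
assert os.path.exists(tmp_model_path)

os.unlink(tmp_model_path)  # explicit cleanup since delete=False
```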


226-234: Good: validates both unweighted and weighted MAE/RMSE (addresses reviewer ask)

Asserting both metrics matches the behavior with and without atom_pref.

Optionally add tolerances to avoid flakiness from dtype/ordering differences:

-        np.testing.assert_allclose(err["mae_f"][0], mae_unweighted)
-        np.testing.assert_allclose(err["rmse_f"][0], rmse_unweighted)
-        np.testing.assert_allclose(err["mae_fw"][0], mae_weighted)
-        np.testing.assert_allclose(err["rmse_fw"][0], rmse_weighted)
+        np.testing.assert_allclose(err["mae_f"][0],  mae_unweighted,  rtol=1e-6, atol=1e-8)
+        np.testing.assert_allclose(err["rmse_f"][0], rmse_unweighted, rtol=1e-6, atol=1e-8)
+        np.testing.assert_allclose(err["mae_fw"][0], mae_weighted,    rtol=1e-6, atol=1e-8)
+        np.testing.assert_allclose(err["rmse_fw"][0], rmse_weighted,  rtol=1e-6, atol=1e-8)
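The weighted metrics being asserted can be sketched in plain numpy (an illustration of the arithmetic, not the deepmd-kit implementation — it assumes each atom's weight applies to all three of its force components):

```python
import numpy as np

diff = np.array([[[0.1, -0.2, 0.3],
                  [0.0,  0.1, -0.1],
                  [9.0, -9.0, 9.0]]])   # last atom has a huge error ...
w = np.array([[2.0, 1.0, 0.0]])         # ... but zero weight (masked)

# Broadcast each atom's weight onto its 3 force components.
wc = np.repeat(w, 3, axis=1).reshape(diff.shape)

mae_u = np.abs(diff).mean()                       # unweighted MAE
rmse_u = np.sqrt((diff**2).mean())                # unweighted RMSE
mae_w = (np.abs(diff) * wc).sum() / wc.sum()      # weighted MAE
rmse_w = np.sqrt(((diff**2) * wc).sum() / wc.sum())  # weighted RMSE
```

The masked atom's large error inflates the unweighted metrics but leaves the weighted ones untouched, which is exactly the behavior the test is meant to lock in.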

237-240: Make teardown more robust if setup fails mid-way

Guard against missing directory to avoid errors on teardown in partial failures.

-        shutil.rmtree(self.system_dir)
+        if hasattr(self, "system_dir") and os.path.isdir(self.system_dir):
+            shutil.rmtree(self.system_dir, ignore_errors=True)
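The guarded teardown is idempotent — it tolerates both a setup that never created the directory and a double invocation. A small self-contained check of the pattern:

```python
import os
import shutil
import tempfile

system_dir = tempfile.mkdtemp()  # stand-in for the test's system directory

# Guarded cleanup: safe if the directory is missing, half-created, or
# already removed; ignore_errors=True also swallows permission hiccups.
if os.path.isdir(system_dir):
    shutil.rmtree(system_dir, ignore_errors=True)

# Calling it again on the now-missing path is harmless with the guard.
if os.path.isdir(system_dir):
    shutil.rmtree(system_dir, ignore_errors=True)
```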
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between e74b4bd and 7e1f36d.

📒 Files selected for processing (2)
  • deepmd/entrypoints/test.py (5 hunks)
  • source/tests/pt/test_dp_test.py (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • deepmd/entrypoints/test.py
🧰 Additional context used
🧬 Code graph analysis (1)
source/tests/pt/test_dp_test.py (3)
deepmd/entrypoints/test.py (2)
  • test (58-201)
  • test_ener (257-602)
deepmd/infer/deep_eval.py (4)
  • DeepEval (345-687)
  • get_type_map (156-157)
  • get_type_map (411-413)
  • eval (100-145)
deepmd/utils/data.py (1)
  • DeepmdData (24-739)
🪛 Ruff (0.12.2)
source/tests/pt/test_dp_test.py

189-189: Use a context manager for opening files

(SIM115)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: Build C library (2.18, libdeepmd_c.tar.gz)
  • GitHub Check: Build C++ (clang, clang)
  • GitHub Check: Build wheels for cp311-win_amd64
  • GitHub Check: Build wheels for cp311-manylinux_x86_64
  • GitHub Check: Build wheels for cp311-manylinux_x86_64
  • GitHub Check: Build wheels for cp311-macosx_x86_64
  • GitHub Check: Test Python (3, 3.9)
  • GitHub Check: Test Python (2, 3.9)
  • GitHub Check: Analyze (python)
  • GitHub Check: Analyze (c-cpp)
🔇 Additional comments (4)
source/tests/pt/test_dp_test.py (4)

18-21: LGTM: necessary imports for weighted-force testing

Importing test_ener and DeepEval here is appropriate for the new weighted-force test flow.


28-30: LGTM: DeepmdData import is correct

Needed for constructing the test dataset with atom_pref.


150-167: Good test fixture setup

Solid setup: isolates a temporary weighted system, forces single-step training, and writes a dedicated input JSON. No issues.


191-191: DeepEval constructor signature confirmed

Verified in deepmd/infer/deep_eval.py that the DeepEval.__init__ method is defined as:

def __init__(self, model_file: str):
    …

and does not require an output_def argument. The instantiation

dp = DeepEval(tmp_model.name)

matches the current signature, so no changes are needed.

@caic99 caic99 requested a review from njzjz August 27, 2025 09:49
@caic99 caic99 requested a review from wanghan-iapcm August 27, 2025 10:33
@iProzd iProzd added this pull request to the merge queue Aug 29, 2025
Merged via the queue into deepmodeling:devel with commit aa44098 Aug 29, 2025
60 checks passed
@iProzd iProzd deleted the codex/update-test_ener-to-handle-masked-forces branch August 29, 2025 10:11
iProzd pushed a commit to iProzd/deepmd-kit that referenced this pull request Nov 19, 2025
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- New Features
- Added per-atom weighting for force evaluation: computes and reports
weighted MAE/RMSE alongside unweighted metrics, includes weighted
metrics in system-average summaries, logs weighted force metrics, and
safely handles zero-weight cases. Also propagates the per-atom weight
field into reporting.

- Tests
- Added end-to-end tests validating weighted vs unweighted force
MAE/RMSE and verifying evaluator outputs when using per-atom weight
masks.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
ChiahsinChu pushed a commit to ChiahsinChu/deepmd-kit that referenced this pull request Dec 17, 2025
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- New Features
- Added per-atom weighting for force evaluation: computes and reports
weighted MAE/RMSE alongside unweighted metrics, includes weighted
metrics in system-average summaries, logs weighted force metrics, and
safely handles zero-weight cases. Also propagates the per-atom weight
field into reporting.

- Tests
- Added end-to-end tests validating weighted vs unweighted force
MAE/RMSE and verifying evaluator outputs when using per-atom weight
masks.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>