
Conversation

@caic99
Member

@caic99 caic99 commented Aug 15, 2025

Summary by CodeRabbit

  • New Features

    • Added per-atom weighting for force evaluation: computes and reports weighted MAE/RMSE alongside unweighted metrics, includes weighted metrics in system-average summaries, logs weighted force metrics, and safely handles zero-weight cases. Also propagates the per-atom weight field into reporting.
  • Tests

    • Added end-to-end tests validating weighted vs unweighted force MAE/RMSE and verifying evaluator outputs when using per-atom weight masks.

@caic99 caic99 requested a review from anyangml August 15, 2025 08:17
@codecov

codecov bot commented Aug 15, 2025

Codecov Report

❌ Patch coverage is 90.47619% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 84.30%. Comparing base (accc331) to head (7e1f36d).
⚠️ Report is 86 commits behind head on devel.

Files with missing lines Patch % Lines
deepmd/entrypoints/test.py 90.47% 2 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##            devel    #4893   +/-   ##
=======================================
  Coverage   84.29%   84.30%           
=======================================
  Files         702      703    +1     
  Lines       68664    68746   +82     
  Branches     3572     3573    +1     
=======================================
+ Hits        57883    57953   +70     
- Misses       9641     9653   +12     
  Partials     1140     1140           


@caic99 caic99 marked this pull request as ready for review August 18, 2025 02:26
Copilot AI review requested due to automatic review settings August 18, 2025 02:26


@coderabbitai
Contributor

coderabbitai bot commented Aug 18, 2025


📝 Walkthrough

Walkthrough

Adds per-atom weighting support for force error metrics in deepmd/entrypoints/test.py via an atom_pref field and find_atom_pref flag, returns weighted MAE/RMSE and weight sums alongside unweighted metrics, and introduces an end-to-end PyTorch test TestDPTestForceWeight validating weighted vs unweighted force errors.

Changes

Cohort / File(s) Summary of Changes
Weighted force metrics in dp_test_ener
deepmd/entrypoints/test.py
Declares atom_pref data field, reads find_atom_pref from test data, computes weighted force residuals when enabled (mae_fw, rmse_fw) with weight_sum as effective sample size, records unweighted metrics using size_f, updates logging and averaged printing to include weighted metrics when present, and adds weighted metrics into returned dict.
Force-weight end-to-end test
source/tests/pt/test_dp_test.py
Adds TestDPTestForceWeight test class that prepares a temporary weighted dataset (adds atom_pref mask and modifies forces), trains/scripts a model, runs evaluation via dp_test_ener/DeepEval, independently computes weighted and unweighted MAE/RMSE, asserts equality with reported mae_f/rmse_f and mae_fw/rmse_fw, and cleans up artifacts.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant Entrypoint as dp_test_ener
  participant TestData as test_data
  participant Metrics as MetricCalc
  Note over Entrypoint,TestData: Read forces and optional atom_pref
  Entrypoint->>TestData: get forces, n_frames, find_atom_pref, atom_pref
  Entrypoint->>Metrics: compute diff_f = pred_f - test_f
  alt find_atom_pref == 1
    Metrics->>Metrics: apply per-atom weights (atom_pref)
    Metrics-->>Entrypoint: mae_fw, rmse_fw, weight_sum
    Metrics-->>Entrypoint: mae_f, rmse_f, size_f
  else
    Metrics-->>Entrypoint: mae_f, rmse_f, size_f (unweighted)
  end
  Entrypoint-->>Entrypoint: populate result dict (include weighted fields if present)
sequenceDiagram
  autonumber
  participant TestCase as TestDPTestForceWeight
  participant Prep as _prepare_weighted_system
  participant Trainer as Trainer/Script
  participant Eval as DeepEval / dp_test_ener
  participant Loader as DeepmdData

  TestCase->>Prep: copy dataset, modify forces, add `atom_pref` mask
  TestCase->>Trainer: build/train & script model -> save .pth
  TestCase->>Loader: load prepared test dataset
  TestCase->>Eval: run scripted model and dp_test_ener
  Eval-->>TestCase: returns mae_f, rmse_f, mae_fw, rmse_fw (and sizes)
  TestCase->>TestCase: compute independent weighted/unweighted metrics
  TestCase-->>TestCase: assert reported == computed metrics

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Suggested reviewers

  • iProzd
  • njzjz


Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (3)
deepmd/entrypoints/test.py (1)

421-435: Weighted force MAE/RMSE look correct; add a shape guard for robustness

The weighting logic (sum of abs residuals / sum of weights and sqrt(weighted MSE)) is sound, and size_f=sum(weights) makes the system-level aggregation consistent. To tolerate atom_pref provided as (nframes, natoms) without depending on repeat=3 at load time, add a defensive broadcast/validation.

-    if find_atom_pref == 1:
-        atom_pref = test_data["atom_pref"][:numb_test]
-        diff_f = diff_f * atom_pref
+    if find_atom_pref == 1:
+        atom_pref = test_data["atom_pref"][:numb_test]
+        # Accept both (nframes, natoms) and (nframes, natoms*3)
+        if atom_pref.shape != diff_f.shape:
+            if atom_pref.shape[1] * 3 == diff_f.shape[1]:
+                atom_pref = np.repeat(atom_pref, 3, axis=1)
+            else:
+                raise ValueError(
+                    f"atom_pref shape {atom_pref.shape} is incompatible with force shape {diff_f.shape}"
+                )
+        diff_f = diff_f * atom_pref
         size_f = np.sum(atom_pref)
         if size_f > 0:
             mae_f = np.sum(np.abs(diff_f)) / size_f
             rmse_f = np.sqrt(np.sum(diff_f * diff_f) / size_f)
         else:
             mae_f = 0.0
             rmse_f = 0.0

If desired, we can also factor this into small helpers (weighted_mae/weighted_rmse) to keep this block concise.

source/tests/pt/test_dp_test.py (2)

168-181: Mask preparation is correct; tiny robustness note about copytree target

Copying the sample system into a mkdtemp dir and writing atom_pref.npy as (nframes, natoms) is consistent with repeat=3 in the loader. Setting the last atom’s weight to 0 validates masking well. One minor robustness nit: shutil.copytree(..., dirs_exist_ok=True) is fine here, but if src ever contains symlinks or permissions need preserving, consider copy_function/copy2. Not required for this test data.


182-230: Good end-to-end assertions; add tolerances and verify the reported sample size

The comparison path is sound: run dp_test_ener, independently compute the masked MAE/RMSE from dp.eval outputs, and compare. To avoid flaky failures from float32 arithmetic and to verify the new size_f behavior, consider:

  • Use explicit tolerances in assert_allclose.
  • Assert that the returned sizes equal the mask sum.
-        np.testing.assert_allclose(err["mae_f"][0], mae_expected)
-        np.testing.assert_allclose(err["rmse_f"][0], rmse_expected)
+        np.testing.assert_allclose(err["mae_f"][0], mae_expected, rtol=1e-6, atol=1e-8)
+        np.testing.assert_allclose(err["rmse_f"][0], rmse_expected, rtol=1e-6, atol=1e-8)
+        # Also verify the effective sample size matches the mask sum
+        np.testing.assert_equal(err["mae_f"][1], denom)
+        np.testing.assert_equal(err["rmse_f"][1], denom)
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between dc99ba2 and 13d43e2.

📒 Files selected for processing (2)
  • deepmd/entrypoints/test.py (4 hunks)
  • source/tests/pt/test_dp_test.py (2 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
deepmd/entrypoints/test.py (4)
deepmd/utils/data_system.py (1)
  • add (342-395)
deepmd/utils/data.py (1)
  • add (136-189)
deepmd/pt/utils/stat.py (1)
  • rmse (525-526)
deepmd/dpmodel/output_def.py (1)
  • size (230-231)
🪛 Ruff (0.12.2)
source/tests/pt/test_dp_test.py

187-187: Use a context manager for opening files

(SIM115)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (29)
  • GitHub Check: Analyze (python)
  • GitHub Check: Build wheels for cp311-macosx_arm64
  • GitHub Check: Analyze (c-cpp)
  • GitHub Check: Build C library (2.14, >=2.5.0,<2.15, libdeepmd_c_cu11.tar.gz)
  • GitHub Check: Build wheels for cp311-win_amd64
  • GitHub Check: Build C library (2.18, libdeepmd_c.tar.gz)
  • GitHub Check: Build wheels for cp310-manylinux_aarch64
  • GitHub Check: Test C++ (false)
  • GitHub Check: Build wheels for cp311-macosx_x86_64
  • GitHub Check: Test C++ (true)
  • GitHub Check: Build wheels for cp311-manylinux_x86_64
  • GitHub Check: Build wheels for cp311-manylinux_x86_64
  • GitHub Check: Build C++ (rocm, rocm)
  • GitHub Check: Test Python (5, 3.12)
  • GitHub Check: Build C++ (clang, clang)
  • GitHub Check: Test Python (3, 3.9)
  • GitHub Check: Test Python (4, 3.12)
  • GitHub Check: Test Python (6, 3.12)
  • GitHub Check: Build C++ (cuda120, cuda)
  • GitHub Check: Build C++ (cuda, cuda)
  • GitHub Check: Test Python (2, 3.12)
  • GitHub Check: Build C++ (cpu, cpu)
  • GitHub Check: Test Python (1, 3.12)
  • GitHub Check: Test Python (5, 3.9)
  • GitHub Check: Test Python (6, 3.9)
  • GitHub Check: Test Python (2, 3.9)
  • GitHub Check: Test Python (1, 3.9)
  • GitHub Check: Test Python (3, 3.12)
  • GitHub Check: Test Python (4, 3.9)
🔇 Additional comments (6)
deepmd/entrypoints/test.py (3)

294-295: Correctly declares atom-level preference mask aligned with force components

Adding atom_pref as an atomic item with repeat=3 matches the flattened (natoms*3) force layout. Keeping must=False ensures backward compatibility (feature enabled only when the file is present).


317-317: Presence flag retrieval is consistent with existing pattern

Using test_data.get("find_atom_pref") mirrors the existing find_* flags and safely disables weighting when absent.


470-471: Return sizes reflect the effective weighted sample size

Reporting (value, size_f) for mae_f/rmse_f is the right choice for weighted aggregation across systems. No change needed.

source/tests/pt/test_dp_test.py (3)

18-21: Imports for dp_test_ener/DeepEval/DeepmdData are appropriate

These imports are minimal and scoped to the new test’s needs.

Also applies to: 28-30


150-167: Solid setup for masked-force test system

Bootstrapping a tiny training config and pointing both training/validation to the prepared system keeps the test self-contained. No issues spotted.


231-234: Clean teardown

Delegating to the base teardown and removing the temp system directory prevents test artifacts from leaking. Looks good.

@anyangml anyangml requested a review from iProzd August 18, 2025 03:00
Collaborator

@anyangml anyangml left a comment


LGTM

@caic99 caic99 requested a review from Copilot August 19, 2025 05:42
Contributor

Copilot AI left a comment


Pull Request Overview

This PR adds support for masked forces in testing by implementing per-atom weighting for force error metrics. When atom preferences (masks) are provided, the evaluation computes weighted MAE/RMSE that only consider selected atoms, with graceful handling of zero-weight cases.

Key changes:

  • Modified force error calculation to apply atom-level masking when atom_pref data is available
  • Added comprehensive test coverage for masked force evaluation scenarios
  • Updated error metric reporting to reflect the actual number of atoms considered in calculations

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
deepmd/entrypoints/test.py Implements masked force error calculation with weighted MAE/RMSE computation
source/tests/pt/test_dp_test.py Adds end-to-end test validating masked force metrics against expected calculations


Member

@njzjz njzjz left a comment


This PR assumes that atom_pref is a mask, but it may not be, since it is not limited to 0 and 1. One may set it to something like 1 1 1 1 5 1 1 1 to give some atoms larger weights.
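A toy numeric sketch of this point, with made-up error values (not from this PR): when atom_pref carries non-binary weights, the weighted and unweighted errors diverge even though no atom is masked out.

```python
import numpy as np

# Illustration of non-binary atom_pref: a general per-atom weight
# (e.g. 1 1 1 5), not just a 0/1 mask. Values are hypothetical.
abs_err = np.array([0.1, 0.2, 0.1, 0.4])  # per-atom |force error|, toy values
w = np.array([1.0, 1.0, 1.0, 5.0])        # last atom weighted 5x

mae_unweighted = abs_err.mean()               # 0.8 / 4 = 0.2
mae_weighted = np.sum(abs_err * w) / w.sum()  # 2.4 / 8 = 0.3
```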

@caic99 caic99 requested a review from njzjz August 22, 2025 03:04
@njzjz
Member

njzjz commented Aug 23, 2025

@caic99 In short, I suggest showing both the error with and without the atom_pref.

Collaborator

@wanghan-iapcm wanghan-iapcm left a comment


The atom force prefactor is not designed as a mask; it assigns weights to the atomic force loss terms.

@caic99 caic99 requested a review from anyangml August 26, 2025 06:16
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

♻️ Duplicate comments (1)
source/tests/pt/test_dp_test.py (1)

168-181: Define a named constant for the force offset instead of magic number

This improves readability and mirrors prior feedback.

Apply:

 class TestDPTestForceWeight(DPTest, unittest.TestCase):
     def setUp(self) -> None:
         self.detail_file = "test_dp_test_force_weight_detail"
@@
-    def _prepare_weighted_system(self) -> str:
+    def _prepare_weighted_system(self) -> str:
         src = Path(__file__).parent / "water/data/single"
         tmp_dir = tempfile.mkdtemp()
         shutil.copytree(src, tmp_dir, dirs_exist_ok=True)
         set_dir = Path(tmp_dir) / "set.000"
         forces = np.load(set_dir / "force.npy")
-        forces[0, -3:] += 10.0
+        # bump the last atom only
+        self.FORCE_OFFSET = getattr(self, "FORCE_OFFSET", 10.0)
+        forces[0, -3:] += self.FORCE_OFFSET
         np.save(set_dir / "force.npy", forces)
🧹 Nitpick comments (4)
deepmd/entrypoints/test.py (2)

294-295: Adding atom_pref looks good; consider avoiding 3x memory via compute-time broadcasting

Registering atom_pref as atomic with repeat=3 works, but it inflates this array 3x in memory and I/O. Since the per-atom weight is axis-invariant across x/y/z, you can keep atom_pref as shape [nframes, natoms] and broadcast it at compute time. This reduces memory and simplifies data loading while preserving semantics. See the follow-up diff in the weighting block for a safe broadcast that handles both shapes (with and without repeat), so you can optionally drop repeat=3 here.

Example (paired with the weighting-block change below):

-    data.add("atom_pref", 1, atomic=True, must=False, high_prec=False, repeat=3)
+    # Keep per-atom weights; broadcast to 3 components at compute-time
+    data.add("atom_pref", 1, atomic=True, must=False, high_prec=False)

620-623: Nice: system average now includes weighted force metrics

The conditional print of weighted force MAE/RMSE when present matches reviewer feedback (“show both with and without atom_pref”). Consider also printing the effective sample size (sum of weights) for transparency.

Example:

         if "rmse_fw" in avg:
             log.info(f"Force weighted MAE : {avg['mae_fw']:e} eV/A")
             log.info(f"Force weighted RMSE: {avg['rmse_fw']:e} eV/A")
+            if "mae_fw__size" in avg:
+                log.info(f"Force weighted effN: {avg['mae_fw__size']:g}")

Note: This needs weighted_average to optionally retain sizes in the aggregated output (e.g., by exposing them under a “__size” suffix).

source/tests/pt/test_dp_test.py (2)

217-219: Don’t hardcode mixed_type=False; derive from data to avoid surprises

Future tests may use mixed-type systems; pass the actual flag from DeepmdData.

Apply:

-            mixed_type=False,
+            mixed_type=data.mixed_type,

220-233: Strengthen test: also assert sizes (denominators) used in metrics

Verifies that dp_test_ener reports effective counts consistently for both weighted and unweighted metrics.

Apply:

         mae_unweighted = np.sum(np.abs(diff)) / diff.size
         rmse_unweighted = np.sqrt(np.sum(diff * diff) / diff.size)
         np.testing.assert_allclose(err["mae_f"][0], mae_unweighted)
         np.testing.assert_allclose(err["rmse_f"][0], rmse_unweighted)
+        # Also check the denominators
+        np.testing.assert_equal(err["mae_f"][1], diff.size)
+        np.testing.assert_equal(err["rmse_f"][1], diff.size)
         np.testing.assert_allclose(err["mae_fw"][0], mae_expected)
         np.testing.assert_allclose(err["rmse_fw"][0], rmse_expected)
+        np.testing.assert_allclose(err["mae_fw"][1], denom)
+        np.testing.assert_allclose(err["rmse_fw"][1], denom)
📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between 13d43e2 and 2a4dedc.

📒 Files selected for processing (2)
  • deepmd/entrypoints/test.py (5 hunks)
  • source/tests/pt/test_dp_test.py (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
source/tests/pt/test_dp_test.py (3)
deepmd/entrypoints/test.py (2)
  • test (58-201)
  • test_ener (257-602)
deepmd/infer/deep_eval.py (4)
  • DeepEval (345-687)
  • get_type_map (156-157)
  • get_type_map (411-413)
  • eval (100-145)
deepmd/utils/data.py (1)
  • DeepmdData (24-739)
deepmd/entrypoints/test.py (2)
deepmd/utils/data_system.py (1)
  • add (342-395)
deepmd/utils/data.py (1)
  • add (136-189)
🪛 Ruff (0.12.2)
source/tests/pt/test_dp_test.py

187-187: Use a context manager for opening files

(SIM115)


Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (2)
source/tests/pt/test_dp_test.py (2)

168-182: Replace magic numbers and make intent explicit in weighted system prep

Use the constants suggested above; avoids magic numbers and clarifies which atoms are emphasized or ignored. Also fine to keep dirs_exist_ok=True since mkdtemp() pre-creates the directory.

         forces = np.load(set_dir / "force.npy")
-        forces[0, :3] += 1.0
-        forces[0, -3:] += 10.0
+        forces[0, :3] += self.FORCE_OFFSET_FIRST
+        forces[0, -3:] += self.FORCE_OFFSET_LAST
         np.save(set_dir / "force.npy", forces)
         natoms = forces.shape[1] // 3
         atom_pref = np.ones((forces.shape[0], natoms), dtype=forces.dtype)
-        atom_pref[:, 0] = 2.0
-        atom_pref[:, -1] = 0.0
+        atom_pref[:, 0] = self.FIRST_ATOM_WEIGHT
+        atom_pref[:, -1] = self.LAST_ATOM_WEIGHT
         np.save(set_dir / "atom_pref.npy", atom_pref)

207-214: Slice coord to 1 frame before reshape; current code flattens all frames

Reshape without slicing can consume all frames. Slice [:1] to match box and numb_test=1.

-        coord = test_data["coord"].reshape([1, -1])
+        coord = test_data["coord"][:1].reshape([1, -1])
🧹 Nitpick comments (3)
source/tests/pt/test_dp_test.py (3)

150-167: Good test scaffolding; consider defining explicit constants for readability

The setup is clear and mirrors existing patterns. To improve clarity and future maintenance, define the force offsets and atom weights as class-level constants and reuse them in _prepare_weighted_system. This also addresses prior nitpicks about magic numbers.

 class TestDPTestForceWeight(DPTest, unittest.TestCase):
+    # Test constants for reproducibility and readability
+    FORCE_OFFSET_FIRST = 1.0
+    FORCE_OFFSET_LAST = 10.0
+    FIRST_ATOM_WEIGHT = 2.0
+    LAST_ATOM_WEIGHT = 0.0

189-191: Ensure temp model file is always closed and cleaned up, even on failures

Close the NamedTemporaryFile (SIM115) and move cleanup to tearDown so files aren’t leaked on assertion failures. Store the path on self.

-        tmp_model = tempfile.NamedTemporaryFile(delete=False, suffix=".pth")
-        torch.jit.save(model, tmp_model.name)
+        tmp_model = tempfile.NamedTemporaryFile(delete=False, suffix=".pth")
+        torch.jit.save(model, tmp_model.name)
+        tmp_model.close()  # ensure the handle is closed before other consumers use the path
+        self._tmp_model_path = tmp_model.name
@@
-        os.unlink(tmp_model.name)
+        # cleanup moved to tearDown
@@
     def tearDown(self) -> None:
         super().tearDown()
         shutil.rmtree(self.system_dir)
+        # Ensure model file is removed even if the test fails early
+        if hasattr(self, "_tmp_model_path") and os.path.exists(self._tmp_model_path):
+            os.unlink(self._tmp_model_path)

Also applies to: 237-237, 239-241


226-236: Strengthen assertions; verify denominators and guard against zero sum

Validate the effective sample sizes returned by dp_test_ener and add a guard to avoid division by zero if all weights are zero (future-proofing). This also aligns with the reviewer suggestion to “show both the error with and without atom_pref” by making both paths explicit in assertions.

         masked_diff = diff[mask]
         mae_unweighted = np.sum(np.abs(masked_diff)) / mask.sum()
         rmse_unweighted = np.sqrt(np.sum(masked_diff * masked_diff) / mask.sum())
         denom = weight.sum()
+        assert denom > 0, "Weight sum is zero; cannot compute weighted errors."
         mae_weighted = np.sum(np.abs(diff) * weight) / denom
         rmse_weighted = np.sqrt(np.sum(diff * diff * weight) / denom)
-        np.testing.assert_allclose(err["mae_f"][0], mae_unweighted)
-        np.testing.assert_allclose(err["rmse_f"][0], rmse_unweighted)
-        np.testing.assert_allclose(err["mae_fw"][0], mae_weighted)
-        np.testing.assert_allclose(err["rmse_fw"][0], rmse_weighted)
+        # values and denominators (effective sample sizes)
+        np.testing.assert_equal(err["mae_f"][1], mask.sum())
+        np.testing.assert_equal(err["rmse_f"][1], mask.sum())
+        np.testing.assert_allclose(err["mae_f"][0], mae_unweighted)
+        np.testing.assert_allclose(err["rmse_f"][0], rmse_unweighted)
+        np.testing.assert_equal(err["mae_fw"][1], weight.sum())
+        np.testing.assert_equal(err["rmse_fw"][1], weight.sum())
+        np.testing.assert_allclose(err["mae_fw"][0], mae_weighted)
+        np.testing.assert_allclose(err["rmse_fw"][0], rmse_weighted)
📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between 2a4dedc and e74b4bd.

📒 Files selected for processing (2)
  • deepmd/entrypoints/test.py (5 hunks)
  • source/tests/pt/test_dp_test.py (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • deepmd/entrypoints/test.py
🧰 Additional context used
🧬 Code graph analysis (1)
source/tests/pt/test_dp_test.py (3)
deepmd/entrypoints/test.py (2)
  • test (58-201)
  • test_ener (257-612)
deepmd/infer/deep_eval.py (4)
  • DeepEval (345-687)
  • get_type_map (156-157)
  • get_type_map (411-413)
  • eval (100-145)
deepmd/utils/data.py (1)
  • DeepmdData (24-739)
🪛 Ruff (0.12.2)
source/tests/pt/test_dp_test.py

189-189: Use a context manager for opening files

(SIM115)

🔇 Additional comments (2)
source/tests/pt/test_dp_test.py (2)

18-21: LGTM: New imports are correct and used

dp_test_ener and DeepEval are both used below. No issues.


28-30: LGTM: DeepmdData import

The import is necessary for the explicit data loading in the new test.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (2)
source/tests/pt/test_dp_test.py (2)

168-183: Replace magic numbers with named constants for clarity

Use named constants for the force offsets and atom weights to improve readability and intent.

Apply within this block:

-        forces[0, :3] += 1.0
-        forces[0, -3:] += 10.0
+        forces[0, :3] += self.FORCE_OFFSET_HEAD
+        forces[0, -3:] += self.FORCE_OFFSET_TAIL
-        atom_pref[:, 0] = 2.0
-        atom_pref[:, -1] = 0.0
+        atom_pref[:, 0] = self.HEAD_WEIGHT
+        atom_pref[:, -1] = self.TAIL_WEIGHT

Add these class attributes near the top of the class:

# class-level constants (add near class definition)
FORCE_OFFSET_HEAD = 1.0
FORCE_OFFSET_TAIL = 10.0
HEAD_WEIGHT = 2.0
TAIL_WEIGHT = 0.0
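Applied together, the suggestion reads roughly like the following standalone numpy sketch (array shapes and constant names are illustrative, mirroring the review diff rather than the actual test file):

```python
import numpy as np

# Illustrative named constants replacing the magic numbers from the test.
FORCE_OFFSET_HEAD = 1.0
FORCE_OFFSET_TAIL = 10.0
HEAD_WEIGHT = 2.0
TAIL_WEIGHT = 0.0

forces = np.zeros((1, 9))             # 1 frame, 3 atoms x 3 force components
forces[0, :3] += FORCE_OFFSET_HEAD    # perturb the first atom's force
forces[0, -3:] += FORCE_OFFSET_TAIL   # perturb the last atom's force

atom_pref = np.ones((1, 3))           # per-atom weights, one per atom
atom_pref[:, 0] = HEAD_WEIGHT         # up-weight the first atom
atom_pref[:, -1] = TAIL_WEIGHT        # mask out the last atom entirely
```

The named constants make the test's intent (one up-weighted atom, one masked atom) readable at the call site.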

207-222: Fix two bugs: frame slicing and dict access for eval output

  • coord must be sliced to the first frame before reshape; current code flattens all frames.
  • DeepEval.eval returns a dict; ret[1] will KeyError. Use ret["force"].
-        coord = test_data["coord"].reshape([1, -1])
+        coord = test_data["coord"][:1].reshape([1, -1])
@@
-        ret = dp.eval(
+        ret = dp.eval(
             coord,
             box,
             atype,
             fparam=None,
             aparam=None,
             atomic=False,
             efield=None,
             mixed_type=False,
             spin=None,
         )
-        force_pred = ret[1].reshape([1, -1])
+        force_pred = ret["force"].reshape([1, -1])
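Both bugs can be demonstrated with mock data (a sketch only — the shapes and the dict keys stand in for what `DeepEval.eval` would return; consult the actual API for the real output layout):

```python
import numpy as np

# Mock test data: 2 frames of 3 atoms (9 coordinate values per frame).
test_data = {"coord": np.arange(2 * 9, dtype=float).reshape(2, 9)}

# Slicing to the first frame BEFORE reshape keeps one frame of 9 values;
# reshaping all frames to [1, -1] would silently flatten both frames into
# a single 18-wide row.
coord = test_data["coord"][:1].reshape([1, -1])

# Stand-in for the eval output: keyed by property name, so positional
# indexing like ret[1] raises KeyError on a dict.
ret = {"energy": np.zeros((1, 1)), "force": np.zeros((1, 3, 3))}
force_pred = ret["force"].reshape([1, -1])
```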
🧹 Nitpick comments (3)
source/tests/pt/test_dp_test.py (3)

189-191: Use a context manager for the temp model file (ruff SIM115) and reuse the path

Prevents resource leaks and aligns with tooling suggestions.

-        tmp_model = tempfile.NamedTemporaryFile(delete=False, suffix=".pth")
-        torch.jit.save(model, tmp_model.name)
-        dp = DeepEval(tmp_model.name)
+        with tempfile.NamedTemporaryFile(delete=False, suffix=".pth") as tmp_model:
+            torch.jit.save(model, tmp_model.name)
+            tmp_model_path = tmp_model.name
+        dp = DeepEval(tmp_model_path)
@@
-        os.unlink(tmp_model.name)
+        os.unlink(tmp_model_path)

Also applies to: 235-235
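The `delete=False` + explicit unlink pattern matters for portability: on Windows an open `NamedTemporaryFile` cannot be reopened by a second handle, so the file must be closed before loading it. A minimal sketch (with a byte write standing in for `torch.jit.save`):

```python
import os
import tempfile

# Create the temp file via a context manager so the handle is closed
# before anything else opens the path (required on Windows).
with tempfile.NamedTemporaryFile(delete=False, suffix=".pth") as tmp_model:
    tmp_model.write(b"serialized model bytes")  # stand-in for torch.jit.save
    tmp_model_path = tmp_model.name

# The path outlives the handle and can now be reopened/loaded safely.
assert os.path.exists(tmp_model_path)

os.unlink(tmp_model_path)  # explicit cleanup since delete=False
```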


226-234: Good: validates both unweighted and weighted MAE/RMSE (addresses reviewer ask)

Asserting both metrics matches the behavior with and without atom_pref.

Optionally add tolerances to avoid flakiness from dtype/ordering differences:

-        np.testing.assert_allclose(err["mae_f"][0], mae_unweighted)
-        np.testing.assert_allclose(err["rmse_f"][0], rmse_unweighted)
-        np.testing.assert_allclose(err["mae_fw"][0], mae_weighted)
-        np.testing.assert_allclose(err["rmse_fw"][0], rmse_weighted)
+        np.testing.assert_allclose(err["mae_f"][0],  mae_unweighted,  rtol=1e-6, atol=1e-8)
+        np.testing.assert_allclose(err["rmse_f"][0], rmse_unweighted, rtol=1e-6, atol=1e-8)
+        np.testing.assert_allclose(err["mae_fw"][0], mae_weighted,    rtol=1e-6, atol=1e-8)
+        np.testing.assert_allclose(err["rmse_fw"][0], rmse_weighted,  rtol=1e-6, atol=1e-8)
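The weighted metrics being asserted can be sketched in plain numpy (an illustration of the arithmetic, not the deepmd-kit implementation — it assumes each atom's weight applies to all three of its force components):

```python
import numpy as np

diff = np.array([[[0.1, -0.2, 0.3],
                  [0.0,  0.1, -0.1],
                  [9.0, -9.0, 9.0]]])   # last atom has a huge error ...
w = np.array([[2.0, 1.0, 0.0]])         # ... but zero weight (masked)

# Broadcast each atom's weight onto its 3 force components.
wc = np.repeat(w, 3, axis=1).reshape(diff.shape)

mae_u = np.abs(diff).mean()                       # unweighted MAE
rmse_u = np.sqrt((diff**2).mean())                # unweighted RMSE
mae_w = (np.abs(diff) * wc).sum() / wc.sum()      # weighted MAE
rmse_w = np.sqrt(((diff**2) * wc).sum() / wc.sum())  # weighted RMSE
```

The masked atom's large error inflates the unweighted metrics but leaves the weighted ones untouched, which is exactly the behavior the test is meant to lock in.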

237-240: Make teardown more robust if setup fails mid-way

Guard against missing directory to avoid errors on teardown in partial failures.

-        shutil.rmtree(self.system_dir)
+        if hasattr(self, "system_dir") and os.path.isdir(self.system_dir):
+            shutil.rmtree(self.system_dir, ignore_errors=True)
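The guarded teardown is idempotent — it tolerates both a setup that never created the directory and a double invocation. A small self-contained check of the pattern:

```python
import os
import shutil
import tempfile

system_dir = tempfile.mkdtemp()  # stand-in for the test's system directory

# Guarded cleanup: safe if the directory is missing, half-created, or
# already removed; ignore_errors=True also swallows permission hiccups.
if os.path.isdir(system_dir):
    shutil.rmtree(system_dir, ignore_errors=True)

# Calling it again on the now-missing path is harmless with the guard.
if os.path.isdir(system_dir):
    shutil.rmtree(system_dir, ignore_errors=True)
```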
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between e74b4bd and 7e1f36d.

📒 Files selected for processing (2)
  • deepmd/entrypoints/test.py (5 hunks)
  • source/tests/pt/test_dp_test.py (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • deepmd/entrypoints/test.py
🧰 Additional context used
🧬 Code graph analysis (1)
source/tests/pt/test_dp_test.py (3)
deepmd/entrypoints/test.py (2)
  • test (58-201)
  • test_ener (257-602)
deepmd/infer/deep_eval.py (4)
  • DeepEval (345-687)
  • get_type_map (156-157)
  • get_type_map (411-413)
  • eval (100-145)
deepmd/utils/data.py (1)
  • DeepmdData (24-739)
🪛 Ruff (0.12.2)
source/tests/pt/test_dp_test.py

189-189: Use a context manager for opening files

(SIM115)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: Build C library (2.18, libdeepmd_c.tar.gz)
  • GitHub Check: Build C++ (clang, clang)
  • GitHub Check: Build wheels for cp311-win_amd64
  • GitHub Check: Build wheels for cp311-manylinux_x86_64
  • GitHub Check: Build wheels for cp311-manylinux_x86_64
  • GitHub Check: Build wheels for cp311-macosx_x86_64
  • GitHub Check: Test Python (3, 3.9)
  • GitHub Check: Test Python (2, 3.9)
  • GitHub Check: Analyze (python)
  • GitHub Check: Analyze (c-cpp)
🔇 Additional comments (4)
source/tests/pt/test_dp_test.py (4)

18-21: LGTM: necessary imports for weighted-force testing

Importing test_ener and DeepEval here is appropriate for the new weighted-force test flow.


28-30: LGTM: DeepmdData import is correct

Needed for constructing the test dataset with atom_pref.


150-167: Good test fixture setup

Solid setup: isolates a temporary weighted system, forces single-step training, and writes a dedicated input JSON. No issues.


191-191: DeepEval constructor signature confirmed

Verified in deepmd/infer/deep_eval.py that the DeepEval.__init__ method is defined as:

def __init__(self, model_file: str):
    …

and does not require an output_def argument. The instantiation

dp = DeepEval(tmp_model.name)

matches the current signature, so no changes are needed.

@caic99 caic99 requested a review from njzjz August 27, 2025 09:49
@caic99 caic99 requested a review from wanghan-iapcm August 27, 2025 10:33
@iProzd iProzd added this pull request to the merge queue Aug 29, 2025
Merged via the queue into deepmodeling:devel with commit aa44098 Aug 29, 2025
60 checks passed
@iProzd iProzd deleted the codex/update-test_ener-to-handle-masked-forces branch August 29, 2025 10:11
iProzd pushed a commit to iProzd/deepmd-kit that referenced this pull request Nov 19, 2025
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- New Features
- Added per-atom weighting for force evaluation: computes and reports
weighted MAE/RMSE alongside unweighted metrics, includes weighted
metrics in system-average summaries, logs weighted force metrics, and
safely handles zero-weight cases. Also propagates the per-atom weight
field into reporting.

- Tests
- Added end-to-end tests validating weighted vs unweighted force
MAE/RMSE and verifying evaluator outputs when using per-atom weight
masks.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
ChiahsinChu pushed a commit to ChiahsinChu/deepmd-kit that referenced this pull request Dec 17, 2025
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- New Features
- Added per-atom weighting for force evaluation: computes and reports
weighted MAE/RMSE alongside unweighted metrics, includes weighted
metrics in system-average summaries, logs weighted force metrics, and
safely handles zero-weight cases. Also propagates the per-atom weight
field into reporting.

- Tests
- Added end-to-end tests validating weighted vs unweighted force
MAE/RMSE and verifying evaluator outputs when using per-atom weight
masks.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>