feat: handle masked forces in test #4893
Conversation
for more information, see https://pre-commit.ci
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## devel #4893 +/- ##
=======================================
Coverage 84.29% 84.30%
=======================================
Files 702 703 +1
Lines 68664 68746 +82
Branches 3572 3573 +1
=======================================
+ Hits 57883 57953 +70
- Misses 9641 9653 +12
Partials 1140 1140
Note: CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.
📝 Walkthrough
Adds per-atom weighting support for force error metrics in the test entrypoint (`deepmd/entrypoints/test.py`).
Changes
Sequence Diagram(s)
sequenceDiagram
autonumber
participant Entrypoint as dp_test_ener
participant TestData as test_data
participant Metrics as MetricCalc
Note over Entrypoint,TestData: Read forces and optional atom_pref
Entrypoint->>TestData: get forces, n_frames, find_atom_pref, atom_pref
Entrypoint->>Metrics: compute diff_f = pred_f - test_f
alt find_atom_pref == 1
Metrics->>Metrics: apply per-atom weights (atom_pref)
Metrics-->>Entrypoint: mae_fw, rmse_fw, weight_sum
Metrics-->>Entrypoint: mae_f, rmse_f, size_f
else
Metrics-->>Entrypoint: mae_f, rmse_f, size_f (unweighted)
end
Entrypoint-->>Entrypoint: populate result dict (include weighted fields if present)
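In code, the branch in the first diagram amounts to something like the following minimal NumPy sketch. It is illustrative only; the function name `force_metrics` and the exact dictionary layout are assumptions, not the actual implementation in `deepmd/entrypoints/test.py`:

```python
import numpy as np

def force_metrics(diff_f, atom_pref=None, find_atom_pref=0):
    """Return (value, size) pairs for unweighted and, if available, weighted force errors."""
    out = {}
    size_f = diff_f.size
    out["mae_f"] = (np.mean(np.abs(diff_f)), size_f)
    out["rmse_f"] = (np.sqrt(np.mean(diff_f**2)), size_f)
    if find_atom_pref == 1 and atom_pref is not None:
        w = atom_pref  # per-component weights, same shape as diff_f
        weight_sum = np.sum(w)
        if weight_sum > 0:
            mae_fw = np.sum(np.abs(diff_f) * w) / weight_sum
            rmse_fw = np.sqrt(np.sum(diff_f**2 * w) / weight_sum)
        else:
            mae_fw = rmse_fw = 0.0  # all weights zero: avoid division by zero
        out["mae_fw"] = (mae_fw, weight_sum)
        out["rmse_fw"] = (rmse_fw, weight_sum)
    return out
```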
sequenceDiagram
autonumber
participant TestCase as TestDPTestForceWeight
participant Prep as _prepare_weighted_system
participant Trainer as Trainer/Script
participant Eval as DeepEval / dp_test_ener
participant Loader as DeepmdData
TestCase->>Prep: copy dataset, modify forces, add `atom_pref` mask
TestCase->>Trainer: build/train & script model -> save .pth
TestCase->>Loader: load prepared test dataset
TestCase->>Eval: run scripted model and dp_test_ener
Eval-->>TestCase: returns mae_f, rmse_f, mae_fw, rmse_fw (and sizes)
TestCase->>TestCase: compute independent weighted/unweighted metrics
TestCase-->>TestCase: assert reported == computed metrics
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
Possibly related PRs
Suggested reviewers
Actionable comments posted: 0
🧹 Nitpick comments (3)
deepmd/entrypoints/test.py (1)
421-435: Weighted force MAE/RMSE look correct; add a shape guard for robustness
The weighting logic (sum of abs residuals / sum of weights and sqrt of the weighted MSE) is sound, and size_f = sum(weights) makes the system-level aggregation consistent. To tolerate atom_pref provided as (nframes, natoms) without depending on repeat=3 at load time, add a defensive broadcast/validation.
```diff
-    if find_atom_pref == 1:
-        atom_pref = test_data["atom_pref"][:numb_test]
-        diff_f = diff_f * atom_pref
+    if find_atom_pref == 1:
+        atom_pref = test_data["atom_pref"][:numb_test]
+        # Accept both (nframes, natoms) and (nframes, natoms*3)
+        if atom_pref.shape != diff_f.shape:
+            if atom_pref.shape[1] * 3 == diff_f.shape[1]:
+                atom_pref = np.repeat(atom_pref, 3, axis=1)
+            else:
+                raise ValueError(
+                    f"atom_pref shape {atom_pref.shape} is incompatible with force shape {diff_f.shape}"
+                )
+        diff_f = diff_f * atom_pref
         size_f = np.sum(atom_pref)
         if size_f > 0:
             mae_f = np.sum(np.abs(diff_f)) / size_f
             rmse_f = np.sqrt(np.sum(diff_f * diff_f) / size_f)
         else:
             mae_f = 0.0
             rmse_f = 0.0
```

If desired, we can also factor this into small helpers (weighted_mae/weighted_rmse) to keep this block concise.
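As a rough illustration of what such helpers could look like (the names `weighted_mae`/`weighted_rmse` come from the suggestion above and do not exist in the codebase):

```python
import numpy as np

def weighted_mae(diff, weights):
    """Weighted mean absolute error; returns 0.0 when the weights sum to zero."""
    denom = np.sum(weights)
    return np.sum(np.abs(diff) * weights) / denom if denom > 0 else 0.0

def weighted_rmse(diff, weights):
    """Square root of the weighted mean squared error; returns 0.0 when the weights sum to zero."""
    denom = np.sum(weights)
    return np.sqrt(np.sum(diff * diff * weights) / denom) if denom > 0 else 0.0
```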
source/tests/pt/test_dp_test.py (2)
168-181: Mask preparation is correct; tiny robustness note about copytree target
Copying the sample system into a mkdtemp dir and writing atom_pref.npy as (nframes, natoms) is consistent with repeat=3 in the loader. Setting the last atom's weight to 0 validates masking well. One minor robustness nit: shutil.copytree(..., dirs_exist_ok=True) is fine here, but if src ever contains symlinks or permissions need preserving, consider copy_function/copy2. Not required for this test data.
182-230: Good end-to-end assertions; add tolerances and verify the reported sample size
The comparison path is sound: run dp_test_ener, independently compute the masked MAE/RMSE from dp.eval outputs, and compare. To avoid flaky failures from float32 arithmetic and to verify the new size_f behavior, consider:
- Use explicit tolerances in assert_allclose.
- Assert that the returned sizes equal the mask sum.
```diff
-        np.testing.assert_allclose(err["mae_f"][0], mae_expected)
-        np.testing.assert_allclose(err["rmse_f"][0], rmse_expected)
+        np.testing.assert_allclose(err["mae_f"][0], mae_expected, rtol=1e-6, atol=1e-8)
+        np.testing.assert_allclose(err["rmse_f"][0], rmse_expected, rtol=1e-6, atol=1e-8)
+        # Also verify the effective sample size matches the mask sum
+        np.testing.assert_equal(err["mae_f"][1], denom)
+        np.testing.assert_equal(err["rmse_f"][1], denom)
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- deepmd/entrypoints/test.py (4 hunks)
- source/tests/pt/test_dp_test.py (2 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
deepmd/entrypoints/test.py (4)
- deepmd/utils/data_system.py (1): add (342-395)
- deepmd/utils/data.py (1): add (136-189)
- deepmd/pt/utils/stat.py (1): rmse (525-526)
- deepmd/dpmodel/output_def.py (1): size (230-231)
🪛 Ruff (0.12.2)
source/tests/pt/test_dp_test.py
187-187: Use a context manager for opening files
(SIM115)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (29)
🔇 Additional comments (6)
deepmd/entrypoints/test.py (3)
294-295: Correctly declares atom-level preference mask aligned with force components
Adding atom_pref as an atomic item with repeat=3 matches the flattened (natoms*3) force layout. Keeping must=False ensures backward compatibility (feature enabled only when the file is present).
317-317: Presence flag retrieval is consistent with existing pattern
Using test_data.get("find_atom_pref") mirrors the existing find_* flags and safely disables weighting when absent.
470-471: Return sizes reflect the effective weighted sample size
Reporting (value, size_f) for mae_f/rmse_f is the right choice for weighted aggregation across systems. No change needed.
source/tests/pt/test_dp_test.py (3)
18-21: Imports for dp_test_ener/DeepEval/DeepmdData are appropriate
These imports are minimal and scoped to the new test's needs.
Also applies to: 28-30
150-167: Solid setup for masked-force test system
Bootstrapping a tiny training config and pointing both training/validation to the prepared system keeps the test self-contained. No issues spotted.
231-234: Clean teardown
Delegating to the base teardown and removing the temp system directory prevents test artifacts from leaking. Looks good.
anyangml
left a comment
LGTM
Pull Request Overview
This PR adds support for masked forces in testing by implementing per-atom weighting for force error metrics. When atom preferences (masks) are provided, the evaluation computes weighted MAE/RMSE that only consider selected atoms, with graceful handling of zero-weight cases.
Key changes:
- Modified force error calculation to apply atom-level masking when `atom_pref` data is available (a brief data-layout sketch follows this list)
- Added comprehensive test coverage for masked force evaluation scenarios
- Updated error metric reporting to reflect the actual number of atoms considered in calculations
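To make "atom_pref data is available" concrete, here is a rough data-preparation sketch modeled on how the new test builds its system; the directory names are placeholders and this is not an official data-format specification:

```python
from pathlib import Path
import numpy as np

# Hypothetical DeePMD system layout: <system>/set.000/{coord,force,box,...}.npy
set_dir = Path("my_system/set.000")
forces = np.load(set_dir / "force.npy")            # shape (nframes, natoms * 3)
nframes, natoms = forces.shape[0], forces.shape[1] // 3

# One weight per atom per frame: 0 excludes an atom's forces, larger values emphasize them.
atom_pref = np.ones((nframes, natoms), dtype=forces.dtype)
atom_pref[:, -1] = 0.0                             # e.g. ignore the last atom
np.save(set_dir / "atom_pref.npy", atom_pref)
```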
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| deepmd/entrypoints/test.py | Implements masked force error calculation with weighted MAE/RMSE computation |
| source/tests/pt/test_dp_test.py | Adds end-to-end test validating masked force metrics against expected calculations |
This PR assumes the atom_pref is a mask, but it may not be, as it is not limited to 0 and 1. One may set it to something like 1 1 1 1 5 1 1 1 to give some atoms larger weights.
@caic99 In short, I suggest showing both the error with and without the `atom_pref` weighting.
wanghan-iapcm
left a comment
atom force prefactor is not designed as a mask, it assigns weights to the atomic force loss terms.
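Stated explicitly, with $w_i$ the per-atom prefactor and $\Delta f_{i\alpha}$ the force-component residual, the weighted metrics checked by the test reduce to the following (assuming each atom's weight is applied to its three components):

$$
\mathrm{MAE}_{f,w}=\frac{\sum_{i,\alpha} w_i\,\lvert\Delta f_{i\alpha}\rvert}{\sum_{i,\alpha} w_i},
\qquad
\mathrm{RMSE}_{f,w}=\sqrt{\frac{\sum_{i,\alpha} w_i\,\Delta f_{i\alpha}^{2}}{\sum_{i,\alpha} w_i}}
$$

A 0/1 prefactor recovers the masked case; non-binary values such as `1 1 1 1 5 1 1 1` simply weight some atoms more heavily, as noted above.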
…into codex/update-test_ener-to-handle-masked-forces
for more information, see https://pre-commit.ci
Actionable comments posted: 2
♻️ Duplicate comments (1)
source/tests/pt/test_dp_test.py (1)
168-181: Define a named constant for the force offset instead of magic number
This improves readability and mirrors prior feedback.
Apply:
```diff
 class TestDPTestForceWeight(DPTest, unittest.TestCase):
     def setUp(self) -> None:
         self.detail_file = "test_dp_test_force_weight_detail"
@@
-    def _prepare_weighted_system(self) -> str:
+    def _prepare_weighted_system(self) -> str:
         src = Path(__file__).parent / "water/data/single"
         tmp_dir = tempfile.mkdtemp()
         shutil.copytree(src, tmp_dir, dirs_exist_ok=True)
         set_dir = Path(tmp_dir) / "set.000"
         forces = np.load(set_dir / "force.npy")
-        forces[0, -3:] += 10.0
+        # bump the last atom only
+        self.FORCE_OFFSET = getattr(self, "FORCE_OFFSET", 10.0)
+        forces[0, -3:] += self.FORCE_OFFSET
         np.save(set_dir / "force.npy", forces)
```
🧹 Nitpick comments (4)
deepmd/entrypoints/test.py (2)
294-295: Adding atom_pref looks good; consider avoiding 3x memory via compute-time broadcasting
Registering atom_pref as atomic with repeat=3 works, but it inflates this array 3x in memory and I/O. Since the per-atom weight is axis-invariant across x/y/z, you can keep atom_pref as shape [nframes, natoms] and broadcast it at compute time. This reduces memory and simplifies data loading while preserving semantics. See the follow-up diff in the weighting block for a safe broadcast that handles both shapes (with and without repeat), so you can optionally drop repeat=3 here.
Example (paired with the weighting-block change below):
```diff
-    data.add("atom_pref", 1, atomic=True, must=False, high_prec=False, repeat=3)
+    # Keep per-atom weights; broadcast to 3 components at compute-time
+    data.add("atom_pref", 1, atomic=True, must=False, high_prec=False)
```
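For reference, the compute-time counterpart of dropping `repeat=3` is a one-line broadcast; a self-contained toy example (array shapes only, not the real data):

```python
import numpy as np

nframes, natoms = 2, 4
atom_pref = np.ones((nframes, natoms))       # per-atom weights stored without repeat=3
diff_f = np.zeros((nframes, natoms * 3))     # flattened force residuals

# Repeat each atom's weight across its x/y/z components before applying it.
atom_pref_3 = np.repeat(atom_pref, 3, axis=1)  # -> (nframes, natoms * 3)
weighted_diff = diff_f * atom_pref_3
```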
620-623: Nice: system average now includes weighted force metrics
The conditional print of weighted force MAE/RMSE when present matches reviewer feedback ("show both with and without atom_pref"). Consider also printing the effective sample size (sum of weights) for transparency.
Example:
if "rmse_fw" in avg: log.info(f"Force weighted MAE : {avg['mae_fw']:e} eV/A") log.info(f"Force weighted RMSE: {avg['rmse_fw']:e} eV/A") + if "mae_fw__size" in avg: + log.info(f"Force weighted effN: {avg['mae_fw__size']:g}")Note: This needs
weighted_averageto optionally retain sizes in the aggregated output (e.g., by exposing them under a “__size” suffix).source/tests/pt/test_dp_test.py (2)
217-219: Don't hardcode mixed_type=False; derive from data to avoid surprises
Future tests may use mixed-type systems; pass the actual flag from DeepmdData.
Apply:
```diff
-            mixed_type=False,
+            mixed_type=data.mixed_type,
```
220-233: Strengthen test: also assert sizes (denominators) used in metrics
Verifies that dp_test_ener reports effective counts consistently for both weighted and unweighted metrics.
Apply:
```diff
         mae_unweighted = np.sum(np.abs(diff)) / diff.size
         rmse_unweighted = np.sqrt(np.sum(diff * diff) / diff.size)
         np.testing.assert_allclose(err["mae_f"][0], mae_unweighted)
         np.testing.assert_allclose(err["rmse_f"][0], rmse_unweighted)
+        # Also check the denominators
+        np.testing.assert_equal(err["mae_f"][1], diff.size)
+        np.testing.assert_equal(err["rmse_f"][1], diff.size)
         np.testing.assert_allclose(err["mae_fw"][0], mae_expected)
         np.testing.assert_allclose(err["rmse_fw"][0], rmse_expected)
+        np.testing.assert_allclose(err["mae_fw"][1], denom)
+        np.testing.assert_allclose(err["rmse_fw"][1], denom)
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- deepmd/entrypoints/test.py (5 hunks)
- source/tests/pt/test_dp_test.py (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
source/tests/pt/test_dp_test.py (3)
- deepmd/entrypoints/test.py (2): test (58-201), test_ener (257-602)
- deepmd/infer/deep_eval.py (4): DeepEval (345-687), get_type_map (156-157), get_type_map (411-413), eval (100-145)
- deepmd/utils/data.py (1): DeepmdData (24-739)
deepmd/entrypoints/test.py (2)
- deepmd/utils/data_system.py (1): add (342-395)
- deepmd/utils/data.py (1): add (136-189)
🪛 Ruff (0.12.2)
source/tests/pt/test_dp_test.py
187-187: Use a context manager for opening files
(SIM115)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (25)
…into codex/update-test_ener-to-handle-masked-forces
Actionable comments posted: 1
♻️ Duplicate comments (2)
source/tests/pt/test_dp_test.py (2)
168-182: Replace magic numbers and make intent explicit in weighted system prep
Use the constants suggested above; avoids magic numbers and clarifies which atoms are emphasized or ignored. Also fine to keep `dirs_exist_ok=True` since `mkdtemp()` pre-creates the directory.

```diff
         forces = np.load(set_dir / "force.npy")
-        forces[0, :3] += 1.0
-        forces[0, -3:] += 10.0
+        forces[0, :3] += self.FORCE_OFFSET_FIRST
+        forces[0, -3:] += self.FORCE_OFFSET_LAST
         np.save(set_dir / "force.npy", forces)
         natoms = forces.shape[1] // 3
         atom_pref = np.ones((forces.shape[0], natoms), dtype=forces.dtype)
-        atom_pref[:, 0] = 2.0
-        atom_pref[:, -1] = 0.0
+        atom_pref[:, 0] = self.FIRST_ATOM_WEIGHT
+        atom_pref[:, -1] = self.LAST_ATOM_WEIGHT
         np.save(set_dir / "atom_pref.npy", atom_pref)
```
207-214: Slice coord to 1 frame before reshape; current code flattens all framesReshape without slicing can consume all frames. Slice
[:1]to matchboxandnumb_test=1.- coord = test_data["coord"].reshape([1, -1]) + coord = test_data["coord"][:1].reshape([1, -1])
🧹 Nitpick comments (3)
source/tests/pt/test_dp_test.py (3)
150-167: Good test scaffolding; consider defining explicit constants for readability
The setup is clear and mirrors existing patterns. To improve clarity and future maintenance, define the force offsets and atom weights as class-level constants and reuse them in `_prepare_weighted_system`. This also addresses prior nitpicks about magic numbers.

```diff
 class TestDPTestForceWeight(DPTest, unittest.TestCase):
+    # Test constants for reproducibility and readability
+    FORCE_OFFSET_FIRST = 1.0
+    FORCE_OFFSET_LAST = 10.0
+    FIRST_ATOM_WEIGHT = 2.0
+    LAST_ATOM_WEIGHT = 0.0
```
189-191: Ensure temp model file is always closed and cleaned up, even on failures
Close the `NamedTemporaryFile` (SIM115) and move cleanup to `tearDown` so files aren't leaked on assertion failures. Store the path on `self`.

```diff
-        tmp_model = tempfile.NamedTemporaryFile(delete=False, suffix=".pth")
-        torch.jit.save(model, tmp_model.name)
+        tmp_model = tempfile.NamedTemporaryFile(delete=False, suffix=".pth")
+        torch.jit.save(model, tmp_model.name)
+        tmp_model.close()  # ensure the handle is closed before other consumers use the path
+        self._tmp_model_path = tmp_model.name
@@
-        os.unlink(tmp_model.name)
+        # cleanup moved to tearDown
@@
     def tearDown(self) -> None:
         super().tearDown()
         shutil.rmtree(self.system_dir)
+        # Ensure model file is removed even if the test fails early
+        if hasattr(self, "_tmp_model_path") and os.path.exists(self._tmp_model_path):
+            os.unlink(self._tmp_model_path)
```

Also applies to: 237-237, 239-241
226-236: Strengthen assertions; verify denominators and guard against zero sum
Validate the effective sample sizes returned by `dp_test_ener` and add a guard to avoid division by zero if all weights are zero (future-proofing). This also aligns with the reviewer suggestion to "show both the error with and without atom_pref" by making both paths explicit in assertions.

```diff
         masked_diff = diff[mask]
-        mae_unweighted = np.sum(np.abs(masked_diff)) / mask.sum()
-        rmse_unweighted = np.sqrt(np.sum(masked_diff * masked_diff) / mask.sum())
+        mae_unweighted = np.sum(np.abs(masked_diff)) / mask.sum()
+        rmse_unweighted = np.sqrt(np.sum(masked_diff * masked_diff) / mask.sum())
         denom = weight.sum()
+        assert denom > 0, "Weight sum is zero; cannot compute weighted errors."
         mae_weighted = np.sum(np.abs(diff) * weight) / denom
         rmse_weighted = np.sqrt(np.sum(diff * diff * weight) / denom)
-        np.testing.assert_allclose(err["mae_f"][0], mae_unweighted)
-        np.testing.assert_allclose(err["rmse_f"][0], rmse_unweighted)
-        np.testing.assert_allclose(err["mae_fw"][0], mae_weighted)
-        np.testing.assert_allclose(err["rmse_fw"][0], rmse_weighted)
+        # values and denominators (effective sample sizes)
+        np.testing.assert_equal(err["mae_f"][1], mask.sum())
+        np.testing.assert_equal(err["rmse_f"][1], mask.sum())
+        np.testing.assert_allclose(err["mae_f"][0], mae_unweighted)
+        np.testing.assert_allclose(err["rmse_f"][0], rmse_unweighted)
+        np.testing.assert_equal(err["mae_fw"][1], weight.sum())
+        np.testing.assert_equal(err["rmse_fw"][1], weight.sum())
+        np.testing.assert_allclose(err["mae_fw"][0], mae_weighted)
+        np.testing.assert_allclose(err["rmse_fw"][0], rmse_weighted)
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- deepmd/entrypoints/test.py (5 hunks)
- source/tests/pt/test_dp_test.py (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- deepmd/entrypoints/test.py
🧰 Additional context used
🧬 Code graph analysis (1)
source/tests/pt/test_dp_test.py (3)
- deepmd/entrypoints/test.py (2): test (58-201), test_ener (257-612)
- deepmd/infer/deep_eval.py (4): DeepEval (345-687), get_type_map (156-157), get_type_map (411-413), eval (100-145)
- deepmd/utils/data.py (1): DeepmdData (24-739)
🪛 Ruff (0.12.2)
source/tests/pt/test_dp_test.py
189-189: Use a context manager for opening files
(SIM115)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (29)
🔇 Additional comments (2)
source/tests/pt/test_dp_test.py (2)
18-21: LGTM: New imports are correct and used
`dp_test_ener` and `DeepEval` are both used below. No issues.
28-30: LGTM: DeepmdData import
The import is necessary for the explicit data loading in the new test.
Actionable comments posted: 0
♻️ Duplicate comments (2)
source/tests/pt/test_dp_test.py (2)
168-183: Replace magic numbers with named constants for clarity
Use named constants for the force offsets and atom weights to improve readability and intent.
Apply within this block:
```diff
-        forces[0, :3] += 1.0
-        forces[0, -3:] += 10.0
+        forces[0, :3] += self.FORCE_OFFSET_HEAD
+        forces[0, -3:] += self.FORCE_OFFSET_TAIL
-        atom_pref[:, 0] = 2.0
-        atom_pref[:, -1] = 0.0
+        atom_pref[:, 0] = self.HEAD_WEIGHT
+        atom_pref[:, -1] = self.TAIL_WEIGHT
```

Add these class attributes near the top of the class:

```python
# class-level constants (add near class definition)
FORCE_OFFSET_HEAD = 1.0
FORCE_OFFSET_TAIL = 10.0
HEAD_WEIGHT = 2.0
TAIL_WEIGHT = 0.0
```
207-222: Fix two bugs: frame slicing and dict access for eval output
- coord must be sliced to the first frame before reshape; current code flattens all frames.
- `DeepEval.eval` returns a dict; `ret[1]` will KeyError. Use `ret["force"]`.

```diff
-        coord = test_data["coord"].reshape([1, -1])
+        coord = test_data["coord"][:1].reshape([1, -1])
@@
-        ret = dp.eval(
+        ret = dp.eval(
             coord,
             box,
             atype,
             fparam=None,
             aparam=None,
             atomic=False,
             efield=None,
             mixed_type=False,
             spin=None,
         )
-        force_pred = ret[1].reshape([1, -1])
+        force_pred = ret["force"].reshape([1, -1])
```
🧹 Nitpick comments (3)
source/tests/pt/test_dp_test.py (3)
189-191: Use a context manager for the temp model file (ruff SIM115) and reuse the path
Prevents resource leaks and aligns with tooling suggestions.
```diff
-        tmp_model = tempfile.NamedTemporaryFile(delete=False, suffix=".pth")
-        torch.jit.save(model, tmp_model.name)
-        dp = DeepEval(tmp_model.name)
+        with tempfile.NamedTemporaryFile(delete=False, suffix=".pth") as tmp_model:
+            torch.jit.save(model, tmp_model.name)
+            tmp_model_path = tmp_model.name
+        dp = DeepEval(tmp_model_path)
@@
-        os.unlink(tmp_model.name)
+        os.unlink(tmp_model_path)
```

Also applies to: 235-235
226-234: Good: validates both unweighted and weighted MAE/RMSE (addresses reviewer ask)
Asserting both metrics matches the behavior with and without `atom_pref`.
Optionally add tolerances to avoid flakiness from dtype/ordering differences:
```diff
-        np.testing.assert_allclose(err["mae_f"][0], mae_unweighted)
-        np.testing.assert_allclose(err["rmse_f"][0], rmse_unweighted)
-        np.testing.assert_allclose(err["mae_fw"][0], mae_weighted)
-        np.testing.assert_allclose(err["rmse_fw"][0], rmse_weighted)
+        np.testing.assert_allclose(err["mae_f"][0], mae_unweighted, rtol=1e-6, atol=1e-8)
+        np.testing.assert_allclose(err["rmse_f"][0], rmse_unweighted, rtol=1e-6, atol=1e-8)
+        np.testing.assert_allclose(err["mae_fw"][0], mae_weighted, rtol=1e-6, atol=1e-8)
+        np.testing.assert_allclose(err["rmse_fw"][0], rmse_weighted, rtol=1e-6, atol=1e-8)
```
237-240: Make teardown more robust if setup fails mid-way
Guard against missing directory to avoid errors on teardown in partial failures.

```diff
-        shutil.rmtree(self.system_dir)
+        if hasattr(self, "system_dir") and os.path.isdir(self.system_dir):
+            shutil.rmtree(self.system_dir, ignore_errors=True)
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- deepmd/entrypoints/test.py (5 hunks)
- source/tests/pt/test_dp_test.py (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- deepmd/entrypoints/test.py
🧰 Additional context used
🧬 Code graph analysis (1)
source/tests/pt/test_dp_test.py (3)
- deepmd/entrypoints/test.py (2): test (58-201), test_ener (257-602)
- deepmd/infer/deep_eval.py (4): DeepEval (345-687), get_type_map (156-157), get_type_map (411-413), eval (100-145)
- deepmd/utils/data.py (1): DeepmdData (24-739)
🪛 Ruff (0.12.2)
source/tests/pt/test_dp_test.py
189-189: Use a context manager for opening files
(SIM115)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
🔇 Additional comments (4)
source/tests/pt/test_dp_test.py (4)
18-21: LGTM: necessary imports for weighted-force testing
Importing `test_ener` and `DeepEval` here is appropriate for the new weighted-force test flow.
28-30: LGTM: `DeepmdData` import is correct
Needed for constructing the test dataset with `atom_pref`.
150-167: Good test fixture setup
Solid setup: isolates a temporary weighted system, forces single-step training, and writes a dedicated input JSON. No issues.
191-191: DeepEval constructor signature confirmed
Verified in `deepmd/infer/deep_eval.py` that the `DeepEval.__init__` method is defined as `def __init__(self, model_file: str): …` and does not require an `output_def` argument. The instantiation `dp = DeepEval(tmp_model.name)` matches the current signature, so no changes are needed.
Summary by CodeRabbit

New Features
- Added per-atom weighting for force evaluation: computes and reports weighted MAE/RMSE alongside unweighted metrics, includes weighted metrics in system-average summaries, logs weighted force metrics, and safely handles zero-weight cases. Also propagates the per-atom weight field into reporting.

Tests
- Added end-to-end tests validating weighted vs unweighted force MAE/RMSE and verifying evaluator outputs when using per-atom weight masks.

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
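For users, the feature is exercised through the regular `dp test` workflow: place `atom_pref.npy` next to the other `set.000/*.npy` files of the test system and run, for example, `dp test -m model.pth -s path/to/weighted_system -n 1 -d detail` (the model filename and system path here are placeholders). When the weight file is present, the weighted force MAE/RMSE are reported alongside the unweighted values.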