fix(dpmodel): fix normalize scale of initial parameters #4774
Signed-off-by: Jinzhe Zeng <jinzhe.zeng@ustc.edu.cn> (cherry picked from commit e88838b)
Pull Request Overview
This PR adjusts the initialization scale of network parameters to match PyTorch’s default normalization by dividing by √(num_in + num_out).
- Applies a scale factor of 1/√(num_in + num_out) to weight, bias, and idt initializations
- Ensures bias and timestep (idt) follow the same scaled normal distribution
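The scaling described in this overview can be sketched with NumPy as follows. This is a minimal illustration, not the code in deepmd/dpmodel/utils/network.py: the helper name `init_scaled_params` and the layer sizes are hypothetical, and applying the same scale to the bias follows the overview's wording rather than the diff itself.

```python
import numpy as np


def init_scaled_params(num_in, num_out, seed=None):
    """Hypothetical helper: sample weights and bias with std 1/sqrt(num_in + num_out)."""
    rng = np.random.default_rng(seed)
    # Standard deviation shrinks as the layer gets wider.
    scale = 1.0 / np.sqrt(num_in + num_out)
    w = rng.normal(size=(num_in, num_out), scale=scale)
    b = rng.normal(size=(num_out,), scale=scale)
    return w, b, scale


# Illustrative sizes only; they are not taken from the PR.
w, b, scale = init_scaled_params(num_in=240, num_out=240, seed=0)
print(w.shape, b.shape, scale)
```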
Comments suppressed due to low confidence (4)
deepmd/dpmodel/utils/network.py:118
- [nitpick] The name `idt` is not immediately clear; consider renaming to `timestep_bias` or adding a comment explaining its purpose.
self.idt = (
deepmd/dpmodel/utils/network.py:107
- Add a comment explaining the choice of 1/√(num_in + num_out) scaling and how it aligns with PyTorch’s initialization to aid future maintainers.
rng = np.random.default_rng(seed)
deepmd/dpmodel/utils/network.py:108
- Add a unit test to verify that the standard deviation of the initialized weights matches the intended scale (1/√(num_in + num_out)).
self.w = rng.normal(
deepmd/dpmodel/utils/network.py:111
- Consider initializing biases to zero instead of a random normal; many frameworks default biases to zero for faster convergence.
self.b = (
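The unit test suggested in the comment on line 108 could look roughly like the sketch below. It is an assumption about how such a check might be written, not a test from the repository: `std_matches_scale` is a hypothetical helper, and the arrays are sampled directly with NumPy as a stand-in for constructing the real layer and inspecting its weights.

```python
import numpy as np


def std_matches_scale(w, rtol=0.05):
    """Hypothetical check: empirical std of w is close to 1/sqrt(num_in + num_out)."""
    num_in, num_out = w.shape
    expected = 1.0 / np.sqrt(num_in + num_out)
    return bool(np.isclose(w.std(), expected, rtol=rtol))


rng = np.random.default_rng(0)
# Stand-ins for layer weights; a real test would read them from the layer object.
scaled = rng.normal(size=(256, 256), scale=1.0 / np.sqrt(512))
unscaled = rng.normal(size=(256, 256))

assert std_matches_scale(scaled)
assert not std_matches_scale(unscaled)
```

The tolerance is deliberately loose so that the check depends on the intended scale rather than on a particular seed.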
📝 Walkthrough
The initialization logic for weights, biases, and identity timestep vectors in the […]
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
deepmd/dpmodel/utils/network.py(1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (29)
- GitHub Check: Test Python (6, 3.12)
- GitHub Check: Test Python (6, 3.9)
- GitHub Check: Test Python (5, 3.12)
- GitHub Check: Test Python (5, 3.9)
- GitHub Check: Test Python (4, 3.12)
- GitHub Check: Test Python (4, 3.9)
- GitHub Check: Build wheels for cp310-manylinux_aarch64
- GitHub Check: Test Python (3, 3.12)
- GitHub Check: Build C++ (clang, clang)
- GitHub Check: Test Python (3, 3.9)
- GitHub Check: Build wheels for cp311-win_amd64
- GitHub Check: Build wheels for cp311-macosx_arm64
- GitHub Check: Build C++ (rocm, rocm)
- GitHub Check: Test Python (2, 3.12)
- GitHub Check: Build wheels for cp311-macosx_x86_64
- GitHub Check: Build C++ (cuda120, cuda)
- GitHub Check: Test Python (2, 3.9)
- GitHub Check: Build wheels for cp311-manylinux_x86_64
- GitHub Check: Build C++ (cuda, cuda)
- GitHub Check: Build C library (2.14, >=2.5.0rc0,<2.15, libdeepmd_c_cu11.tar.gz)
- GitHub Check: Analyze (python)
- GitHub Check: Test C++ (false)
- GitHub Check: Build wheels for cp311-manylinux_x86_64
- GitHub Check: Test Python (1, 3.12)
- GitHub Check: Build C++ (cpu, cpu)
- GitHub Check: Build C library (2.18, libdeepmd_c.tar.gz)
- GitHub Check: Test C++ (true)
- GitHub Check: Test Python (1, 3.9)
- GitHub Check: Analyze (c-cpp)
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Jinzhe Zeng <njzjz@qq.com>
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@           Coverage Diff           @@
##            devel    #4774   +/-   ##
=======================================
  Coverage   84.79%   84.79%
=======================================
  Files         698      698
  Lines       67746    67747    +1
  Branches     3540     3540
=======================================
+ Hits        57444    57446    +2
  Misses       9171     9171
+ Partials     1131     1130    -1
☔ View full report in Codecov by Sentry.
The current initialization scale is too large. This PR makes it consistent with the PyTorch (PT) backend.
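To see why the scale matters, the sketch below compares the variance of a layer's pre-activations under two weight draws. It assumes the previous behaviour was a unit-standard-deviation normal, which this thread does not state explicitly, and the sizes are illustrative. For unit-variance inputs, each output has variance of about `num_in * std**2`, so the unit draw gives roughly `num_in` while the scaled draw gives roughly `num_in / (num_in + num_out)`.

```python
import numpy as np

num_in, num_out = 128, 128  # illustrative sizes, not from the PR
rng = np.random.default_rng(0)
x = rng.normal(size=(10000, num_in))  # unit-variance inputs

# Assumed old behaviour: standard normal weights (an assumption, see above).
w_unit = rng.normal(size=(num_in, num_out))
# Scale described in the review overview: 1/sqrt(num_in + num_out).
w_scaled = rng.normal(size=(num_in, num_out), scale=1.0 / np.sqrt(num_in + num_out))

var_unit = (x @ w_unit).var()      # close to num_in
var_scaled = (x @ w_scaled).var()  # close to num_in / (num_in + num_out)
print(var_unit, var_scaled)
```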