perf: calculate grad on-the-fly for SiLUT #4678
Conversation
Pull Request Overview
This PR refactors the SiLUT gradient computation to reduce memory consumption by calculating the first-order gradient on-the-fly.
- Renamed the temporary variable in silut_forward from "tanh_part" to "tanh".
- Changed silut_backward to return a single gradient tensor instead of a tuple.
- Updated silut_double_backward and its call site in SiLUTGradFunction to return a tuple, with the first-order gradient now computed on-the-fly (see the sketch after the review comments below).
Comments suppressed due to low confidence (1)
deepmd/pt/utils/utils.py:122
- [nitpick] The variable name 'grad_mul_grad_grad_output' is unclear. Consider renaming it to better reflect its purpose, for example 'grad_second', to improve code readability.
grad_input, grad_mul_grad_grad_output = silut_double_backward_script(
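For context, a minimal sketch of this call pattern is shown below. It is hypothetical: plain SiLU stands in for the real SiLUT formula (which adds a tanh-based tail that is not reproduced here), and although the function names and the variable grad_mul_grad_grad_output mirror the PR, the bodies are illustrative only, not the deepmd-kit implementation.

```python
import torch


def silu_forward(x: torch.Tensor) -> torch.Tensor:
    # Stand-in for silut_forward: plain SiLU, x * sigmoid(x).
    return x * torch.sigmoid(x)


def silu_backward(grad_output: torch.Tensor, x: torch.Tensor) -> torch.Tensor:
    # First-order pass returns a single gradient tensor, not a tuple.
    sig = torch.sigmoid(x)
    grad = sig * (1.0 + x * (1.0 - sig))  # d(silu)/dx
    return grad_output * grad


def silu_double_backward(grad_grad_output, grad_output, x):
    # Second-order pass returns a tuple. The first-order gradient `grad` is
    # recomputed here on-the-fly instead of being saved by the first backward,
    # trading a little extra compute for a smaller memory footprint.
    sig = torch.sigmoid(x)
    grad = sig * (1.0 + x * (1.0 - sig))                       # d(silu)/dx
    grad2 = sig * (1.0 - sig) * (2.0 + x * (1.0 - 2.0 * sig))  # d2(silu)/dx2
    grad_input = grad_grad_output * grad_output * grad2        # w.r.t. x
    grad_mul_grad_grad_output = grad_grad_output * grad        # w.r.t. grad_output
    return grad_input, grad_mul_grad_grad_output
```

The second returned value is the quantity whose name the suppressed comment suggests clarifying (e.g. to grad_second): it is the incoming second-order gradient multiplied by the recomputed first-order gradient, i.e. the gradient with respect to grad_output.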
Co-authored-by: Duo <50307526+iProzd@users.noreply.github.com> Signed-off-by: Chun Cai <amoycaic@gmail.com>
Codecov Report: All modified and coverable lines are covered by tests ✅
Additional details and impacted files:
@@ Coverage Diff @@
## devel #4678 +/- ##
==========================================
- Coverage 84.80% 84.80% -0.01%
==========================================
Files 692 692
Lines 66396 66402 +6
Branches 3539 3538 -1
==========================================
+ Hits 56306 56310 +4
- Misses 8949 8951 +2
Partials 1141 1141
View full report in Codecov by Sentry.
The current implementation of SiLUT requires one extra tensor to store the 1st-order gradient. This PR reduces the memory footprint by calculating the 1st-order gradient on-the-fly in silut_double_backward, at the cost of an overhead of ~0.5% in calculation time. I've tested this PR on OMat with 9 DPA-3 layers and batch size=auto:512.
The correctness of this modification is covered by source/tests/pt/test_custom_activation.py.
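To illustrate where the memory saving comes from, here is a hedged sketch of the custom autograd Function pairing. The class names echo SiLUTGradFunction from the PR, but the bodies again use plain SiLU as a stand-in and are an assumption, not the deepmd-kit code: the key point is that only the raw inputs are saved between passes, and the first-order gradient is recomputed inside the double backward.

```python
import torch


class SiLUGradFunction(torch.autograd.Function):
    # Stand-in for SiLUTGradFunction: computes the first-order gradient.
    @staticmethod
    def forward(ctx, grad_output, x):
        # Only the raw inputs are saved; the first-order gradient itself is
        # NOT stored, which is the extra buffer this PR eliminates.
        ctx.save_for_backward(grad_output, x)
        sig = torch.sigmoid(x)
        return grad_output * sig * (1.0 + x * (1.0 - sig))

    @staticmethod
    def backward(ctx, grad_grad_output):
        grad_output, x = ctx.saved_tensors
        sig = torch.sigmoid(x)
        # Recompute the first-order gradient on-the-fly ...
        grad = sig * (1.0 + x * (1.0 - sig))
        # ... and the second-order gradient of SiLU.
        grad2 = sig * (1.0 - sig) * (2.0 + x * (1.0 - 2.0 * sig))
        # Return gradients w.r.t. (grad_output, x), matching forward's inputs.
        return grad_grad_output * grad, grad_grad_output * grad_output * grad2


class SiLUFunction(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return x * torch.sigmoid(x)

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        # Delegate to the grad Function so the backward remains differentiable.
        return SiLUGradFunction.apply(grad_output, x)


if __name__ == "__main__":
    # First- and second-order correctness checks for this sketch.
    x = torch.randn(8, dtype=torch.double, requires_grad=True)
    torch.autograd.gradcheck(SiLUFunction.apply, (x,))
    torch.autograd.gradgradcheck(SiLUFunction.apply, (x,))
```

The gradcheck/gradgradcheck calls at the end play the same role for this sketch that source/tests/pt/test_custom_activation.py plays for the real SiLUT.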