-
Notifications
You must be signed in to change notification settings - Fork 584
fix(tests): fix tearDownClass and release GPU memory #4702
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Signed-off-by: Jinzhe Zeng <jinzhe.zeng@ustc.edu.cn>
📝 WalkthroughWalkthroughThe pull request updates the CI workflow by appending the Changes
Sequence Diagram(s)sequenceDiagram
participant T as Test Case
participant LM as log_gpu_memory Fixture
participant NS as nvidia-smi Process
T ->> LM: Complete test execution
LM ->> NS: Execute `nvidia-smi` command
NS -->> LM: Return GPU memory info or error
LM ->> Console: Print GPU memory usage information
sequenceDiagram
participant TC as Test Class
participant TD as tearDownClass Method
participant PT as PTTestCase.cleanup
TC ->> Test Methods: Execute all test methods
TC ->> TD: Invoke class cleanup (tearDownClass)
TD ->> PT: Call PTTestCase.tearDownClass()
Tip ⚡💬 Agentic Chat (Pro Plan, General Availability)
📜 Recent review detailsConfiguration used: CodeRabbit UI 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
⏰ Context from checks skipped due to timeout of 90000ms (30)
🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## devel #4702 +/- ##
==========================================
- Coverage 84.82% 84.82% -0.01%
==========================================
Files 692 692
Lines 66512 66513 +1
Branches 3539 3538 -1
==========================================
Hits 56419 56419
- Misses 8952 8953 +1
Partials 1141 1141 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
This reverts commit a0faa6e.
Signed-off-by: Jinzhe Zeng <jinzhe.zeng@ustc.edu.cn>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (4)
source/tests/universal/pt/atomc_model/test_atomic_model.py (2)
136-139: Good addition of tearDownClass, but consider explicit cleanup of model instances.The addition of
tearDownClassmethod is a good practice for proper resource cleanup. However, you might want to consider explicitly deleting class attributes likemoduleto ensure thorough memory cleanup, similar to what's done in other backend test classes.@classmethod def tearDownClass(cls) -> None: super().tearDownClass() + if hasattr(cls, "module"): + del cls.module
206-209: Consider consistent cleanup pattern across all test classes.These tearDownClass implementations call the super method but don't explicitly clean up model instances that could be holding onto GPU memory. For more thorough garbage collection, consider adding explicit cleanup of class attributes.
@classmethod def tearDownClass(cls) -> None: super().tearDownClass() + # Clean up model instances to help garbage collection + if hasattr(cls, "module"): + del cls.moduleAlso applies to: 272-275, 338-341, 414-417, 483-486
source/tests/universal/pt/model/test_model.py (2)
214-217: Consider enhancing tearDownClass with explicit module cleanup.The implementation correctly calls the superclass method, but similar to other test classes in this codebase (e.g., in backend.py), you should consider explicitly cleaning up class attributes to help garbage collection.
@classmethod def tearDownClass(cls) -> None: super().tearDownClass() + if hasattr(cls, "module"): + del cls.module + if hasattr(cls, "_script_module"): + del cls._script_module
322-325: Ensure consistent cleanup in all tearDownClass implementations.While adding tearDownClass methods is a step in the right direction for memory management, these implementations could be enhanced to explicitly clean up PyTorch models and tensors. This is especially important for GPU memory management.
Recommended implementation for all tearDownClass methods:
@classmethod def tearDownClass(cls) -> None: super().tearDownClass() + # Clean up PyTorch objects to help garbage collection + if hasattr(cls, "module"): + del cls.module + if hasattr(cls, "_script_module"): + del cls._script_moduleAlso applies to: 421-424, 516-519, 619-622, 748-751, 846-849, 949-952
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
source/tests/universal/pt/atomc_model/test_atomic_model.py(6 hunks)source/tests/universal/pt/descriptor/test_descriptor.py(1 hunks)source/tests/universal/pt/fitting/test_fitting.py(1 hunks)source/tests/universal/pt/loss/test_loss.py(1 hunks)source/tests/universal/pt/model/test_model.py(8 hunks)source/tests/universal/pt/utils/test_type_embed.py(1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (5)
source/tests/universal/pt/descriptor/test_descriptor.py (4)
source/tests/universal/pt/fitting/test_fitting.py (1)
tearDownClass(55-56)source/tests/universal/pt/utils/test_type_embed.py (1)
tearDownClass(27-28)source/tests/universal/pt/atomc_model/test_atomic_model.py (6)
tearDownClass(137-138)tearDownClass(207-208)tearDownClass(273-274)tearDownClass(339-340)tearDownClass(415-416)tearDownClass(484-485)source/tests/universal/pt/model/test_model.py (8)
tearDownClass(215-216)tearDownClass(323-324)tearDownClass(422-423)tearDownClass(517-518)tearDownClass(620-621)tearDownClass(749-750)tearDownClass(847-848)tearDownClass(950-951)
source/tests/universal/pt/fitting/test_fitting.py (4)
source/tests/universal/pt/descriptor/test_descriptor.py (1)
tearDownClass(61-62)source/tests/universal/pt/utils/test_type_embed.py (1)
tearDownClass(27-28)source/tests/universal/pt/atomc_model/test_atomic_model.py (6)
tearDownClass(137-138)tearDownClass(207-208)tearDownClass(273-274)tearDownClass(339-340)tearDownClass(415-416)tearDownClass(484-485)source/tests/universal/pt/model/test_model.py (8)
tearDownClass(215-216)tearDownClass(323-324)tearDownClass(422-423)tearDownClass(517-518)tearDownClass(620-621)tearDownClass(749-750)tearDownClass(847-848)tearDownClass(950-951)
source/tests/universal/pt/loss/test_loss.py (5)
source/tests/universal/pt/descriptor/test_descriptor.py (1)
tearDownClass(61-62)source/tests/universal/pt/fitting/test_fitting.py (1)
tearDownClass(55-56)source/tests/universal/pt/utils/test_type_embed.py (1)
tearDownClass(27-28)source/tests/universal/pt/atomc_model/test_atomic_model.py (6)
tearDownClass(137-138)tearDownClass(207-208)tearDownClass(273-274)tearDownClass(339-340)tearDownClass(415-416)tearDownClass(484-485)source/tests/universal/pt/model/test_model.py (7)
tearDownClass(215-216)tearDownClass(323-324)tearDownClass(422-423)tearDownClass(517-518)tearDownClass(620-621)tearDownClass(749-750)tearDownClass(847-848)
source/tests/universal/pt/model/test_model.py (7)
source/tests/universal/pt/descriptor/test_descriptor.py (1)
tearDownClass(61-62)source/tests/universal/pt/fitting/test_fitting.py (1)
tearDownClass(55-56)source/tests/universal/pt/loss/test_loss.py (1)
tearDownClass(50-51)source/tests/universal/pt/utils/test_type_embed.py (1)
tearDownClass(27-28)source/tests/universal/pt/backend.py (1)
tearDownClass(58-63)source/tests/universal/pt/atomc_model/test_atomic_model.py (6)
tearDownClass(137-138)tearDownClass(207-208)tearDownClass(273-274)tearDownClass(339-340)tearDownClass(415-416)tearDownClass(484-485)source/tests/universal/dpmodel/backend.py (1)
tearDownClass(57-61)
source/tests/universal/pt/atomc_model/test_atomic_model.py (7)
source/tests/universal/pt/descriptor/test_descriptor.py (1)
tearDownClass(61-62)source/tests/universal/pt/fitting/test_fitting.py (1)
tearDownClass(55-56)source/tests/universal/pt/loss/test_loss.py (1)
tearDownClass(50-51)source/tests/universal/pt/utils/test_type_embed.py (1)
tearDownClass(27-28)source/tests/universal/pt/backend.py (1)
tearDownClass(58-63)source/tests/universal/pt/model/test_model.py (8)
tearDownClass(215-216)tearDownClass(323-324)tearDownClass(422-423)tearDownClass(517-518)tearDownClass(620-621)tearDownClass(749-750)tearDownClass(847-848)tearDownClass(950-951)source/tests/universal/dpmodel/backend.py (1)
tearDownClass(57-61)
⏰ Context from checks skipped due to timeout of 90000ms (30)
- GitHub Check: Test Python and C++ on CUDA
- GitHub Check: Test Python (6, 3.12)
- GitHub Check: Test Python (6, 3.9)
- GitHub Check: Test Python (5, 3.12)
- GitHub Check: Test Python (5, 3.9)
- GitHub Check: Test Python (4, 3.12)
- GitHub Check: Build wheels for cp310-manylinux_aarch64
- GitHub Check: Test Python (4, 3.9)
- GitHub Check: Build wheels for cp311-win_amd64
- GitHub Check: Test Python (3, 3.12)
- GitHub Check: Build C++ (clang, clang)
- GitHub Check: Test Python (3, 3.9)
- GitHub Check: Build wheels for cp311-macosx_arm64
- GitHub Check: Build C++ (rocm, rocm)
- GitHub Check: Test Python (2, 3.12)
- GitHub Check: Build wheels for cp311-macosx_x86_64
- GitHub Check: Test Python (2, 3.9)
- GitHub Check: Build wheels for cp311-manylinux_x86_64
- GitHub Check: Build C++ (cuda120, cuda)
- GitHub Check: Build C++ (cuda, cuda)
- GitHub Check: Build wheels for cp311-manylinux_x86_64
- GitHub Check: Analyze (python)
- GitHub Check: Test Python (1, 3.12)
- GitHub Check: Test C++ (false)
- GitHub Check: Build C library (2.14, >=2.5.0rc0,<2.15, libdeepmd_c_cu11.tar.gz)
- GitHub Check: Build C++ (cpu, cpu)
- GitHub Check: Build C library (2.18, libdeepmd_c.tar.gz)
- GitHub Check: Test C++ (true)
- GitHub Check: Test Python (1, 3.9)
- GitHub Check: Analyze (c-cpp)
🔇 Additional comments (4)
source/tests/universal/pt/fitting/test_fitting.py (1)
54-56: Good addition for consistent cleanup.Adding the
tearDownClassmethod ensures that the superclass's cleanup procedures are properly called, which is a good practice for resource management, especially for tests involving GPU resources.source/tests/universal/pt/utils/test_type_embed.py (1)
26-28: Good addition for consistent cleanup.The
tearDownClassmethod ensures proper resource cleanup after all tests in this class are completed. This is particularly important for GPU-related tests to prevent memory leaks.source/tests/universal/pt/descriptor/test_descriptor.py (1)
60-62: Good addition for consistent cleanup.Adding the
tearDownClassmethod ensures proper resource cleanup, which aligns with the PR's goal of debugging garbage collection issues.source/tests/universal/pt/loss/test_loss.py (1)
49-51: Good addition for consistent cleanup.This
tearDownClassmethod ensures that resources are properly cleaned up after all tests in the class have run. This is part of the systematic approach to improving garbage collection across the test suite.
This reverts commit a13ad1e.
This reverts commit 4cfa8f5.
Signed-off-by: Jinzhe Zeng <jinzhe.zeng@ustc.edu.cn>
Signed-off-by: Jinzhe Zeng <jinzhe.zeng@ustc.edu.cn>
This reverts commit 435024f.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
source/tests/universal/pt/backend.py(2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (30)
- GitHub Check: Test Python and C++ on CUDA
- GitHub Check: Test Python (6, 3.12)
- GitHub Check: Test Python (6, 3.9)
- GitHub Check: Test Python (5, 3.12)
- GitHub Check: Test Python (5, 3.9)
- GitHub Check: Test Python (4, 3.12)
- GitHub Check: Build wheels for cp310-manylinux_aarch64
- GitHub Check: Test Python (4, 3.9)
- GitHub Check: Build wheels for cp311-win_amd64
- GitHub Check: Test Python (3, 3.12)
- GitHub Check: Build wheels for cp311-macosx_arm64
- GitHub Check: Test Python (3, 3.9)
- GitHub Check: Build wheels for cp311-macosx_x86_64
- GitHub Check: Test Python (2, 3.12)
- GitHub Check: Build C++ (clang, clang)
- GitHub Check: Analyze (python)
- GitHub Check: Test Python (2, 3.9)
- GitHub Check: Build wheels for cp311-manylinux_x86_64
- GitHub Check: Build C++ (rocm, rocm)
- GitHub Check: Test Python (1, 3.12)
- GitHub Check: Build wheels for cp311-manylinux_x86_64
- GitHub Check: Build C++ (cuda120, cuda)
- GitHub Check: Analyze (c-cpp)
- GitHub Check: Test Python (1, 3.9)
- GitHub Check: Test C++ (false)
- GitHub Check: Build C library (2.14, >=2.5.0rc0,<2.15, libdeepmd_c_cu11.tar.gz)
- GitHub Check: Build C++ (cuda, cuda)
- GitHub Check: Test C++ (true)
- GitHub Check: Build C library (2.18, libdeepmd_c.tar.gz)
- GitHub Check: Build C++ (cpu, cpu)
🔇 Additional comments (1)
source/tests/universal/pt/backend.py (1)
2-2: Added garbage collection import.The addition of the
gcmodule is appropriate for memory management, especially in test cleanup routines.
Signed-off-by: Jinzhe Zeng <jinzhe.zeng@ustc.edu.cn>
This reverts commit 0498d64.
<!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - **Chores** - Updated the testing command to ensure that console output is displayed during test runs. - **Tests** - Introduced automated logging of GPU memory usage after tests. - Standardized cleanup procedures across multiple test suites for improved resource management. - Adjusted teardown processes for specific test classes to enhance resource management, including new class methods for cleanup. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Signed-off-by: Jinzhe Zeng <jinzhe.zeng@ustc.edu.cn> (cherry picked from commit a1777b7)
<!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - **Chores** - Updated the testing command to ensure that console output is displayed during test runs. - **Tests** - Introduced automated logging of GPU memory usage after tests. - Standardized cleanup procedures across multiple test suites for improved resource management. - Adjusted teardown processes for specific test classes to enhance resource management, including new class methods for cleanup. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Signed-off-by: Jinzhe Zeng <jinzhe.zeng@ustc.edu.cn> (cherry picked from commit a1777b7)
Summary by CodeRabbit
Chores
Tests