Add reference counting to ModelInstance for parallel test safety #16672
Conversation
- Add _ref_count and _ref_lock fields to ModelInstance for thread-safe reference tracking
- Add acquire(), release() methods and is_in_use property
- Update _evict_for_gpus to skip instances with active references
- Update fixtures to acquire refs before using instances and release in finally blocks
- Prevents worker eviction while tests are actively using them
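The bullets above can be sketched as a minimal reference-counting pattern. The field and method names (`_ref_count`, `_ref_lock`, `acquire()`, `release()`, `is_in_use`) come from the PR description; everything else here is an illustrative assumption, not the PR's actual implementation.

```python
import threading
import time


class ModelInstance:
    """Sketch of a pool-managed model worker with reference counting."""

    def __init__(self, model_name: str):
        self.model_name = model_name
        self.last_used = time.time()
        self._ref_count = 0
        self._ref_lock = threading.Lock()

    def acquire(self) -> None:
        # Must be paired with a release() call when done.
        with self._ref_lock:
            self._ref_count += 1

    def release(self) -> None:
        with self._ref_lock:
            # Clamp at zero so an unbalanced release cannot go negative.
            self._ref_count = max(0, self._ref_count - 1)

    @property
    def is_in_use(self) -> bool:
        # Eviction logic skips any instance where this is True.
        with self._ref_lock:
            return self._ref_count > 0
```

A test fixture would call `acquire()` before handing the instance to a test and `release()` in a `finally` block, so the pool never evicts a worker mid-test.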
Summary of Changes

Hello @slin1237, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request introduces a robust reference counting system for ModelInstance.
Code Review
This pull request introduces reference counting to ModelInstance to prevent model workers from being evicted while they are in use by a test, which is a crucial step for enabling parallel test execution. The implementation of acquire(), release(), and is_in_use is thread-safe and correctly integrated into the eviction logic. The test fixtures have also been updated to correctly use this new mechanism.
However, I've identified a critical issue regarding the overall thread-safety of the ModelPool and GPUAllocator classes. They currently lack the necessary locking to protect their internal state (like the list of instances and GPU allocations) from race conditions during concurrent access. This could lead to errors or incorrect behavior when tests are run in parallel. I've left a critical comment with details on this. I've also included a medium-severity suggestion to improve the thread-safety of updating the last_used timestamp on ModelInstance.
Overall, this is a good change, but addressing the thread-safety of the pool management is essential to fully achieve the goal of parallel test safety.
@@ -673,6 +714,12 @@ def _evict_for_gpus(
    # Store (dict_key, instance) tuples to preserve the actual key for eviction
    evictable: list[tuple[str, ModelInstance]] = []
    for dict_key, inst in self.instances.items():
The iteration over self.instances.items() is not thread-safe. If another thread modifies self.instances (e.g., by calling get() which triggers an eviction), this could raise a RuntimeError: dictionary changed size during iteration.
This is part of a larger issue: the ModelPool class and the GPUAllocator it uses are not thread-safe, which is critical for achieving "parallel test safety".
Specifically:
- `ModelPool.instances` is not protected: concurrent modifications from methods like `get()`, `_evict_instance()`, and `_launch_model()` can lead to race conditions.
- `GPUAllocator` is not thread-safe: `GPUAllocator.allocate_slots()` has a race condition where multiple threads could be allocated the same GPU, as the check for available GPUs and the update to used GPUs are not atomic.
To properly ensure thread safety, you should introduce locks in both ModelPool (to protect self.instances) and GPUAllocator (to protect self._used_gpus). Be mindful of lock granularity to avoid deadlocks or performance issues, especially in long-running methods.
    Must be paired with a release() call when done.
    """
    with self._ref_lock:
        self._ref_count += 1
The last_used timestamp is currently updated in ModelPool.get() without a lock, which is not thread-safe. A better and safer approach is to update last_used here within the acquire method, as it's already protected by _ref_lock. This ensures that last_used is updated atomically with the reference count.
After this change, you should remove the manual update of instance.last_used from ModelPool.get().
self._ref_count += 1
self.last_used = time.time()

- Move last_used update into acquire() where it's protected by _ref_lock
- Remove redundant last_used update from ModelPool.get()
- Use list() copy when iterating self.instances to avoid RuntimeError if dictionary is modified during iteration
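Putting the two follow-up fixes together, the sketch below shows `last_used` being updated atomically inside `acquire()` and the eviction scan iterating over a `list()` snapshot of the instances dict. The names mirror the PR; the pool internals are illustrative assumptions.

```python
import threading
import time


class ModelInstance:
    def __init__(self):
        self.last_used = time.time()
        self._ref_count = 0
        self._ref_lock = threading.Lock()

    def acquire(self) -> None:
        with self._ref_lock:
            self._ref_count += 1
            # Updated here, atomically with the ref count, so ModelPool.get()
            # no longer needs its own (unlocked) last_used update.
            self.last_used = time.time()


class ModelPool:
    def __init__(self):
        self.instances: dict[str, ModelInstance] = {}

    def _evictable(self) -> list[tuple[str, ModelInstance]]:
        # list() snapshots the items, so a concurrent insert or delete
        # cannot raise "dictionary changed size during iteration".
        return [(key, inst) for key, inst in list(self.instances.items())
                if inst._ref_count == 0]
```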
Checklist
Review Process
/tag-run-ci-label, /rerun-failed-ci, /tag-and-rerun-ci) or contact authorized users to do so.