Add reference counting to ModelInstance for parallel test safety#16672

Merged
slin1237 merged 2 commits into main from smg-ci-n/19
Jan 7, 2026

Conversation


@slin1237 slin1237 commented Jan 7, 2026

  • Add _ref_count and _ref_lock fields to ModelInstance for thread-safe reference tracking
  • Add acquire(), release() methods and is_in_use property
  • Update _evict_for_gpus to skip instances with active references
  • Update fixtures to acquire refs before using instances and release in finally blocks
  • Prevents worker eviction while tests are actively using them
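
The mechanism listed above can be sketched as follows. This is an illustrative reconstruction from the bullet points, not the repository's actual code; only the names `_ref_count`, `_ref_lock`, `acquire()`, `release()`, and `is_in_use` come from the PR description, everything else is assumed.

```python
import threading

class ModelInstance:
    """Illustrative sketch of the ref-counting API described in this PR."""

    def __init__(self) -> None:
        self._ref_count = 0
        self._ref_lock = threading.Lock()

    def acquire(self) -> None:
        """Pin this instance. Must be paired with a release() call when done."""
        with self._ref_lock:
            self._ref_count += 1

    def release(self) -> None:
        """Drop one reference; clamped so the count never goes negative."""
        with self._ref_lock:
            self._ref_count = max(0, self._ref_count - 1)

    @property
    def is_in_use(self) -> bool:
        """True while any test still holds a reference."""
        with self._ref_lock:
            return self._ref_count > 0
```

The eviction path can then consult `is_in_use` and simply skip pinned instances.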

Checklist

Review Process

  1. Ping Merge Oncalls to start the PR flow. See the PR Merge Process.
  2. Get approvals from CODEOWNERS and other reviewers.
  3. Trigger CI tests with comments (/tag-run-ci-label, /rerun-failed-ci, /tag-and-rerun-ci) or contact authorized users to do so.
  4. After green CI and required approvals, ask Merge Oncalls to merge.

- Add _ref_count and _ref_lock fields to ModelInstance for thread-safe reference tracking
- Add acquire(), release() methods and is_in_use property
- Update _evict_for_gpus to skip instances with active references
- Update fixtures to acquire refs before using instances and release in finally blocks
- Prevents worker eviction while tests are actively using them
@gemini-code-assist
Contributor

Summary of Changes

Hello @slin1237, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a robust reference counting system for ModelInstance objects to ensure thread safety during parallel test execution. By preventing model instances from being evicted while actively in use, the changes significantly improve the reliability and stability of the test suite, particularly in environments where resources might be dynamically managed.

Highlights

  • Reference Counting Implementation: Introduced a thread-safe reference counting mechanism to ModelInstance objects by adding _ref_count and _ref_lock fields, along with acquire(), release(), and is_in_use methods/properties.
  • Eviction Prevention: Modified the _evict_for_gpus logic to prevent the premature eviction of ModelInstance objects that are currently in use (i.e., have active references from running tests).
  • Test Fixture Integration: Updated various test fixtures (model_client, model_base_url, _setup_pd_backend, _setup_local_backend, test_router_state) to correctly acquire references to ModelInstance objects before use and ensure their release in finally blocks, enhancing test stability during parallel execution.




@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces reference counting to ModelInstance to prevent model workers from being evicted while they are in use by a test, which is a crucial step for enabling parallel test execution. The implementation of acquire(), release(), and is_in_use is thread-safe and correctly integrated into the eviction logic. The test fixtures have also been updated to correctly use this new mechanism.

However, I've identified a critical issue regarding the overall thread-safety of the ModelPool and GPUAllocator classes. They currently lack the necessary locking to protect their internal state (like the list of instances and GPU allocations) from race conditions during concurrent access. This could lead to errors or incorrect behavior when tests are run in parallel. I've left a critical comment with details on this. I've also included a medium-severity suggestion to improve the thread-safety of updating the last_used timestamp on ModelInstance.

Overall, this is a good change, but addressing the thread-safety of the pool management is essential to fully achieve the goal of parallel test safety.

@@ -673,6 +714,12 @@ def _evict_for_gpus(
# Store (dict_key, instance) tuples to preserve the actual key for eviction
evictable: list[tuple[str, ModelInstance]] = []
for dict_key, inst in self.instances.items():

critical

The iteration over self.instances.items() is not thread-safe. If another thread modifies self.instances (e.g., by calling get() which triggers an eviction), this could raise a RuntimeError: dictionary changed size during iteration.

This is part of a larger issue: the ModelPool class and the GPUAllocator it uses are not thread-safe, which is critical for achieving "parallel test safety".

Specifically:

  • ModelPool.instances is not protected: Concurrent modifications from methods like get(), _evict_instance(), and _launch_model() can lead to race conditions.
  • GPUAllocator is not thread-safe: GPUAllocator.allocate_slots() has a race condition where multiple threads could be allocated the same GPU, as the check for available GPUs and the update to used GPUs are not atomic.

To properly ensure thread safety, you should introduce locks in both ModelPool (to protect self.instances) and GPUAllocator (to protect self._used_gpus). Be mindful of lock granularity to avoid deadlocks or performance issues, especially in long-running methods.
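
The atomicity the reviewer asks for in GPUAllocator.allocate_slots() can be sketched like this. The class and method names come from the review comment; the slot model, `total_gpus` parameter, and `release_slots` helper are assumptions, not the repository's actual API.

```python
import threading

class GPUAllocator:
    """Sketch: hold one lock across the availability check *and* the update
    so two threads can never be handed the same GPU."""

    def __init__(self, total_gpus: int) -> None:
        self._lock = threading.Lock()
        self._used_gpus: set[int] = set()
        self._total_gpus = total_gpus

    def allocate_slots(self, count: int) -> list[int]:
        with self._lock:
            # Check and update are now one atomic critical section.
            free = [g for g in range(self._total_gpus) if g not in self._used_gpus]
            if len(free) < count:
                raise RuntimeError("not enough free GPUs")
            chosen = free[:count]
            self._used_gpus.update(chosen)
            return chosen

    def release_slots(self, gpus: list[int]) -> None:
        with self._lock:
            self._used_gpus.difference_update(gpus)
```

A similar lock around ModelPool's mutations of self.instances would close the companion race the comment describes.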

Must be paired with a release() call when done.
"""
with self._ref_lock:
self._ref_count += 1

medium

The last_used timestamp is currently updated in ModelPool.get() without a lock, which is not thread-safe. A better and safer approach is to update last_used here within the acquire method, as it's already protected by _ref_lock. This ensures that last_used is updated atomically with the reference count.

After this change, you should remove the manual update of instance.last_used from ModelPool.get().

            self._ref_count += 1
            self.last_used = time.time()

- Move last_used update into acquire() where it's protected by _ref_lock
- Remove redundant last_used update from ModelPool.get()
- Use list() copy when iterating self.instances to avoid RuntimeError
  if dictionary is modified during iteration
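
The follow-up commit's two fixes can be sketched together: `last_used` is bumped inside `acquire()` under `_ref_lock`, and the eviction scan iterates a `list()` snapshot while skipping pinned instances. The `evictable_instances` helper name is hypothetical; it mirrors the loop shown in the `_evict_for_gpus` diff above.

```python
import threading
import time

class ModelInstance:
    """Stand-in reflecting the follow-up commit: last_used is updated inside
    acquire(), atomically with the ref-count bump."""

    def __init__(self):
        self.last_used = time.time()
        self._ref_count = 0
        self._ref_lock = threading.Lock()

    def acquire(self):
        with self._ref_lock:
            self._ref_count += 1
            self.last_used = time.time()  # atomic with the ref bump

    def release(self):
        with self._ref_lock:
            self._ref_count = max(0, self._ref_count - 1)

    @property
    def is_in_use(self):
        with self._ref_lock:
            return self._ref_count > 0

def evictable_instances(instances: dict) -> list:
    """Snapshot the dict with list() so a concurrent insert or eviction cannot
    raise 'dictionary changed size during iteration', and skip in-use instances."""
    evictable = []
    for dict_key, inst in list(instances.items()):
        if inst.is_in_use:
            continue
        evictable.append((dict_key, inst))
    return evictable
```
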
@slin1237 slin1237 merged commit 7385834 into main Jan 7, 2026
6 of 16 checks passed
@slin1237 slin1237 deleted the smg-ci-n/19 branch January 7, 2026 16:28