Add reference counting to ModelInstance for parallel test safety #16672
Conversation
- Add _ref_count and _ref_lock fields to ModelInstance for thread-safe reference tracking
- Add acquire(), release() methods and is_in_use property
- Update _evict_for_gpus to skip instances with active references
- Update fixtures to acquire refs before using instances and release in finally blocks
- Prevents worker eviction while tests are actively using them
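The bullets above can be sketched as a minimal reference-counting pattern. The field and method names (`_ref_count`, `_ref_lock`, `acquire()`, `release()`, `is_in_use`) come from the PR description; everything else here is an illustrative assumption, not the PR's actual implementation.

```python
import threading
import time


class ModelInstance:
    """Sketch of a pool-managed model worker with reference counting."""

    def __init__(self, model_name: str):
        self.model_name = model_name
        self.last_used = time.time()
        self._ref_count = 0
        self._ref_lock = threading.Lock()

    def acquire(self) -> None:
        # Must be paired with a release() call when done.
        with self._ref_lock:
            self._ref_count += 1

    def release(self) -> None:
        with self._ref_lock:
            # Clamp at zero so an unbalanced release cannot go negative.
            self._ref_count = max(0, self._ref_count - 1)

    @property
    def is_in_use(self) -> bool:
        # Eviction logic skips any instance where this is True.
        with self._ref_lock:
            return self._ref_count > 0
```

A test fixture would call `acquire()` before handing the instance to a test and `release()` in a `finally` block, so the pool never evicts a worker mid-test.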
Summary of Changes

Hello @slin1237, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request introduces a robust reference counting system for ModelInstance.
Code Review
This pull request introduces reference counting to ModelInstance to prevent model workers from being evicted while they are in use by a test, which is a crucial step for enabling parallel test execution. The implementation of acquire(), release(), and is_in_use is thread-safe and correctly integrated into the eviction logic. The test fixtures have also been updated to correctly use this new mechanism.
However, I've identified a critical issue regarding the overall thread-safety of the ModelPool and GPUAllocator classes. They currently lack the necessary locking to protect their internal state (like the list of instances and GPU allocations) from race conditions during concurrent access. This could lead to errors or incorrect behavior when tests are run in parallel. I've left a critical comment with details on this. I've also included a medium-severity suggestion to improve the thread-safety of updating the last_used timestamp on ModelInstance.
Overall, this is a good change, but addressing the thread-safety of the pool management is essential to fully achieve the goal of parallel test safety.
@@ -673,6 +714,12 @@ def _evict_for_gpus(
    # Store (dict_key, instance) tuples to preserve the actual key for eviction
    evictable: list[tuple[str, ModelInstance]] = []
    for dict_key, inst in self.instances.items():
The iteration over self.instances.items() is not thread-safe. If another thread modifies self.instances (e.g., by calling get() which triggers an eviction), this could raise a RuntimeError: dictionary changed size during iteration.
This is part of a larger issue: the ModelPool class and the GPUAllocator it uses are not thread-safe, which is critical for achieving "parallel test safety".
Specifically:
- `ModelPool.instances` is not protected: concurrent modifications from methods like `get()`, `_evict_instance()`, and `_launch_model()` can lead to race conditions.
- `GPUAllocator` is not thread-safe: `GPUAllocator.allocate_slots()` has a race condition where multiple threads could be allocated the same GPU, as the check for available GPUs and the update to used GPUs are not atomic.
To properly ensure thread safety, you should introduce locks in both ModelPool (to protect self.instances) and GPUAllocator (to protect self._used_gpus). Be mindful of lock granularity to avoid deadlocks or performance issues, especially in long-running methods.
    Must be paired with a release() call when done.
    """
    with self._ref_lock:
        self._ref_count += 1
The last_used timestamp is currently updated in ModelPool.get() without a lock, which is not thread-safe. A better and safer approach is to update last_used here within the acquire method, as it's already protected by _ref_lock. This ensures that last_used is updated atomically with the reference count.
After this change, you should remove the manual update of instance.last_used from ModelPool.get().
self._ref_count += 1
self.last_used = time.time()

- Move last_used update into acquire() where it's protected by _ref_lock
- Remove redundant last_used update from ModelPool.get()
- Use list() copy when iterating self.instances to avoid RuntimeError if dictionary is modified during iteration
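Putting the two follow-up fixes together, the sketch below shows `last_used` being updated atomically inside `acquire()` and the eviction scan iterating over a `list()` snapshot of the instances dict. The names mirror the PR; the pool internals are illustrative assumptions.

```python
import threading
import time


class ModelInstance:
    def __init__(self):
        self.last_used = time.time()
        self._ref_count = 0
        self._ref_lock = threading.Lock()

    def acquire(self) -> None:
        with self._ref_lock:
            self._ref_count += 1
            # Updated here, atomically with the ref count, so ModelPool.get()
            # no longer needs its own (unlocked) last_used update.
            self.last_used = time.time()


class ModelPool:
    def __init__(self):
        self.instances: dict[str, ModelInstance] = {}

    def _evictable(self) -> list[tuple[str, ModelInstance]]:
        # list() snapshots the items, so a concurrent insert or delete
        # cannot raise "dictionary changed size during iteration".
        return [(key, inst) for key, inst in list(self.instances.items())
                if inst._ref_count == 0]
```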
Checklist
Review Process
/tag-run-ci-label, /rerun-failed-ci, /tag-and-rerun-ci) or contact authorized users to do so.