[smg][ci] Add thread safety to ModelPool and GPUAllocator #16674

Merged
slin1237 merged 12 commits into main from smg-ci-n/20 on Jan 7, 2026

Conversation

slin1237 (Collaborator) commented on Jan 7, 2026

GPUAllocator:

  • Add RLock to protect _used_gpus and slots
  • Wrap allocate_slots, release_gpus, available_gpus, get_slot_for_model, summary

ModelPool:

  • Add RLock to protect instances dict
  • Wrap startup, get, get_workers_by_type, launch_workers, shutdown
  • Split public methods into wrapper + unlocked internal methods

This enables safe parallel test execution where multiple tests can concurrently access the model pool without race conditions.
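
For concreteness, a minimal sketch of the pattern described above. Class, method, and lock names follow this description; the bodies are illustrative, not the actual implementation:

```python
import threading

class GPUAllocator:
    """Sketch: an RLock guards _used_gpus; public methods take the lock."""

    def __init__(self, num_gpus: int) -> None:
        self._lock = threading.RLock()  # reentrant, so locked methods may call each other
        self._all_gpus = set(range(num_gpus))
        self._used_gpus: set = set()

    def available_gpus(self) -> set:
        with self._lock:
            return self._all_gpus - self._used_gpus

    def allocate_slots(self, count: int) -> list:
        with self._lock:
            free = sorted(self._all_gpus - self._used_gpus)
            if len(free) < count:
                raise RuntimeError(f"need {count} GPUs, only {len(free)} free")
            chosen = free[:count]
            self._used_gpus.update(chosen)
            return chosen

    def release_gpus(self, gpus) -> None:
        with self._lock:
            self._used_gpus.difference_update(gpus)

class ModelPool:
    """Sketch of the wrapper + unlocked-internal split for public methods."""

    def __init__(self) -> None:
        self._lock = threading.RLock()
        self.instances = {}

    def get(self, model_id):
        with self._lock:                # public wrapper acquires the lock...
            return self._get_unlocked(model_id)

    def _get_unlocked(self, model_id):  # ...internal method assumes it is held
        return self.instances.get(model_id)
```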

Checklist

Review Process

  1. Ping Merge Oncalls to start the PR flow. See the PR Merge Process.
  2. Get approvals from CODEOWNERS and other reviewers.
  3. Trigger CI tests with comments (/tag-run-ci-label, /rerun-failed-ci, /tag-and-rerun-ci) or contact authorized users to do so.
  4. After green CI and required approvals, ask Merge Oncalls to merge.

gemini-code-assist (Contributor) commented

Summary of Changes

Hello @slin1237, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the robustness of the system by implementing comprehensive thread safety mechanisms within the GPUAllocator and ModelPool components. The primary goal is to eliminate potential race conditions that could arise from concurrent access, particularly in multi-threaded environments like parallel test execution. By protecting critical shared data structures and operations with reentrant locks, the changes ensure data integrity and predictable behavior, paving the way for more reliable and efficient testing workflows.

Highlights

  • GPUAllocator Thread Safety: Introduced a threading.RLock to the GPUAllocator class to protect shared resources like _used_gpus and slots. Key methods such as allocate_slots, release_gpus, available_gpus, get_slot_for_model, and summary are now wrapped with this lock to ensure thread-safe operations.
  • ModelPool Thread Safety: Added a threading.RLock to the ModelPool class to safeguard the instances dictionary. Public methods including startup, get, get_workers_by_type, launch_workers, and shutdown have been updated to acquire this lock, with their core logic moved into new _unlocked internal methods.
  • Enabled Parallel Test Execution: The implemented thread safety measures in both GPUAllocator and ModelPool are designed to prevent race conditions, thereby enabling safe and concurrent access to the model pool during parallel test execution.

slin1237 added the run-ci label on Jan 7, 2026
gemini-code-assist (Bot) left a comment

Code Review

This pull request introduces thread safety to ModelPool and GPUAllocator by using threading.RLock to protect shared state. The approach of refactoring public methods into a locked wrapper and an unlocked internal method is sound and consistently applied. The changes are generally well-implemented. However, I've identified a potential race condition in get() and get_workers_by_type() concerning instance acquisition. An instance returned by these methods could be evicted by another thread before the caller can acquire it. I have provided specific comments and suggestions to address this by ensuring instances are acquired while the pool lock is held, which is crucial for robust thread safety.
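
As a sketch of the suggested shape, continuing the ModelPool sketch above (the acquire() call is hypothetical, standing in for whatever in-use marking the pool performs):

```python
    def get(self, model_id):
        with self._lock:
            inst = self._get_unlocked(model_id)
            if inst is not None:
                # Mark the instance in-use while the pool lock is still held,
                # so a concurrent evictor cannot reclaim it first.
                inst.acquire()  # hypothetical refcount/flag, not from the PR
            return inst
```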

Review comment threads on sgl-model-gateway/e2e_test/infra/model_pool.py (one marked Outdated).

github-actions (Bot) added the dependencies label (Pull requests that update a dependency file) on Jan 7, 2026

When pytest-parallel runs multiple test threads, they would all
simultaneously try to create their own model_pool because the check
`if _model_pool is not None` was not thread-safe.

This caused CUDA OOM errors as multiple workers tried to allocate
the same GPUs.

Fix: Add threading.Lock() around the entire fixture initialization
to ensure only one thread creates the pool while others wait.
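
A minimal sketch of that fix, assuming module-level globals _model_pool and _model_pool_lock (names illustrative) and a ModelPool class like the sketch above:

```python
import threading
import pytest

_model_pool = None
_model_pool_lock = threading.Lock()

@pytest.fixture(scope="session")
def model_pool():
    global _model_pool
    with _model_pool_lock:        # the first thread builds the pool; the rest block here
        if _model_pool is None:   # re-checked under the lock: the race was check-then-create
            pool = ModelPool()
            pool.startup()
            _model_pool = pool
    return _model_pool
```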

Fix pytest-parallel compatibility: add py dependency

pytest-parallel requires the 'py' package for py.log.Producer, but
newer pytest versions (7.2.0+) no longer include it as a dependency.

References:
- pytest-dev/pytest#10424
- kevlened/pytest-parallel#118

Add parallel test execution support with pytest-parallel

Enable thread-based parallelism for e2e tests using pytest-parallel.
Tests run concurrently as threads within a single process, sharing
the session-scoped model_pool fixture.

Infrastructure changes:
- Add pytest-parallel dependency to e2e_test/pyproject.toml
- Add thread_unsafe marker for incompatible tests
- Add is_parallel_execution() helper and pytest_runtest_setup hook
  to auto-skip thread_unsafe tests in parallel mode (sketched below)
- Update conftest.py with parallel execution documentation

Workflow changes:
- Add parallel_opts matrix variable to pr-test-rust.yml
- Enable --workers 1 --tests-per-worker 4 for e2e test suite
- Keep benchmarks and legacy tests sequential

Bugfix:
- Fix NameError in _setup_pd_backend: use 'prefills + decodes'
  instead of undefined 'all_workers' in finally block
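
The marker/hook pair from the infrastructure changes above, as a minimal sketch; the sys.argv scan reflects this initial version (a later commit replaces it with config.option, see below):

```python
# conftest.py (sketch)
import sys
import pytest

def pytest_configure(config):
    config.addinivalue_line(
        "markers", "thread_unsafe: test is not safe under pytest-parallel"
    )

def is_parallel_execution() -> bool:
    # Initial version: scan sys.argv for pytest-parallel's flags.
    # Fragile: misses values set in pytest.ini/pyproject.toml.
    return any(a.startswith(("--workers", "--tests-per-worker")) for a in sys.argv)

def pytest_runtest_setup(item):
    if item.get_closest_marker("thread_unsafe") and is_parallel_execution():
        pytest.skip("marked thread_unsafe; skipped during parallel execution")
```
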
With pytest-parallel running multiple threads, request.addfinalizer
was firing prematurely when one thread's test finished, shutting down
the model pool while other threads were still using it.

Fix: Use atexit.register() instead of request.addfinalizer() to ensure
the pool shutdown only happens at process exit, after all tests complete.
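
Revisiting the fixture sketch above with that change (still a sketch):

```python
import atexit
import threading
import pytest

_model_pool = None
_model_pool_lock = threading.Lock()

@pytest.fixture(scope="session")
def model_pool():
    global _model_pool
    with _model_pool_lock:
        if _model_pool is None:
            pool = ModelPool()
            pool.startup()
            # Previously: request.addfinalizer(pool.shutdown), which fired when one
            # thread's tests finished and tore the pool down under the other threads.
            atexit.register(pool.shutdown)  # now deferred to process exit
            _model_pool = pool
    return _model_pool
```
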
The pytest-parallel plugin forks a worker process after test collection.
If CUDA is initialized before the fork (during collection), all CUDA
operations in the forked worker fail with "Cannot re-initialize CUDA".

Fix: Use nvidia-smi instead of torch.cuda for GPU counting in
validate_gpu_requirements(). This avoids CUDA initialization during
the collection phase.
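
A sketch of the fork-safe count (the helper name is illustrative; nvidia-smi --list-gpus prints one line per device):

```python
import subprocess

def gpu_count_without_cuda() -> int:
    """Count GPUs without importing torch or initializing CUDA."""
    try:
        result = subprocess.run(
            ["nvidia-smi", "--list-gpus"],
            capture_output=True, text=True, check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return 0  # no driver installed or no visible GPUs
    return len([line for line in result.stdout.splitlines() if line.strip()])
```
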
The class-scoped fixture was causing "Cannot copy out of meta tensor"
errors when multiple threads tried to load SentenceTransformer
simultaneously. Changed to session-scoped with thread-safe initialization
using a lock and global cache, matching the pattern used for model_pool.

When tests run in parallel and all GPUs are occupied by workers that are
in use by other tests, launch_workers now waits (up to 5 min) for GPUs
to become available instead of failing immediately.

The key changes:
- _launch_workers_unlocked now returns None when GPUs unavailable after
  eviction (all workers in use), signaling retry is needed
- launch_workers has a polling loop that releases the lock while waiting,
  allowing other tests to release their workers

This fixes PD tests being skipped when run in parallel with other tests
that temporarily hold all available GPUs.
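
The retry loop, as a sketch continuing the ModelPool sketch above (the 5-minute cap comes from the commit message; the poll interval and exact signature are assumptions):

```python
import time

# Added to the ModelPool sketch above:
def launch_workers(self, spec, timeout: float = 300.0, poll_interval: float = 2.0):
    deadline = time.monotonic() + timeout
    while True:
        with self._lock:
            workers = self._launch_workers_unlocked(spec)
        if workers is not None:            # None signals "all GPUs in use, retry"
            return workers
        if time.monotonic() >= deadline:
            raise TimeoutError("GPUs did not become available within the wait window")
        time.sleep(poll_interval)          # lock released here so other tests can free GPUs
```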
slin1237 (Collaborator, Author) commented on Jan 7, 2026

/gemini review

gemini-code-assist (Bot) left a comment

Code Review

This pull request introduces support for parallel test execution using pytest-parallel by making the ModelPool and GPUAllocator classes thread-safe. Key changes include adding threading.Lock or threading.RLock to ModelPool and GPUAllocator methods to protect shared state, making fixtures such as hf_reference_embeddings and model_pool session-scoped and thread-safe, and switching GPU detection to nvidia-smi to avoid CUDA initialization issues in parallel environments. A new thread_unsafe pytest marker skips tests that are incompatible with parallel execution, and model_pool.get() and get_workers_by_type() now auto-acquire model instances. The review suggests refactoring the is_parallel_execution function to use the pytest.Config object instead of parsing sys.argv, which is more robust, and updating its call site accordingly.

Review comment threads on sgl-model-gateway/e2e_test/fixtures/hooks.py (two, both marked Outdated).

- Add logging at start of _setup_pd_backend to track execution
- Add pytest.fail() when launch_workers returns empty list
- Add guard for incomplete PD setup (missing prefill or decode workers)
- Better error messages for debugging GPU allocation failures

Add logging to every early check in _setup_pd_backend to diagnose
why PD tests are being skipped. Now logs:
- sgl_kernel import success/failure
- torch import success/failure
- CUDA availability
- GPU count vs requirements

This will show exactly which check is causing the skip.

PD workers are no different from embedding workers - they just need
to be launched. The early checks (sgl_kernel, torch, cuda, gpu count)
were unnecessary and causing skips.

Embedding workers work without these checks - they just get launched
on-demand and fail properly if something's wrong. PD should work the
same way.

Use pytest's config.option instead of parsing sys.argv, which is
fragile and doesn't handle options from pytest.ini/pyproject.toml.
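
The revised helper, as a sketch (the attribute names on config.option mirror pytest-parallel's --workers and --tests-per-worker flags and are assumptions):

```python
def is_parallel_execution(config) -> bool:
    # Parsed options include values from pytest.ini/pyproject.toml, unlike sys.argv.
    workers = getattr(config.option, "workers", None)
    tests_per_worker = getattr(config.option, "tests_per_worker", None)
    return bool(workers) or bool(tests_per_worker)
```
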
Make get() wait for GPUs to become available instead of failing
immediately when all GPUs are in use by other tests:

- Add wait_for_gpus and gpu_wait_timeout parameters to get()
- Change _get_unlocked() to return None when GPUs unavailable (signals retry)
- Change _ensure_gpu_available() to return bool instead of raising
- Add polling loop in get() that releases lock while waiting

This matches the existing wait mechanism in launch_workers() and allows
parallel tests to share GPUs by waiting for each other.
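
A sketch of get() with the new parameters, continuing the ModelPool sketch above (the poll interval is an assumption; the None-means-retry contract matches the bullets):

```python
import time

# Added to the ModelPool sketch above:
def get(self, model_id, wait_for_gpus: bool = True, gpu_wait_timeout: float = 300.0):
    deadline = time.monotonic() + gpu_wait_timeout
    while True:
        with self._lock:
            inst = self._get_unlocked(model_id)  # None => GPUs unavailable right now
        if inst is not None or not wait_for_gpus:
            return inst
        if time.monotonic() >= deadline:
            raise TimeoutError(f"timed out waiting for GPUs to serve {model_id}")
        time.sleep(2.0)                          # sleep without holding the pool lock
```
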
Include %(threadName)s in log format so logs from different parallel
test threads can be distinguished. Shows 'MainThread' for sequential
execution and 'Thread-N' for parallel workers.
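
The corresponding logging configuration, as a sketch:

```python
import logging

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s [%(threadName)s] %(name)s %(levelname)s: %(message)s",
)
# Sequential runs tag lines with [MainThread]; pytest-parallel workers
# show [Thread-1], [Thread-2], and so on.
```
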
slin1237 merged commit 6037267 into main on Jan 7, 2026
88 of 90 checks passed
slin1237 deleted the smg-ci-n/20 branch on January 7, 2026 at 21:25

Labels

dependencies (Pull requests that update a dependency file), model-gateway, run-ci
