[model-gateway] Add model scope support and LRU eviction for GPU-constrained environments #16525

Merged: slin1237 merged 2 commits into main from smg-ci-n/10 on Jan 6, 2026

slin1237 (Collaborator) commented Jan 5, 2026

Adds session/class scope support for models to enable efficient GPU resource management when running tests with more models than available GPUs.

Key changes:

  • Add @pytest.mark.model(name, scope="session|class") marker support
  • Session-scoped models are pre-launched at startup (default)
  • Class-scoped models are launched on-demand when needed
  • Add LRU eviction: when GPUs are full, least recently used models are evicted
  • Evicted models are queued for re-launch when needed again
  • Add Gateway class for unified gateway lifecycle management
  • Add worker management API tests (IGW mode)
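As a rough illustration of the marker described above, the sketch below tags two test classes and reads the requested scope back from the marks. The model names and the `model_requirements` helper are hypothetical, and the PR's actual fixture code may resolve markers differently. The only behavior taken from the description is the `@pytest.mark.model(name, scope=...)` signature and the session default.

```python
import pytest


# Hypothetical model names; the marker signature follows the PR description.
@pytest.mark.model("model-a")  # no scope given: treated as session-scoped
class TestPrelaunched:
    def test_placeholder(self):
        assert True


@pytest.mark.model("model-e", scope="class")  # launched on demand
class TestOnDemand:
    def test_placeholder(self):
        assert True


def model_requirements(test_cls):
    """Return (name, scope) pairs read from a class's model markers.

    Hypothetical helper: the real fixture code in the PR may differ.
    """
    marks = getattr(test_cls, "pytestmark", [])
    return [
        (m.args[0], m.kwargs.get("scope", "session"))
        for m in marks
        if m.name == "model"
    ]
```

Under these assumptions, `model_requirements(TestOnDemand)` yields the class-scoped requirement, while a marker without `scope` falls back to `"session"`.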

This lets CI environments with limited GPUs (e.g., 4 GPUs, 6 models) run tests against every model:

  1. Pre-launch what fits (a, b, c, d)
  2. Queue overflow (e, f)
  3. On-demand: when a test needs 'e', evict the LRU model and launch 'e'
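The three steps above can be sketched as a small simulation. This is a toy model only: the `GpuModelPool` class, its method names, and the one-GPU-per-model assumption are illustrative and are not taken from the PR's code.

```python
from collections import OrderedDict


class GpuModelPool:
    """Toy model pool: one GPU per model, LRU eviction when full.

    Illustrative only; names and structure are assumptions, not the PR's code.
    """

    def __init__(self, num_gpus):
        self.num_gpus = num_gpus
        self.running = OrderedDict()  # model name -> None, oldest use first
        self.queued = []              # models waiting for a free GPU

    def prelaunch(self, models):
        """Launch as many models as fit; queue the rest."""
        for name in models:
            if len(self.running) < self.num_gpus:
                self.running[name] = None
            else:
                self.queued.append(name)

    def acquire(self, name):
        """Ensure `name` is running; return the evicted model name, if any."""
        if name in self.running:
            self.running.move_to_end(name)  # mark as most recently used
            return None
        evicted = None
        if len(self.running) >= self.num_gpus:
            evicted, _ = self.running.popitem(last=False)  # drop the LRU model
            self.queued.append(evicted)  # may be re-launched later
        if name in self.queued:
            self.queued.remove(name)
        self.running[name] = None
        return evicted


pool = GpuModelPool(num_gpus=4)
pool.prelaunch(["a", "b", "c", "d", "e", "f"])
pool.acquire("a")            # 'a' becomes most recently used
evicted = pool.acquire("e")  # pool is full, so the LRU model is evicted
```

In this run, touching 'a' before requesting 'e' makes 'b' the least recently used model, so 'b' is the one evicted and placed back in the queue.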

Checklist

Review Process

  1. Ping Merge Oncalls to start the PR flow. See the PR Merge Process.
  2. Get approvals from CODEOWNERS and other reviewers.
  3. Trigger CI tests with comments (/tag-run-ci-label, /rerun-failed-ci, /tag-and-rerun-ci) or contact authorized users to do so.
  4. After green CI and required approvals, ask Merge Oncalls to merge.

@gemini-code-assist (Contributor)

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

slin1237 force-pushed the smg-ci-n/10 branch 5 times, most recently from df325d1 to 48b8349, on January 6, 2026 01:55
slin1237 merged commit 402a0bd into main on Jan 6, 2026
58 of 61 checks passed
slin1237 deleted the smg-ci-n/10 branch on January 6, 2026 02:28
jamesjxliu pushed a commit to jamesjxliu/sglang that referenced this pull request on Jan 6, 2026