[Feature] Optimize LoRA Loading Mechanism to Decouple User Limits from CPU Memory Constraints

### Checklist

- [ ] 1. If the issue you raised is not a feature but a question, please raise a discussion at https://github.com/sgl-project/sglang/discussions/new/choose. Otherwise, it will be closed.
- [ ] 2. Please use English, otherwise it will be closed.

### Motivation

Currently, the `--max-loaded-loras` parameter imposes a hard limit on both:
1. The number of LoRAs that can be loaded into CPU memory
2. The number of LoRAs a user is allowed to load

This creates a suboptimal user experience, because users may be unable to load as many LoRAs as they need.

We would like a more flexible approach where:
- The CPU memory constraint remains (to prevent OOM errors)
- User-facing limits on the number of LoRAs are removed
- LoRAs are dynamically loaded/unloaded from CPU memory based on user requests, within the bounds of available memory

This would allow users to request any number of LoRAs while the system automatically manages memory usage, resulting in a smoother experience without artificial restrictions.

**Expected Behavior:**
- Users can submit requests for any number of LoRAs without arbitrary limits
- `--max-loaded-loras` limits only the number of LoRAs that can be loaded into CPU memory
- When the `--max-loaded-loras` limit is reached, the least recently used LoRAs (or those chosen by another smart eviction policy) are unloaded to make space for the new request
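
To make the expected behavior concrete, here is a minimal sketch of an LRU-managed CPU cache. It is only an illustration, not SGLang's actual implementation: the class `LRULoRACache`, the `load_fn` callback, and the `evicted` list are hypothetical names introduced for this example, and `max_loaded_loras` simply mirrors the `--max-loaded-loras` flag.

```python
from collections import OrderedDict
from typing import Callable, List


class LRULoRACache:
    """Hypothetical CPU-side LoRA cache (illustration only, not SGLang code)."""

    def __init__(self, max_loaded_loras: int, load_fn: Callable[[str], object]):
        if max_loaded_loras < 1:
            raise ValueError("max_loaded_loras must be at least 1")
        self.max_loaded_loras = max_loaded_loras
        self.load_fn = load_fn  # assumed loader: adapter name -> weights
        self._loaded: "OrderedDict[str, object]" = OrderedDict()
        self.evicted: List[str] = []  # recorded only to make the example observable

    def get(self, name: str) -> object:
        """Return the adapter, loading it (and evicting the LRU one) if needed."""
        if name in self._loaded:
            # Cache hit: mark this adapter as most recently used.
            self._loaded.move_to_end(name)
            return self._loaded[name]
        # Cache miss: evict least recently used adapters until there is room.
        while len(self._loaded) >= self.max_loaded_loras:
            victim, _ = self._loaded.popitem(last=False)
            self.evicted.append(victim)
        self._loaded[name] = self.load_fn(name)
        return self._loaded[name]

    def loaded_names(self) -> List[str]:
        """Adapter names ordered from least to most recently used."""
        return list(self._loaded)


# Usage: the cache holds at most 2 adapters, but users may request any number.
cache = LRULoRACache(max_loaded_loras=2, load_fn=lambda name: f"weights:{name}")
for adapter in ["a", "b", "a", "c"]:
    cache.get(adapter)
print(cache.loaded_names())  # ['a', 'c']
print(cache.evicted)         # ['b']
```

A real implementation would presumably also need to avoid evicting adapters that are still referenced by in-flight requests, and could swap LRU for another eviction policy behind the same interface.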


### Related resources

_No response_

[Feature] Optimize LoRA Loading Mechanism to Decouple User Limits from CPU Memory Constraints #10266

Checklist

Motivation

Related resources

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

[Feature] Optimize LoRA Loading Mechanism to Decouple User Limits from CPU Memory Constraints #10266

Description

Checklist

Motivation

Related resources

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions