[Feature] Asynchronous LoRA prefetch

### Checklist

- [ ] 1. If the issue you raised is not a feature but a question, please raise a discussion at https://github.com/sgl-project/sglang/discussions/new/choose. Otherwise, it will be closed.
- [ ] 2. Please use English, otherwise it will be closed.

### Motivation

Currently, the main overhead for LoRA performance comes from loading adapters from CPU to GPU memory. While we made several efforts to optimize this process in H1, the process itself is still synchronous and significantly slows down LoRA requests.

One possible solution is to implement some form of zero-overhead scheduling for LoRA, so that the cost of prefetching adapters can be hidden.
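To illustrate the idea, here is a minimal CPU-only sketch of overlapping adapter loading with computation. It is not SGLang code: `load_adapter`, `run_batch`, `serve_sync`, and `serve_prefetch` are hypothetical names, and the `time.sleep` calls stand in for the CPU-to-GPU copy and the forward pass. A real implementation would presumably get the overlap from asynchronous device copies rather than a thread pool.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def load_adapter(name: str, load_s: float = 0.05) -> str:
    """Hypothetical stand-in for a CPU-to-GPU adapter copy (just sleeps)."""
    time.sleep(load_s)
    return f"{name}:loaded"


def run_batch(adapter: str, compute_s: float = 0.05) -> str:
    """Hypothetical stand-in for a forward pass using a loaded adapter."""
    time.sleep(compute_s)
    return f"ran({adapter})"


def serve_sync(adapters):
    # Baseline: every load blocks the batch that needs it.
    return [run_batch(load_adapter(a)) for a in adapters]


def serve_prefetch(adapters):
    # Overlapped: start loading the next adapter before running the
    # current batch, so the load proceeds while the batch computes.
    outputs = []
    if not adapters:
        return outputs
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(load_adapter, adapters[0])
        for nxt in list(adapters[1:]) + [None]:
            current = pending.result()  # blocks only if the load is unfinished
            if nxt is not None:
                pending = pool.submit(load_adapter, nxt)
            outputs.append(run_batch(current))
    return outputs


if __name__ == "__main__":
    names = ["adapter_a", "adapter_b", "adapter_c"]
    for fn in (serve_sync, serve_prefetch):
        start = time.perf_counter()
        fn(names)
        print(fn.__name__, f"{time.perf_counter() - start:.2f}s")
```

Both functions return the same outputs. With equal load and compute times, the synchronous version pays for N loads plus N batches, while the prefetching version should pay for roughly one load plus N batches, since every load after the first runs concurrently with the previous batch.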

### Related resources

_No response_


[Feature] Asynchronous LoRA prefetch #8712
