Checklist
Describe the bug
In the `load_lora_weight_to_buffer` function, we zero out `A_buffer` when `uid == None` (code reference) to prevent leftover weights of previously evicted LoRA adapters from interfering with subsequent computations.
However, I suspect we should do the same even when `uid != None`, because in theory different adapters can target different modules (e.g., some adapters do not target `k_proj`). Our code might not handle this case correctly. For example, suppose we have two adapters: lora1 targets `k_proj` and lora2 does not. If lora2 reuses the memory buffer left by lora1 after its eviction, the `k_proj` weight of lora1 would remain in the buffer and could contaminate the computation of lora2. I discussed this with @Fridge003 and @Qiaolin-Yu offline, and they share the same suspicion.
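A minimal, standalone sketch of the suspected contamination (the buffer layout, slot index, and shapes below are illustrative assumptions, not the actual SGLang memory-pool code):

```python
import torch

# Hypothetical unified A_buffer for one module (e.g., k_proj):
# [max_loras_per_batch, rank, hidden_dim] -- shapes are made up for illustration.
max_loras_per_batch, rank, hidden = 1, 8, 16
A_buffer_k_proj = torch.zeros(max_loras_per_batch, rank, hidden)

# lora1 targets k_proj: its weights are written into slot 0.
lora1_k_proj = torch.randn(rank, hidden)
A_buffer_k_proj[0].copy_(lora1_k_proj)

# lora1 is evicted and lora2 (which does NOT target k_proj) reuses slot 0.
# Since lora2 provides no k_proj weights, nothing overwrites the slot,
# so lora1's stale k_proj weights are still there:
assert torch.equal(A_buffer_k_proj[0], lora1_k_proj)  # contamination

# Suspected fix: zero out the slot whenever a new adapter is loaded into it,
# not only when uid is None.
A_buffer_k_proj[0].zero_()
assert A_buffer_k_proj[0].abs().sum() == 0
```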
As this is a rare corner case, I have not yet had a chance to construct a test to verify it. I am creating this issue to track this potential bug. We need to:
- verify: construct a test case to reproduce the issue, e.g., set `max-loras-per-batch = 1` but load two adapters with different target modules.
- fix: always zero out the buffer during GPU buffer eviction.
- benchmark: measure the performance overhead introduced by the zero-out operation (see the timing sketch after this list).
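For the benchmark item, a rough way to estimate the cost of the extra zero-out; the buffer shape and device here are assumptions, and the real buffers live in SGLang's LoRA memory pool:

```python
import time
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
# Assumed shape for a single module's A_buffer; adjust to match the real
# pool dimensions (number of slots, rank, hidden size).
buf = torch.empty(64, 8, 4096, device=device)

# Warm up, then time repeated zero_() calls to estimate per-eviction overhead.
for _ in range(3):
    buf.zero_()
if device == "cuda":
    torch.cuda.synchronize()

start = time.perf_counter()
iters = 100
for _ in range(iters):
    buf.zero_()
if device == "cuda":
    torch.cuda.synchronize()
print(f"avg zero_() time: {(time.perf_counter() - start) / iters * 1e6:.1f} us")
```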
Reproduction
See first comment.
Environment
Bug is environment-agnostic.