
[model-gateway] Implement RAII load guard with response body attachment #15507

Merged
slin1237 merged 1 commit into main from cleanup-2 on Dec 20, 2025


@slin1237
Collaborator

  • Add WorkerLoadGuard::attach_to_response() to tie guard lifetime to response body
  • Add attach_guards_to_response() for multiple guards (dual prefill/decode workers)
  • Implement GuardedBody and MultiGuardedBody wrappers using http_body::Body trait
  • Add LoadGuards::attach_to_response() convenience method in gRPC context
  • Refactor streaming processors to remove load_guards parameter
  • Update HTTP and gRPC routers to use attach pattern instead of manual drop
  • Add unit tests for RAII pattern verification
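A minimal, std-only sketch of the attach pattern listed above, assuming each worker tracks in-flight requests in an `AtomicUsize`. `Worker`, its `load` counter, and the eager release at end of stream are hypothetical illustrations, not the gateway's actual code; `Iterator` stands in for `http_body::Body` so the example compiles without external crates.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Hypothetical stand-in for a worker with an in-flight request counter.
#[derive(Default)]
pub struct Worker {
    load: AtomicUsize,
}

impl Worker {
    pub fn load(&self) -> usize {
        self.load.load(Ordering::SeqCst)
    }
}

/// RAII guard: increments the worker's load on creation, decrements on drop.
pub struct WorkerLoadGuard {
    worker: Arc<Worker>,
}

impl WorkerLoadGuard {
    pub fn new(worker: Arc<Worker>) -> Self {
        worker.load.fetch_add(1, Ordering::SeqCst);
        WorkerLoadGuard { worker }
    }

    /// Move the guard into the body so its lifetime follows the body's.
    pub fn attach_to_response<B>(self, body: B) -> GuardedBody<B> {
        GuardedBody { inner: body, guard: Some(self) }
    }
}

impl Drop for WorkerLoadGuard {
    fn drop(&mut self) {
        self.worker.load.fetch_sub(1, Ordering::SeqCst);
    }
}

/// Body wrapper that owns the guard. `Iterator` is a stand-in for
/// `http_body::Body` in this sketch.
pub struct GuardedBody<B> {
    inner: B,
    guard: Option<WorkerLoadGuard>,
}

impl<B: Iterator> Iterator for GuardedBody<B> {
    type Item = B::Item;

    fn next(&mut self) -> Option<Self::Item> {
        let item = self.inner.next();
        if item.is_none() {
            // End of stream: release the load slot without waiting for drop.
            self.guard = None;
        }
        item
    }
}
```

Holding the guard in an `Option` lets the wrapper release load as soon as the stream ends, while `Drop` still covers the early-disconnect path (dropping `GuardedBody` drops the guard). Whether the real `GuardedBody` releases at end of stream or only when the body is dropped is not stated in the PR.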

This ensures load guards are properly dropped when:

  • Response body is fully consumed
  • Response body is dropped (client disconnect)
  • Multiple guards are attached (dual prefill/decode workers)
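The multi-guard case can be sketched as a wrapper that owns a `Vec` of guards, so dropping the body (for example, on client disconnect) releases the prefill and decode slots together. This is a hypothetical std-only illustration: `Worker`, its `load` field, and the free-function signature of `attach_guards_to_response` are assumptions, and `Iterator` stands in for `http_body::Body`.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Hypothetical per-worker in-flight counter.
#[derive(Default)]
pub struct Worker {
    pub load: AtomicUsize,
}

/// RAII guard holding one unit of load on a worker until dropped.
pub struct WorkerLoadGuard(Arc<Worker>);

impl WorkerLoadGuard {
    pub fn new(worker: &Arc<Worker>) -> Self {
        worker.load.fetch_add(1, Ordering::SeqCst);
        WorkerLoadGuard(Arc::clone(worker))
    }
}

impl Drop for WorkerLoadGuard {
    fn drop(&mut self) {
        self.0.load.fetch_sub(1, Ordering::SeqCst);
    }
}

/// Body wrapper owning several guards (e.g. one prefill and one decode worker).
pub struct MultiGuardedBody<B> {
    inner: B,
    // Never read; kept only so every guard lives exactly as long as the body.
    _guards: Vec<WorkerLoadGuard>,
}

impl<B: Iterator> Iterator for MultiGuardedBody<B> {
    type Item = B::Item;

    fn next(&mut self) -> Option<B::Item> {
        self.inner.next()
    }
}

/// Hypothetical analogue of `attach_guards_to_response()`.
pub fn attach_guards_to_response<B>(guards: Vec<WorkerLoadGuard>, body: B) -> MultiGuardedBody<B> {
    MultiGuardedBody { inner: body, _guards: guards }
}
```

Because the guards are plain fields, Rust's drop glue releases all of them when the wrapper goes out of scope, with no manual `drop` calls in the router.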


@gemini-code-assist
Contributor

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

@slin1237 force-pushed the cleanup-2 branch 2 times, most recently from c563d81 to 0799de3 on December 20, 2025 02:32
@slin1237
Collaborator Author

/gemini review


@slin1237 merged commit 5529ab5 into main on Dec 20, 2025
55 of 58 checks passed
@slin1237 deleted the cleanup-2 branch on December 20, 2025 03:14
@junliu-mde
Contributor

Has there been a recent release or Docker image publish? I ran into a load-imbalance issue; if there has been no release in the past few days, I may need to build from source.

Prozac614 pushed a commit to Prozac614/sglang that referenced this pull request Dec 23, 2025
jiaming1130 pushed a commit to zhuyijie88/sglang that referenced this pull request Dec 25, 2025
YChange01 pushed a commit to YChange01/sglang that referenced this pull request Jan 13, 2026