[piecewise] Refactor VLM to support input embed buffer and remove external embedder hack#14155
Merged
hebiao064 approved these changes Dec 1, 2025
Collaborator
Thanks for the refactor. LGTM.
yuan-luo approved these changes Dec 1, 2025
yhyang201 approved these changes Dec 1, 2025
Collaborator
/tag-and-rerun-ci
harvenstar pushed a commit to harvenstar/sglang that referenced this pull request Dec 4, 2025
…ernal embedder hack (sgl-project#14155)
tonyluj pushed a commit to openanolis/sglang that referenced this pull request Dec 5, 2025
…ernal embedder hack (sgl-project#14155)
Motivation
The current way to enable PCG (piecewise CUDA graph) for VLMs is hacky and requires changing every VLM model file. This PR implements a cleaner approach that needs no changes to model files and causes no perf drop.
Modifications
Add use_original_ca_comm, which restores the original ca comm, because ca only needs to be disabled in the language model
Accuracy Tests
Follow #13055. No acc drops.
Benchmarking and Profiling
On H100
Follow #13055. No perf drops after the use_original_ca_comm fix.
This PR
Main
Profile
We can see the modules are still compiled and captured correctly.