
[model-gateway] Implement Zero-Copy Vision Tensor Access #15750

Merged
slin1237 merged 10 commits into sgl-project:main from ppraneth:vision
Dec 24, 2025

Conversation

@ppraneth
Contributor

Motivation

Vision model tensors (e.g., for LLaVA or Qwen-VL) are large, often several megabytes in size. In the previous implementation, the pixel_values_flat method in image_processor.rs performed a linear-time deep copy of the entire vision tensor into a new heap-allocated Vec on every call. For a standard batch of 4 images at 336x336 resolution, this operation allocated and copied ~5.4 MB of data per call.

This created memory pressure and CPU overhead on the hot path for multimodal request processing, leading to millisecond-scale latencies for a simple data access operation.

The goal of this pull request is to implement zero-copy access to these tensors, eliminating redundant allocations and significantly reducing processing latency for multimodal inputs.

Modifications

  • Core Logic Change: Updated src/multimodal/vision/image_processor.rs to return std::borrow::Cow<'_, [f32]> instead of Vec<f32>.
  • Optimization: Leveraged ndarray::as_slice() to check for memory contiguity. If the tensor is contiguous (the case for 100% of standard preprocessor outputs), it now returns a borrowed slice (Cow::Borrowed), bypassing the heap allocator and copy logic entirely.
  • Compatibility: Because Cow<[f32]> implements Deref, all existing call sites in vision processors (e.g., qwen2_vl.rs, qwen3_vl.rs, phi4_vision.rs, llama4_vision.rs) and integration tests continue to function without modification.
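The shape of the change can be sketched with a minimal std-only example. The struct and field names below are hypothetical stand-ins (the real code holds an `ndarray` array and checks contiguity via `as_slice()`, which returns `Some(&[f32])` only for contiguous, standard-layout data); the point is the `Cow` branch: borrow when contiguous, copy only as a fallback.

```rust
use std::borrow::Cow;

// Hypothetical stand-in for the gateway's vision tensor: a flat buffer plus
// a flag for memory contiguity. The real implementation derives this from
// ndarray's `as_slice()` returning `Some` or `None`.
struct VisionTensor {
    data: Vec<f32>,
    contiguous: bool,
}

impl VisionTensor {
    /// Zero-copy when possible: borrow the existing buffer instead of cloning.
    fn pixel_values_flat(&self) -> Cow<'_, [f32]> {
        if self.contiguous {
            // No allocation, no copy: hand back a view of the existing buffer.
            Cow::Borrowed(&self.data)
        } else {
            // Fallback for non-contiguous layouts: materialize a copy
            // (the real code gathers elements in logical order here).
            Cow::Owned(self.data.clone())
        }
    }
}

fn main() {
    let t = VisionTensor { data: vec![0.5; 8], contiguous: true };
    let flat = t.pixel_values_flat();
    // Contiguous input takes the borrowed, allocation-free path.
    assert!(matches!(flat, Cow::Borrowed(_)));
    // Deref lets callers keep treating the result as a plain slice.
    assert_eq!(flat.len(), 8);
}
```

Because both branches return the same `Cow<'_, [f32]>` type, callers need no knowledge of which path was taken.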

Accuracy Tests

  • Golden Tests: Verified compatibility by running cargo test --test vision_golden_tests.

Benchmarking and Profiling

The optimization was validated using a benchmark simulating a 4-image batch (Shape: [4, 3, 336, 336], ~5.4 MB).

| Metric | Before (Current) | Post (Optimized) | Delta |
| --- | --- | --- | --- |
| Mean Latency | 5.6564 ms | 25.456 ns | ~222,000x Speedup |
| Memory Allocation | ~5.4 MB / call | 0 MB | 100% Reduction |
| Execution Path | Deep Copy | Reference | Zero-Copy |
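The "0 MB" row can be checked directly: a borrowed `Cow` points at the original buffer, whereas a copy does not. Below is a std-only sketch (the PR's actual benchmark, `benches/vision_access_bench.rs`, uses a proper harness) contrasting the two paths for the benchmarked [4, 3, 336, 336] shape:

```rust
use std::borrow::Cow;

fn main() {
    // The benchmarked tensor: 4 images x 3 channels x 336 x 336 f32 values,
    // i.e. 1,354,752 elements = 5,419,008 bytes (~5.4 MB).
    let data: Vec<f32> = vec![0.0; 4 * 3 * 336 * 336];

    // Before: deep copy on every access, allocating ~5.4 MB per call.
    // The copy lives at a different address than the source.
    let copied: Vec<f32> = data.clone();
    assert_ne!(copied.as_ptr(), data.as_ptr());

    // After: a borrowed view. The pointer is the original buffer's,
    // demonstrating that no bytes were moved or allocated.
    let borrowed: Cow<'_, [f32]> = Cow::Borrowed(&data[..]);
    assert_eq!(borrowed.as_ptr(), data.as_ptr());

    println!("bytes: {}", data.len() * std::mem::size_of::<f32>());
}
```

The pointer-equality check is also a cheap way to assert zero-copy behavior in tests, independent of timing noise.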

Checklist

@github-actions github-actions Bot added dependencies Pull requests that update a dependency file Multi-modal multi-modal language model model-gateway labels Dec 24, 2025
@gemini-code-assist
Contributor

Summary of Changes

Hello @ppraneth, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the performance of vision tensor processing within the model gateway by implementing a zero-copy access mechanism. By transitioning from deep copies to borrowed slices for contiguous data, it drastically reduces memory allocations and processing latency, ensuring more efficient handling of large multimodal inputs.

Highlights

  • Performance Optimization for Vision Tensors: Previously, the pixel_values_flat method performed a linear-time deep copy of large vision tensors (e.g., ~5.4 MB for a 4-image batch) on every call. This caused significant memory pressure and CPU overhead on the hot path, leading to millisecond-scale latencies for a simple data access operation.
  • Zero-Copy Access Implementation: The pixel_values_flat method in src/multimodal/vision/image_processor.rs has been updated to return std::borrow::Cow<'_, [f32]> instead of Vec<f32>. It now leverages ndarray::as_slice() to return a borrowed slice (Cow::Borrowed) if the tensor is contiguous, thereby eliminating redundant allocations and copies.
  • Significant Performance Gains: Benchmarking validates a ~222,000x speedup in mean latency (from 5.6564 ms to 25.456 ns) and a 100% reduction in memory allocation (from ~5.4 MB/call to 0 MB) for vision tensor access, achieving zero-copy execution.
  • Backward Compatibility: Because Cow<[f32]> implements Deref, all existing call sites in vision processors (e.g., qwen2_vl.rs, qwen3_vl.rs, phi4_vision.rs, llama4_vision.rs) and integration tests continue to function without modification.
  • New Benchmark Added: A new benchmark, vision_access_bench, has been added to Cargo.toml and implemented in benches/vision_access_bench.rs to specifically measure and validate the performance improvements of vision tensor access.
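The backward-compatibility point in the highlights above rests on deref coercion. A short sketch (the `sum` consumer is hypothetical, standing in for the unchanged call sites in `qwen2_vl.rs` and friends) shows why functions taking `&[f32]` keep compiling against a `Cow`:

```rust
use std::borrow::Cow;

// A hypothetical downstream consumer that predates the change and still
// accepts a plain slice.
fn sum(values: &[f32]) -> f32 {
    values.iter().sum()
}

fn main() {
    let owned: Cow<'_, [f32]> = Cow::Owned(vec![1.0, 2.0, 3.0]);
    let backing = [4.0f32, 5.0];
    let borrowed: Cow<'_, [f32]> = Cow::Borrowed(&backing);

    // Deref coercion turns `&Cow<'_, [f32]>` into `&[f32]` at the call
    // site, so existing callers work unchanged with either variant.
    assert_eq!(sum(&owned), 6.0);
    assert_eq!(sum(&borrowed), 9.0);
}
```

This is why the return-type change did not ripple into the vision processors or the integration tests.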




@gemini-code-assist Bot left a comment


Code Review

This is an excellent optimization that significantly improves performance for multimodal request processing by implementing zero-copy access for vision tensors. The use of std::borrow::Cow is idiomatic and effectively avoids unnecessary memory allocations and copies on the hot path. The change is well-motivated, clearly explained, and includes a benchmark to validate the impressive performance gains. The fallback path for non-contiguous tensors ensures correctness is maintained. Great work!

Comment thread sgl-model-gateway/src/multimodal/vision/image_processor.rs
@ppraneth
Contributor Author

@slin1237 I haven’t touched schedule_batch.py in this PR.
The lint failure appears to be related to that file, as the issue also appears in newly opened PRs.

@slin1237
Collaborator

fixed

@slin1237 slin1237 merged commit 370bd27 into sgl-project:main Dec 24, 2025
56 of 60 checks passed
jiaming1130 pushed a commit to zhuyijie88/sglang that referenced this pull request Dec 25, 2025
Leoyzen pushed a commit to Leoyzen/sglang that referenced this pull request Dec 25, 2025
Leoyzen pushed a commit to Leoyzen/sglang that referenced this pull request Dec 25, 2025
YChange01 pushed a commit to YChange01/sglang that referenced this pull request Jan 13, 2026

Labels

dependencies Pull requests that update a dependency file model-gateway Multi-modal multi-modal language model run-ci
