[model-gateway] Optimize WASM Runtime with Instance Pooling and Component Caching by ppraneth · Pull Request #15515 · sgl-project/sglang

ppraneth · 2025-12-20T06:12:03Z

Motivation

I identified a significant per-request overhead in the current WASM middleware implementation within sgl-model-gateway, which acts as a bottleneck for high-throughput serving.

The two primary performance issues addressed in this PR are:

Memory Allocation Overhead: The runtime currently allocates a new wasmtime::Store and linear memory (via mmap) for every single request.
Compilation Overhead: The WASM component is re-compiled from raw bytes (JIT) on every request inside the worker loop.

These operations add milliseconds of latency to every request. This PR introduces Instance Pooling to reuse memory slots and LRU Component Caching to skip redundant compilation, ensuring middleware execution remains near-zero cost.

Modifications

I updated sgl-model-gateway/src/wasm/runtime.rs to implement the following optimizations:

Instance Pooling:
- Integrated wasmtime::PoolingAllocationConfig into the worker loop.
- The system now pre-allocates memory slots (configured to 20 per worker thread) to avoid expensive OS memory allocation calls during request processing.
- Aligned memory limits (max_memory_size, max_component_instance_size) with the new pooling strategy.
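
The pre-allocation idea in the list above can be sketched without any dependencies. This is only a conceptual analogue: the PR itself relies on wasmtime::PoolingAllocationConfig, which manages its own mmap-backed slots, whereas `SlotPool`, `acquire`, and `release` below are hypothetical names invented for illustration. The slot count of 20 comes from the description above, and 64 KiB is the standard WASM page size.

```rust
/// Slots per worker thread, as stated in the PR description.
const SLOTS_PER_WORKER: usize = 20;
/// One WebAssembly page is 64 KiB.
const WASM_PAGE_SIZE: usize = 64 * 1024;

/// Hypothetical fixed-size pool of reusable memory slots.
/// Conceptual stand-in only; not the wasmtime pooling allocator.
struct SlotPool {
    free: Vec<Vec<u8>>,
    slot_size: usize,
}

impl SlotPool {
    /// Pre-allocate `slots` buffers of `slot_size` bytes up front,
    /// so no allocation happens on the request path.
    fn new(slots: usize, slot_size: usize) -> Self {
        let free = (0..slots).map(|_| vec![0u8; slot_size]).collect();
        Self { free, slot_size }
    }

    /// Take a slot for one request; `None` means the pool is exhausted.
    fn acquire(&mut self) -> Option<Vec<u8>> {
        self.free.pop()
    }

    /// Zero the slot so no request data leaks, then make it reusable.
    fn release(&mut self, mut slot: Vec<u8>) {
        debug_assert_eq!(slot.len(), self.slot_size);
        slot.fill(0);
        self.free.push(slot);
    }

    /// Number of slots currently available.
    fn available(&self) -> usize {
        self.free.len()
    }
}
```

A request handler would call `acquire` at the start and `release` at the end. When `acquire` returns `None`, the caller has to decide whether to wait or reject, which is the trade-off a bounded pool introduces.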
Smart Component Caching (LRU):
- Replaced the naive HashMap strategy with a Least Recently Used (LRU) Cache (using the lru crate).
- This prevents "cache stampedes" (where clearing a full cache causes a sudden latency spike) by gracefully evicting only the oldest unused modules when the limit is reached.
- Optimized memory ownership to avoid unnecessary cloning of large WASM binaries during cache insertion.
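
To make the eviction behaviour described above concrete, here is a minimal std-only LRU sketch. The PR uses the lru crate, so this is not its implementation; the `u64` key, the `Compiled` alias, and the method names are assumptions made for illustration. The point is that a full cache drops only its least recently used entry instead of being cleared, and that handing out an `Arc` avoids copying the underlying bytes.

```rust
use std::collections::{HashMap, VecDeque};
use std::sync::Arc;

/// Stand-in for a compiled component (assumption for illustration).
type Compiled = Arc<Vec<u8>>;

/// Minimal LRU cache: `order` holds keys from least to most recently used.
struct LruCache {
    capacity: usize,
    map: HashMap<u64, Compiled>,
    order: VecDeque<u64>,
}

impl LruCache {
    fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be non-zero");
        Self { capacity, map: HashMap::new(), order: VecDeque::new() }
    }

    /// Mark `key` as most recently used.
    fn touch(&mut self, key: u64) {
        self.order.retain(|k| *k != key);
        self.order.push_back(key);
    }

    /// Return the cached entry, refreshing its recency on a hit.
    fn get(&mut self, key: u64) -> Option<Compiled> {
        let hit = self.map.get(&key).cloned(); // clones the Arc, not the bytes
        if hit.is_some() {
            self.touch(key);
        }
        hit
    }

    /// Insert an entry, evicting only the least recently used one when full.
    fn put(&mut self, key: u64, value: Compiled) {
        if !self.map.contains_key(&key) && self.map.len() == self.capacity {
            if let Some(oldest) = self.order.pop_front() {
                self.map.remove(&oldest);
            }
        }
        self.map.insert(key, value); // `value` is moved in, never copied
        self.touch(key);
    }
}
```

The `touch` step here is linear in the number of cached entries, which is acceptable for a sketch but is one reason to prefer a dedicated crate in production code.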

Benchmarking and Profiling

I performed a local micro-benchmark simulating 1000 sequential requests to measure the full impact of the Instance Pooling + Caching strategy.

Benchmark Configuration:

Scenario: Full request pipeline simulation (Compilation check + Instantiation).
Module: Simple WASM module requiring 1 Memory Page.
Iterations: 1000.
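
A harness for this kind of comparison might look like the sketch below. It is not the benchmark used in this PR: `compile` is a hypothetical stand-in that only counts invocations, and `run_baseline` and `run_cached` are illustrative names. It shows the structure of the measurement, namely that the baseline compiles once per iteration while the cached path compiles once in total.

```rust
use std::collections::HashMap;
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Hypothetical stand-in for compilation: counts how often it runs.
fn compile(bytes: &[u8], compilations: &mut u32) -> usize {
    *compilations += 1;
    bytes.iter().map(|b| *b as usize).sum()
}

/// Baseline: compile on every request.
fn run_baseline(bytes: &[u8], iterations: u32) -> (u32, Duration) {
    let mut compilations = 0;
    let start = Instant::now();
    for _ in 0..iterations {
        black_box(compile(bytes, &mut compilations));
    }
    (compilations, start.elapsed())
}

/// Cached: compile on the first request, then serve hits from the map.
fn run_cached(bytes: &[u8], iterations: u32) -> (u32, Duration) {
    let mut compilations = 0;
    let mut cache: HashMap<Vec<u8>, usize> = HashMap::new();
    let start = Instant::now();
    for _ in 0..iterations {
        let artifact = match cache.get(bytes).copied() {
            Some(hit) => hit,
            None => {
                let built = compile(bytes, &mut compilations);
                cache.insert(bytes.to_vec(), built);
                built
            }
        };
        black_box(artifact);
    }
    (compilations, start.elapsed())
}
```

Each function returns the number of compilations alongside the elapsed time, so a test can assert on the deterministic counter rather than on wall-clock timings, which vary between machines.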

Local Results:

Metric	Standard (Baseline)	Pooled + Cached (Optimized)	Improvement
Total Time (1000 iterations)	370.95 ms	6.27 ms	59.17x speedup
Avg Latency (per request)	370.95 µs	6.27 µs	98% reduction

The baseline demonstrates the severe cost of re-compiling modules and re-allocating memory on every request (~370µs/op). The optimized pipeline reduces this to a negligible ~6µs/op by leveraging the pre-warmed cache and pre-allocated memory pool.
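
For readers who want to check the arithmetic, the derived figures follow from the two totals and the iteration count. The helper names below are illustrative, not part of the PR. Note that dividing the rounded totals gives roughly 59.16x, so the reported 59.17x presumably comes from unrounded timings.

```rust
/// Average latency in microseconds from a total in milliseconds.
fn avg_latency_us(total_ms: f64, iterations: u32) -> f64 {
    total_ms * 1000.0 / f64::from(iterations)
}

/// How many times faster the optimized run is than the baseline.
fn speedup(baseline_ms: f64, optimized_ms: f64) -> f64 {
    baseline_ms / optimized_ms
}

/// Latency reduction relative to the baseline, in percent.
fn reduction_percent(baseline_ms: f64, optimized_ms: f64) -> f64 {
    (1.0 - optimized_ms / baseline_ms) * 100.0
}
```

With the totals from the table (370.95 ms and 6.27 ms over 1000 iterations), these give about 370.95 µs and 6.27 µs per request and a reduction of roughly 98.3%.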

Checklist

Format your code according to the Format code with pre-commit.
Add unit tests according to the Run and add unit tests.
Update documentation according to Write documentations.
Provide accuracy and speed benchmark results according to Test the accuracy and Benchmark the speed.
Follow the SGLang code style guidance.
Work with maintainers to merge your PR. See the PR Merge Process

gemini-code-assist · 2025-12-20T06:12:07Z

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

gemini-code-assist

Code Review

This pull request introduces significant performance optimizations to the WASM runtime by implementing instance pooling and component caching. These are excellent improvements that will reduce per-request overhead and latency. The code is well-structured and the changes are clearly explained. I have a couple of suggestions to further enhance the implementation: one is a minor optimization to avoid an unnecessary data clone during caching, and the other is a recommendation for a more robust cache eviction strategy to handle high-load scenarios more gracefully. Overall, this is a very valuable contribution.

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

slin1237

The unit tests should be moved into the WASM unit tests and should define a threshold.
Please also clean up all the prints.
Otherwise, the code LGTM.

ppraneth · 2025-12-20T19:36:31Z

The unit tests should be moved into the WASM unit tests and should define a threshold.
Please also clean up all the prints.
Otherwise, the code LGTM.

Yeah, will do it right away.

ppraneth · 2025-12-20T19:53:48Z

Added a test case in sgl-model-gateway/src/wasm/runtime.rs.
@slin1237 can you check again and tell me if any changes are needed?

…nent Caching (sgl-project#15515)

benchmark

158292e

ppraneth requested a review from slin1237 as a code owner December 20, 2025 06:12

github-actions Bot added the model-gateway label Dec 20, 2025

Merge branch 'main' into bug-rou

e0891b8

gemini-code-assist Bot reviewed Dec 20, 2025

View reviewed changes

Comment thread sgl-model-gateway/src/wasm/runtime.rs Outdated

Comment thread sgl-model-gateway/src/wasm/runtime.rs Outdated

ppraneth and others added 4 commits December 20, 2025 21:05

Merge branch 'main' into bug-rou

8f4702a

Update sgl-model-gateway/src/wasm/runtime.rs

66a9905

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

Merge branch 'main' into bug-rou

e16d19d

benchmark and test

0f29c3c

ppraneth requested a review from CatherineSue as a code owner December 20, 2025 18:59

ppraneth added 4 commits December 21, 2025 00:37

benchmark and test

c006ca5

benchmark, minor fix and test

adfec4d

re check test

5a248cd

remove tset file

9573b03

slin1237 requested changes Dec 20, 2025

View reviewed changes

slin1237 added the run-ci label Dec 20, 2025

ppraneth added 2 commits December 21, 2025 01:11

add test case

e5fc1e4

Merge branch 'main' into bug-rou

be21940

ppraneth requested a review from slin1237 December 20, 2025 19:41

add test case

0606226

ppraneth added 2 commits December 21, 2025 01:32

minor fix

5ede2af

Merge branch 'main' into bug-rou

2a439bb

slin1237 approved these changes Dec 20, 2025

View reviewed changes

slin1237 merged commit 537ef18 into sgl-project:main Dec 20, 2025
60 checks passed

jiaming1130 pushed a commit to zhuyijie88/sglang that referenced this pull request Dec 25, 2025

[model-gateway] Optimize WASM Runtime with Instance Pooling and Compo…

7510b78

…nent Caching (sgl-project#15515)

YChange01 pushed a commit to YChange01/sglang that referenced this pull request Jan 13, 2026

[model-gateway] Optimize WASM Runtime with Instance Pooling and Compo…

63b1edf

…nent Caching (sgl-project#15515)

slin1237 merged 15 commits into sgl-project:main from ppraneth:bug-rou
