
Enable memory saver for hybrid model #11974

Merged
ispobock merged 12 commits into sgl-project:main from ocss884:enable_hybrid_mem_saver
Nov 4, 2025

Conversation

@ocss884
Collaborator

ocss884 commented Oct 22, 2025

Motivation

Modifications

Accuracy Tests

Benchmarking and Profiling

Checklist

@gemini-code-assist
Contributor

Summary of Changes

Hello @ocss884, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request implements a memory saving mechanism for the hybrid model, specifically targeting the Mamba state and KV cache allocations. By introducing an enable_memory_saver flag and integrating a TorchMemorySaverAdapter, the changes allow for more efficient GPU memory utilization. This optimization aims to reduce the overall memory footprint of the model during inference, which can be critical for deploying larger models or handling increased concurrency.

Highlights

  • Memory Saver Integration: Introduced an enable_memory_saver boolean parameter across various memory pool constructors and methods to control memory optimization.
  • Mamba Memory Pool Optimization: The MambaMemoryPool now utilizes a TorchMemorySaverAdapter to wrap the allocation of conv_state and temporal_state, allowing for conditional memory saving.
  • Hybrid Memory Pool Configuration: The HybridMemoryPool constructor and its internal _init_mamba_pool method have been updated to accept and propagate the enable_memory_saver flag to underlying memory components, including the KVCache.
  • Model Runner Update: The init_memory_pool function within the model_runner now passes the server_args.enable_memory_saver setting to the HybridMemoryPool during initialization, enabling server-wide control over this feature.
  • Unit Test Enhancement: A unit test (test_hybrid_linear_kv_pool) has been modified to explicitly enable enable_memory_saver when initializing the HybridMemoryPool, ensuring test coverage for the new functionality.
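The allocation pattern the highlights describe can be sketched as follows. TorchMemorySaverAdapter, GPU_MEMORY_TYPE_KV_CACHE, conv_state, and temporal_state are names from the diff; the adapter class below is a hypothetical no-op stand-in (the real adapter comes from the torch_memory_saver package and marks GPU allocations as pausable), and plain Python lists stand in for GPU tensors so the sketch runs anywhere:

```python
from contextlib import contextmanager

GPU_MEMORY_TYPE_KV_CACHE = "kv_cache"


class MemorySaverAdapter:
    """Hypothetical stand-in for TorchMemorySaverAdapter.

    When memory saving is disabled, region() is effectively a no-op, so
    allocation code can always use `with adapter.region(...)` unconditionally.
    """

    def __init__(self, enable: bool):
        self.enable = enable
        self.regions = []  # record entered regions, for illustration only

    @contextmanager
    def region(self, tag: str):
        if self.enable:
            # The real adapter would tag allocations made inside this
            # block so they can later be released and re-materialized.
            self.regions.append(tag)
        yield


class MambaPool:
    def __init__(self, size: int, enable_memory_saver: bool):
        self.adapter = MemorySaverAdapter(enable_memory_saver)
        # Buffers allocated inside the region become subject to the
        # memory saver when the flag is enabled.
        with self.adapter.region(GPU_MEMORY_TYPE_KV_CACHE):
            self.conv_state = [0.0] * size      # stand-in for a GPU tensor
            self.temporal_state = [0.0] * size  # stand-in for a GPU tensor


pool = MambaPool(4, enable_memory_saver=True)
print(pool.adapter.regions)  # ['kv_cache']
```

The key design point is that the flag changes what the context manager does, not whether the allocation code calls it, which keeps the pool constructors free of branching.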

@ocss884
Collaborator (Author)

ocss884 commented Oct 22, 2025

@yizhang2077 @fzyzcjy

@gemini-code-assist
Contributor

gemini-code-assist (bot) left a comment
Code Review

This pull request enables the memory saver feature for hybrid models. The changes primarily involve propagating the enable_memory_saver flag through various components, including MambaPool, HybridReqToTokenPool, and HybridLinearKVPool. The flag is then used to wrap memory-intensive buffer allocations with the TorchMemorySaverAdapter, which is the intended behavior. The implementation appears correct and consistent. I've found one minor issue regarding some leftover commented-out code that should be cleaned up.
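The propagation the review describes can be sketched minimally. The class and argument names (HybridLinearKVPool, KVCache, init_memory_pool, server_args.enable_memory_saver) follow the PR, but the bodies are simplified placeholders, not the actual sglang implementation:

```python
class KVCache:
    def __init__(self, enable_memory_saver: bool = False):
        # In the PR, this flag decides whether buffer allocation is
        # wrapped in a TorchMemorySaverAdapter region.
        self.enable_memory_saver = enable_memory_saver


class HybridLinearKVPool:
    def __init__(self, enable_memory_saver: bool = False):
        # Propagate the flag to every underlying memory component.
        self.enable_memory_saver = enable_memory_saver
        self.kv_cache = KVCache(enable_memory_saver=enable_memory_saver)


class ServerArgs:
    def __init__(self, enable_memory_saver: bool):
        self.enable_memory_saver = enable_memory_saver


def init_memory_pool(server_args: ServerArgs) -> HybridLinearKVPool:
    # model_runner passes the server-wide setting into the pool, so one
    # server flag controls memory saving for all hybrid-model buffers.
    return HybridLinearKVPool(
        enable_memory_saver=server_args.enable_memory_saver
    )


pool = init_memory_pool(ServerArgs(enable_memory_saver=True))
print(pool.kv_cache.enable_memory_saver)  # True
```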

Comment on lines +239 to +241
# def _create_buffers(self):
# with self.memory_saver_adapter.region(GPU_MEMORY_TYPE_KV_CACHE):

Contributor

Severity: medium

This commented-out code appears to be a leftover from a refactoring. To improve code clarity and maintainability, it should be removed.

Comment thread: python/sglang/srt/mem_cache/memory_pool.py (Outdated)

@fzyzcjy
Collaborator

fzyzcjy left a comment

LGTM if it is just adding with blocks and test passes

Comment thread: python/sglang/srt/mem_cache/memory_pool.py (Outdated)

@yizhang2077
Collaborator

yizhang2077 left a comment

LGTM overall

@fzyzcjy fzyzcjy added the run-ci label Oct 23, 2025
@fzyzcjy
Collaborator

fzyzcjy left a comment

LGTM reading the new diff

@ispobock ispobock merged commit 173e0f7 into sgl-project:main Nov 4, 2025
17 of 48 checks passed
@Fridge003
Collaborator

Fridge003 commented Nov 4, 2025

