Refactor: Extract DeepSeek common utilities into shared module #16969
Fridge003 merged 21 commits into sgl-project:main
Conversation
- Introduced new utility functions in `deepseek_common/utils.py` for model quantization and backend management.
- Added integration and smoke tests for DeepSeek model loading and utility functions.
- Refactored existing utility functions and ensured they are tested for correctness.
Summary of Changes

Hello @DotSlash-A, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request undertakes a significant refactoring effort for DeepSeek model utilities, consolidating common functions into a dedicated shared module. This move aims to streamline the codebase, enhance maintainability, and ensure consistent behavior across different DeepSeek model variants. The changes also introduce new functionality for managing quantization, attention mechanisms, and dynamic backend registration, all supported by a suite of new tests to validate the refactored components.
Force-pushed from 67ba778 to 2e599b1
Code Review
This pull request refactors utility functions for DeepSeek models into a common `utils.py` file, which is a good move for code organization and reusability. The changes are mostly about moving code, but I've identified a few points for discussion:

- A potential logic change in `get_moe_weights` due to the removal of a filter function.
- The removal of a stream overlap optimization, which might impact performance.
- Duplication of test code across multiple new test files.
Overall, the refactoring is well-structured, and the addition of new tests is commendable. Please see my detailed comments below.
`python/sglang/srt/models/deepseek_v2.py` (lines 593-595)
The call to `filter_moe_weight_param_global_expert` has been removed from the list comprehension in `get_moe_weights`. This function appeared to filter out weights for global experts. Its removal means that `get_moe_weights` might now include weights that were previously excluded. Could you clarify whether this is an intended change in logic? If not, it could affect which weights are processed, possibly leading to incorrect behavior or performance issues.
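For illustration, here is a hypothetical sketch of the pattern this comment refers to. The predicate name comes from the review itself, but its body and the signature of `get_moe_weights` are assumptions, not the actual sglang code:

```python
# Hypothetical sketch of the filtered list comprehension discussed above.

def filter_moe_weight_param_global_expert(name: str) -> bool:
    """Keep a weight only if it is not a global-expert parameter (assumed predicate)."""
    return "global_expert" not in name

def get_moe_weights(named_params):
    # Before the refactor: the predicate excluded global-expert weights.
    return [
        (name, param)
        for name, param in named_params
        if "experts" in name and filter_moe_weight_param_global_expert(name)
    ]
    # After the refactor the predicate call is gone, so previously excluded
    # global-expert weights would flow through -- the behavior change in question.
```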
`python/sglang/srt/models/deepseek_v2.py` (lines 1732-1753)
The logic for overlapping `q_b_proj` and indexer computations on an alternate stream during decoding has been removed. While this simplifies the code, it might cause a performance regression. Was this optimization intentionally removed?
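A minimal sketch of the general overlap pattern being discussed, assuming plain PyTorch CUDA streams; this is illustrative, not the exact code removed in this PR:

```python
import torch

def overlapped_decode_step(q_b_proj, indexer, hidden_states, positions, alt_stream):
    """Run q_b_proj on an alternate CUDA stream while the indexer runs on the
    default stream, then join before both results are consumed.
    `alt_stream` would be created once, e.g. torch.cuda.Stream()."""
    current = torch.cuda.current_stream()
    alt_stream.wait_stream(current)        # alt stream sees all prior work
    with torch.cuda.stream(alt_stream):
        q = q_b_proj(hidden_states)        # projection overlaps with the indexer
    topk_indices = indexer(hidden_states, positions)  # default stream
    current.wait_stream(alt_stream)        # join before q is used downstream
    q.record_stream(current)               # keep q's memory valid across streams
    return q, topk_indices
```

Dropping this pattern trades a small amount of decode-time parallelism for simpler, single-stream control flow, which is why the reviewer asks whether the removal was intentional.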
Please fix conflicts.
…ity functions after refactoring. Removed the add_forward_absorb_core_attention_backend function
I have fixed all the conflicts, please check.
Please fix conflicts.
Force-pushed from 6c763c1 to 294d6ff
@DotSlash-A Please fix lint.
/tag-and-rerun-ci |
Motivation
The DeepseekV2 implementation has grown quickly, making it difficult to maintain. This PR refactors its common utility functions into a dedicated, well-documented module with comprehensive test coverage.
Related issue: #16701
Modifications
Refactored utility functions into a shared module:

Moved from `deepseek_v2.py` to `deepseek_common/utils.py`:

- `enable_nextn_moe_bf16_cast_to_fp8()` - BF16 to FP8 casting logic for NextN MoE layers
- `add_forward_absorb_core_attention_backend()` - Attention backend registration
- `yarn_get_mscale()` - YaRN (Yet another RoPE extensioN) scaling calculation (see the sketch after this list)
- `_get_llama_4_scaling()` - Llama 4 style position-dependent RoPE scaling

Code cleanup:

- Reduced `deepseek_v2.py` by 109 lines (removed duplicated code)
- Updated `deepseek_nextn.py` to use the shared utilities

Added comprehensive test coverage (489 lines across 5 files):

- `test/registered/utils/test_deepseek_utils.py` (124 lines) - Automated tests registered for CPU CI
- `test/srt/test_deepseek_utils.py` (121 lines) - Standard unit test suite
- `test/manual/test_deepseek_model_loading.py` (57 lines) - Integration tests
- `test/manual/test_deepseek_smoke.py` (78 lines) - Smoke tests
- `test/manual/test_deepseek_utils_refactoring.py` (162 lines) - Manual test suite

Summary: 8 files changed, +664 insertions, -96 deletions
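For orientation, here is a sketch of two of the extracted utilities. `yarn_get_mscale()` follows the published YaRN magnitude-scaling formula used by DeepSeek-V2-style models; the registration helper is inferred from the constant names in this PR and is an assumption, not the actual implementation:

```python
import math

def yarn_get_mscale(scale: float = 1.0, mscale: float = 1.0) -> float:
    """YaRN magnitude scaling: identity below a 1x extension factor,
    logarithmic growth above it (formula from the YaRN paper)."""
    if scale <= 1:
        return 1.0
    return 0.1 * mscale * math.log(scale) + 1.0

# Backend registration with deduplication -- the list contents and
# signature here are assumptions based on the names in this PR.
FORWARD_ABSORB_CORE_ATTENTION_BACKENDS = ["fa3", "flashinfer"]

def add_forward_absorb_core_attention_backend(backend: str) -> None:
    if backend not in FORWARD_ABSORB_CORE_ATTENTION_BACKENDS:  # deduplicate
        FORWARD_ABSORB_CORE_ATTENTION_BACKENDS.append(backend)
```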
Accuracy Tests
No functional changes: all extracted functions maintain identical logic to the original implementations; only organizational changes and documentation were added.
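For flavor, a minimal unit test for one of the extracted helpers might look like this (a hypothetical sketch; the import path and parameter values are assumptions, not the PR's actual test bodies):

```python
import math
import pytest

# Assumed module path for the shared utilities introduced by this PR.
from sglang.srt.models.deepseek_common.utils import yarn_get_mscale

@pytest.mark.parametrize("scale,mscale", [(1.0, 1.0), (2.0, 1.0), (40.0, 0.707)])
def test_yarn_get_mscale_matches_formula(scale, mscale):
    # The helper should reproduce the YaRN formula exactly.
    expected = 1.0 if scale <= 1 else 0.1 * mscale * math.log(scale) + 1.0
    assert yarn_get_mscale(scale, mscale) == pytest.approx(expected)
```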
Comprehensive test coverage includes:
- Module constants (`NVFP4_CKPT_FP8_ATTN_QUANT_MODULES`, `FORWARD_ABSORB_CORE_ATTENTION_BACKENDS`, `_is_cublas_ge_129`)
- `yarn_get_mscale()` behavior with various scale and mscale values
- `_get_llama_4_scaling()` output shapes, monotonicity, and bounds checking
- `enable_nextn_moe_bf16_cast_to_fp8()` edge cases
- `add_forward_absorb_core_attention_backend()` registration and deduplication

Benchmarking and Profiling
N/A - This is a code organization refactor with no changes to computational logic or performance-critical code paths.
Checklist
Review Process
`/tag-run-ci-label`, `/rerun-failed-ci`, `/tag-and-rerun-ci`