
Refactor: Extract DeepSeek common utilities into shared module#16969

Merged
Fridge003 merged 21 commits into sgl-project:main from DotSlash-A:utils-refactor
Jan 24, 2026

Conversation


@DotSlash-A DotSlash-A commented Jan 12, 2026

Motivation

The DeepseekV2 implementation has grown quickly, making it difficult to maintain. This PR refactors its shared utilities into a dedicated, well-documented module with comprehensive test coverage.
Related issue: #16701

Modifications

Refactored utility functions into shared module:

  • Moved 4 utility functions from deepseek_v2.py to deepseek_common/utils.py:
    • enable_nextn_moe_bf16_cast_to_fp8() - BF16 to FP8 casting logic for NextN MoE layers
    • add_forward_absorb_core_attention_backend() - Attention backend registration
    • yarn_get_mscale() - YaRN (Yet another RoPE extensioN) scaling calculation
    • _get_llama_4_scaling() - Llama 4 style position-dependent RoPE scaling
  • Moved 2 constants and 1 cached variable to shared module
  • Added comprehensive Google-style docstrings to all functions
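For reviewers unfamiliar with the math behind two of the moved helpers, here is an illustrative sketch. It follows the publicly documented YaRN and Llama-4-style scaling formulas; the exact implementations in `deepseek_common/utils.py` may differ in signatures and defaults, and the `floor_scale`/`attn_scale` defaults below are assumptions for illustration.

```python
import math

def yarn_get_mscale(scale: float = 1.0, mscale: float = 1.0) -> float:
    # YaRN attention magnitude scaling: identity when no RoPE context
    # extension is applied, otherwise grows logarithmically with the
    # extension ratio `scale`.
    if scale <= 1:
        return 1.0
    return 0.1 * mscale * math.log(scale) + 1.0

def get_llama_4_scaling(position: int,
                        floor_scale: float = 8192.0,
                        attn_scale: float = 0.1) -> float:
    # Llama-4-style position-dependent attention scaling: bounded below
    # by 1.0 and non-decreasing in `position`. Defaults are illustrative.
    return 1.0 + math.log(math.floor((position + 1) / floor_scale) + 1.0) * attn_scale

print(yarn_get_mscale(1.0))    # 1.0: no extension, identity scaling
print(get_llama_4_scaling(0))  # 1.0: early positions are unscaled
```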

Code cleanup:

  • Reduced deepseek_v2.py by 109 lines (removed duplicated code)
  • Removed unused imports and cleaned up import ordering
  • Simplified attention forward logic (removed problematic CUDA stream overlapping code)
  • Updated deepseek_nextn.py to use shared utilities
  • Fixed code formatting and added missing newlines

Added comprehensive test coverage (489 lines across 5 files):

  • test/registered/utils/test_deepseek_utils.py (124 lines) - Automated tests registered for CPU CI
  • test/srt/test_deepseek_utils.py (121 lines) - Standard unit test suite
  • test/manual/test_deepseek_model_loading.py (57 lines) - Integration tests
  • test/manual/test_deepseek_smoke.py (78 lines) - Smoke tests
  • test/manual/test_deepseek_utils_refactoring.py (162 lines) - Manual test suite

Summary: 8 files changed, +664 insertions, -96 deletions

Accuracy Tests

No functional changes - All extracted functions maintain identical logic to original implementations. Only organizational changes and documentation added.

Comprehensive test coverage includes:

  • Constants validation (NVFP4_CKPT_FP8_ATTN_QUANT_MODULES, FORWARD_ABSORB_CORE_ATTENTION_BACKENDS, _is_cublas_ge_129)
  • yarn_get_mscale() behavior with various scale and mscale values
  • _get_llama_4_scaling() output shapes, monotonicity, and bounds checking
  • enable_nextn_moe_bf16_cast_to_fp8() edge cases
  • add_forward_absorb_core_attention_backend() registration and deduplication
  • Model loading and integration tests
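The registration-and-deduplication behavior checked by these tests can be pictured with a minimal sketch. The registry contents and backend names here are illustrative, not SGLang's actual defaults or API.

```python
# Minimal sketch of a list-backed backend registry with deduplication,
# mirroring the behavior the add_forward_absorb_core_attention_backend
# tests verify. Entries below are illustrative placeholders.
FORWARD_ABSORB_CORE_ATTENTION_BACKENDS = ["flashinfer", "fa3"]

def add_forward_absorb_core_attention_backend(name: str) -> None:
    # Register a backend exactly once, preserving insertion order.
    if name not in FORWARD_ABSORB_CORE_ATTENTION_BACKENDS:
        FORWARD_ABSORB_CORE_ATTENTION_BACKENDS.append(name)

add_forward_absorb_core_attention_backend("triton")
add_forward_absorb_core_attention_backend("triton")  # duplicate is ignored
print(FORWARD_ABSORB_CORE_ATTENTION_BACKENDS)
```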

Benchmarking and Profiling

N/A - This is a code organization refactor with no changes to computational logic or performance-critical code paths.

Checklist

Review Process

  1. Ping Merge Oncalls to start the PR flow. See the PR Merge Process.
  2. Get approvals from CODEOWNERS and other reviewers.
  3. Trigger CI tests with comments or contact authorized users to do so.
    • /tag-run-ci-label, /rerun-failed-ci, /tag-and-rerun-ci
  4. After green CI and required approvals, ask Merge Oncalls to merge.

@gemini-code-assist
Contributor

Summary of Changes

Hello @DotSlash-A, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers get up to speed quickly.

This pull request undertakes a significant refactoring effort for DeepSeek model utilities, consolidating common functions into a dedicated shared module. This move aims to streamline the codebase, enhance maintainability, and ensure consistent behavior across different DeepSeek model variants. The changes also introduce new functionalities for managing quantization, attention mechanisms, and dynamic backend registration, all supported by a suite of new tests to validate the refactored components.

Highlights

  • Utility Function Centralization: DeepSeek-related utility functions have been moved from deepseek_v2.py and deepseek_nextn.py into a new, centralized deepseek_common/utils.py file to improve code organization and reduce redundancy.
  • New Utility Functions: Introduced several new utility functions including enable_nextn_moe_bf16_cast_to_fp8 for MoE layer casting, yarn_get_mscale and _get_llama_4_scaling for attention scaling, and add_forward_absorb_core_attention_backend for dynamic attention backend registration.
  • Code Simplification: The DeepseekV2 model implementation has been simplified by removing redundant local definitions and complex conditional logic, now relying on imports from the new common utilities file.
  • Enhanced Test Coverage: Comprehensive test coverage has been added for the refactored utility functions, including new manual integration tests for model loading and smoke tests, as well as dedicated unit tests for the utilities themselves, integrated into the CPU CI.


DotSlash-A and others added 2 commits January 12, 2026 22:23
- Introduced new utility functions in `deepseek_common/utils.py` for model quantization and backend management.
- Added integration and smoke tests for DeepSeek model loading and utility functions.
- Refactored existing utility functions and ensured they are tested for correctness.

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request refactors utility functions for DeepSeek models into a common utils.py file, which is a good move for code organization and reusability. The changes are mostly about moving code, but I've identified a few points for discussion:

  1. A potential logic change in get_moe_weights due to the removal of a filter function.
  2. The removal of a stream overlap optimization, which might impact performance.
  3. Duplication of test code across multiple new test files.

Overall, the refactoring is well-structured, and the addition of new tests is commendable. Please see my detailed comments below.

I am having trouble creating individual review comments, so my feedback is listed below.

python/sglang/srt/models/deepseek_v2.py (593-595)

high

The call to filter_moe_weight_param_global_expert has been removed from the list comprehension in get_moe_weights. This function appeared to filter out weights for global experts. Its removal means that get_moe_weights might now include weights that were previously excluded. Could you clarify if this is an intended change in logic? If not, this could potentially affect which weights are processed, possibly leading to incorrect behavior or performance issues.

python/sglang/srt/models/deepseek_v2.py (1732-1753)

medium

The logic for overlapping q_b_proj and indexer computations using an alternate stream during decoding has been removed. While this simplifies the code, it might lead to a performance regression. Was this optimization intentionally removed?

@DotSlash-A DotSlash-A changed the title from "Utils refactor" to "Refactor: Extract DeepSeek common utilities into shared module" on Jan 13, 2026
Comment thread test/manual/test_deepseek_model_loading.py Outdated
Comment thread python/sglang/srt/models/deepseek_common/utils.py Outdated
@Fridge003
Collaborator

Please fix conflicts

@DotSlash-A
Contributor Author

I have fixed all the conflicts, please check.
Thanks!

Comment thread python/sglang/srt/models/deepseek_v2.py
@Fridge003
Collaborator

Please fix conflict

@DotSlash-A DotSlash-A requested a review from Ying1123 as a code owner January 19, 2026 17:18
@github-actions github-actions Bot added the quant, amd, dependencies, lora, multi-modal, hicache, blackwell, npu, and diffusion labels Jan 19, 2026
@Fridge003
Collaborator

@DotSlash-A Please fix lint

pre-commit install
pre-commit run --all-files

@DotSlash-A DotSlash-A requested a review from Fridge003 January 20, 2026 09:43
@Fridge003
Collaborator

/tag-and-rerun-ci

Comment thread python/sglang/srt/models/deepseek_common/utils.py Outdated
@DotSlash-A DotSlash-A requested a review from Fridge003 January 20, 2026 10:19
Comment thread python/sglang/srt/models/deepseek_common/utils.py Outdated
@DotSlash-A DotSlash-A requested a review from Fridge003 January 20, 2026 10:46
@Fridge003 Fridge003 merged commit 894928a into sgl-project:main Jan 24, 2026
253 of 264 checks passed