Refactor: Extract DeepSeek common utilities into shared module #16969
Fridge003 merged 21 commits into sgl-project:main
Conversation
- Introduced new utility functions in `deepseek_common/utils.py` for model quantization and backend management.
- Added integration and smoke tests for DeepSeek model loading and utility functions.
- Refactored existing utility functions and ensured they are tested for correctness.
Summary of Changes

Hello @DotSlash-A, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request undertakes a significant refactoring effort for DeepSeek model utilities, consolidating common functions into a dedicated shared module. This move aims to streamline the codebase, enhance maintainability, and ensure consistent behavior across different DeepSeek model variants. The changes also introduce new functionality for managing quantization, attention mechanisms, and dynamic backend registration, all supported by a suite of new tests to validate the refactored components.
Force-pushed from 67ba778 to 2e599b1
Code Review
This pull request refactors utility functions for DeepSeek models into a common `utils.py` file, which is a good move for code organization and reusability. The changes are mostly about moving code, but I've identified a few points for discussion:

- A potential logic change in `get_moe_weights` due to the removal of a filter function.
- The removal of a stream overlap optimization, which might impact performance.
- Duplication of test code across multiple new test files.
Overall, the refactoring is well-structured, and the addition of new tests is commendable. Please see my detailed comments below.
`python/sglang/srt/models/deepseek_v2.py` (lines 593-595)
The call to `filter_moe_weight_param_global_expert` has been removed from the list comprehension in `get_moe_weights`. This function appeared to filter out weights for global experts. Its removal means that `get_moe_weights` might now include weights that were previously excluded. Could you clarify whether this is an intended change in logic? If not, it could affect which weights are processed, possibly leading to incorrect behavior or performance issues.
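For illustration, here is a hypothetical sketch of the pattern this comment refers to. The predicate name comes from the review itself, but its body and the signature of `get_moe_weights` are assumptions, not the actual sglang code:

```python
# Hypothetical sketch of the filtered list comprehension discussed above.

def filter_moe_weight_param_global_expert(name: str) -> bool:
    """Keep a weight only if it is not a global-expert parameter (assumed predicate)."""
    return "global_expert" not in name

def get_moe_weights(named_params):
    # Before the refactor: the predicate excluded global-expert weights.
    return [
        (name, param)
        for name, param in named_params
        if "experts" in name and filter_moe_weight_param_global_expert(name)
    ]
    # After the refactor the predicate call is gone, so previously excluded
    # global-expert weights would flow through -- the behavior change in question.
```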
`python/sglang/srt/models/deepseek_v2.py` (lines 1732-1753)
The logic for overlapping `q_b_proj` and indexer computations on an alternate stream during decoding has been removed. While this simplifies the code, it might cause a performance regression. Was this optimization intentionally removed?
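A minimal sketch of the general overlap pattern being discussed, assuming plain PyTorch CUDA streams; this is illustrative, not the exact code removed in this PR:

```python
import torch

def overlapped_decode_step(q_b_proj, indexer, hidden_states, positions, alt_stream):
    """Run q_b_proj on an alternate CUDA stream while the indexer runs on the
    default stream, then join before both results are consumed.
    `alt_stream` would be created once, e.g. torch.cuda.Stream()."""
    current = torch.cuda.current_stream()
    alt_stream.wait_stream(current)        # alt stream sees all prior work
    with torch.cuda.stream(alt_stream):
        q = q_b_proj(hidden_states)        # projection overlaps with the indexer
    topk_indices = indexer(hidden_states, positions)  # default stream
    current.wait_stream(alt_stream)        # join before q is used downstream
    q.record_stream(current)               # keep q's memory valid across streams
    return q, topk_indices
```

Dropping this pattern trades a small amount of decode-time parallelism for simpler, single-stream control flow, which is why the reviewer asks whether the removal was intentional.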
Please fix conflicts.
…ity functions after refactoring. Removed the add_forward_absorb_core_attention_backend function
I have fixed all the conflicts, please check.
Please fix conflicts.
Force-pushed from 6c763c1 to 294d6ff
@DotSlash-A Please fix lint.
/tag-and-rerun-ci |
Motivation
The DeepseekV2 implementation has grown quickly, making it difficult to maintain. This PR refactors its common utility functions into a dedicated, well-documented module with comprehensive test coverage.
Related issue: #16701
Modifications
Refactored utility functions into a shared module:

Moved from `deepseek_v2.py` to `deepseek_common/utils.py`:

- `enable_nextn_moe_bf16_cast_to_fp8()` - BF16 to FP8 casting logic for NextN MoE layers
- `add_forward_absorb_core_attention_backend()` - Attention backend registration
- `yarn_get_mscale()` - YaRN (Yet another RoPE extensioN) scaling calculation (see the sketch after this list)
- `_get_llama_4_scaling()` - Llama 4 style position-dependent RoPE scaling

Code cleanup:

- Reduced `deepseek_v2.py` by 109 lines (removed duplicated code)
- Updated `deepseek_nextn.py` to use the shared utilities

Added comprehensive test coverage (489 lines across 5 files):

- `test/registered/utils/test_deepseek_utils.py` (124 lines) - Automated tests registered for CPU CI
- `test/srt/test_deepseek_utils.py` (121 lines) - Standard unit test suite
- `test/manual/test_deepseek_model_loading.py` (57 lines) - Integration tests
- `test/manual/test_deepseek_smoke.py` (78 lines) - Smoke tests
- `test/manual/test_deepseek_utils_refactoring.py` (162 lines) - Manual test suite

Summary: 8 files changed, +664 insertions, -96 deletions
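For orientation, here is a sketch of two of the extracted utilities. `yarn_get_mscale()` follows the published YaRN magnitude-scaling formula used by DeepSeek-V2-style models; the registration helper is inferred from the constant names in this PR and is an assumption, not the actual implementation:

```python
import math

def yarn_get_mscale(scale: float = 1.0, mscale: float = 1.0) -> float:
    """YaRN magnitude scaling: identity below a 1x extension factor,
    logarithmic growth above it (formula from the YaRN paper)."""
    if scale <= 1:
        return 1.0
    return 0.1 * mscale * math.log(scale) + 1.0

# Backend registration with deduplication -- the list contents and
# signature here are assumptions based on the names in this PR.
FORWARD_ABSORB_CORE_ATTENTION_BACKENDS = ["fa3", "flashinfer"]

def add_forward_absorb_core_attention_backend(backend: str) -> None:
    if backend not in FORWARD_ABSORB_CORE_ATTENTION_BACKENDS:  # deduplicate
        FORWARD_ABSORB_CORE_ATTENTION_BACKENDS.append(backend)
```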
Accuracy Tests
No functional changes: all extracted functions maintain identical logic to the original implementations; only organizational changes and documentation were added.
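For flavor, a minimal unit test for one of the extracted helpers might look like this (a hypothetical sketch; the import path and parameter values are assumptions, not the PR's actual test bodies):

```python
import math
import pytest

# Assumed module path for the shared utilities introduced by this PR.
from sglang.srt.models.deepseek_common.utils import yarn_get_mscale

@pytest.mark.parametrize("scale,mscale", [(1.0, 1.0), (2.0, 1.0), (40.0, 0.707)])
def test_yarn_get_mscale_matches_formula(scale, mscale):
    # The helper should reproduce the YaRN formula exactly.
    expected = 1.0 if scale <= 1 else 0.1 * mscale * math.log(scale) + 1.0
    assert yarn_get_mscale(scale, mscale) == pytest.approx(expected)
```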
Comprehensive test coverage includes:
- Module constants (`NVFP4_CKPT_FP8_ATTN_QUANT_MODULES`, `FORWARD_ABSORB_CORE_ATTENTION_BACKENDS`, `_is_cublas_ge_129`)
- `yarn_get_mscale()` behavior with various scale and mscale values
- `_get_llama_4_scaling()` output shapes, monotonicity, and bounds checking
- `enable_nextn_moe_bf16_cast_to_fp8()` edge cases
- `add_forward_absorb_core_attention_backend()` registration and deduplication

Benchmarking and Profiling
N/A - This is a code organization refactor with no changes to computational logic or performance-critical code paths.
Checklist
Review Process
`/tag-run-ci-label`, `/rerun-failed-ci`, `/tag-and-rerun-ci`