
[6/N] MoE Refactor: Cleanup MoE-related configs #8849

Merged

ch-wan merged 84 commits into main from cheng/refactor/ep-framework on Aug 15, 2025

Conversation

@ch-wan (Collaborator) commented Aug 6, 2025

Motivation

  • Adding --moe-runner-backend and deprecating --enable-triton-kernel-moe, --enable-flashinfer-cutlass-moe, and --enable-flashinfer-trtllm-moe (a sketch of the deprecation mapping follows this list).
  • Adding TopKOutputChecker and DispatchOutputChecker to satisfy pylint's type checks.
  • Adding utility functions so MoE-related logic no longer reads global_server_args directly.
  • Adding MoeRunnerConfig to encapsulate MoE runner settings.
  • Some minor cleanup.
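
For context, a minimal sketch of how the deprecated boolean flags might be mapped onto the new unified option inside ServerArgs.__post_init__. This is not the actual sglang code; the backend string values ("triton_kernel", "flashinfer_cutlass", "flashinfer_trtllm") and the field defaults are illustrative assumptions.

```python
import logging
from dataclasses import dataclass

logger = logging.getLogger(__name__)


@dataclass
class ServerArgs:
    # Hypothetical fields for illustration only.
    moe_runner_backend: str = "auto"
    # Deprecated flags, kept only for backward compatibility.
    enable_triton_kernel_moe: bool = False
    enable_flashinfer_cutlass_moe: bool = False
    enable_flashinfer_trtllm_moe: bool = False

    def __post_init__(self):
        # Map each deprecated boolean onto the unified backend choice and warn.
        deprecated = {
            "enable_triton_kernel_moe": "triton_kernel",
            "enable_flashinfer_cutlass_moe": "flashinfer_cutlass",
            "enable_flashinfer_trtllm_moe": "flashinfer_trtllm",
        }
        for flag, backend in deprecated.items():
            if getattr(self, flag):
                logger.warning(
                    "--%s is deprecated; use --moe-runner-backend %s instead.",
                    flag.replace("_", "-"), backend,
                )
                self.moe_runner_backend = backend
```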

Modifications

Accuracy Test

Benchmark & Profiling

Checklist

@gemini-code-assist (Contributor)

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

@ch-wan ch-wan marked this pull request as draft August 6, 2025 06:12
@ch-wan ch-wan changed the title from "[5/N] MoE Refactor: Cleanup MoE-related configs" to "[6/N] MoE Refactor: Cleanup MoE-related configs" on Aug 6, 2025
@ch-wan ch-wan marked this pull request as ready for review August 6, 2025 06:36
@gemini-code-assist (Bot) left a comment


Code Review

This pull request is a large-scale refactoring of MoE-related configurations. It introduces MoeRunnerConfig and TopKConfig to encapsulate parameters, adds a new --moe-runner-backend argument to unify several flags, and centralizes configuration logic to avoid direct use of global state. These changes significantly improve code clarity, maintainability, and organization. The implementation appears correct and consistent with the stated goals. I've found one potential issue with in-place modification of a configuration object, which could lead to unexpected behavior.

Comment thread on python/sglang/srt/layers/moe/topk.py (outdated)
@gemini-code-assist (Contributor)

Summary of Changes

This pull request significantly refactors the Mixture-of-Experts (MoE) related configurations and their usage throughout the codebase. My primary goal was to centralize MoE runner settings into a new MoeRunnerConfig dataclass and introduce a unified --moe-runner-backend argument, deprecating several older, specific flags. This change streamlines how MoE backends are selected and configured. Additionally, I've cleaned up direct accesses to global server arguments by introducing dedicated utility functions, improving modularity and maintainability. The changes also include the introduction of TopKConfig for clearer expert selection parameters and new checker classes for output format validation.
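
The getter-based access pattern described here could look roughly like the sketch below. Only the function and class names (MoeA2ABackend, MoeRunnerBackend, initialize_moe_config, get_moe_a2a_backend, get_moe_runner_backend) are taken from this PR; the enum members, backend values, and the initialize_moe_config signature are assumptions.

```python
from enum import Enum
from typing import Optional


class MoeA2ABackend(Enum):
    STANDARD = "standard"
    DEEPEP = "deepep"

    def is_standard(self) -> bool:
        return self == MoeA2ABackend.STANDARD


class MoeRunnerBackend(Enum):
    AUTO = "auto"
    TRITON_KERNEL = "triton_kernel"
    FLASHINFER_CUTLASS = "flashinfer_cutlass"
    FLASHINFER_TRTLLM = "flashinfer_trtllm"


# Module-level state, populated once instead of reading global_server_args_dict.
_MOE_A2A_BACKEND: Optional[MoeA2ABackend] = None
_MOE_RUNNER_BACKEND: Optional[MoeRunnerBackend] = None


def initialize_moe_config(server_args) -> None:
    """Populate the module-level MoE config once, e.g. from the scheduler."""
    global _MOE_A2A_BACKEND, _MOE_RUNNER_BACKEND
    _MOE_A2A_BACKEND = MoeA2ABackend(server_args.moe_a2a_backend)
    _MOE_RUNNER_BACKEND = MoeRunnerBackend(server_args.moe_runner_backend)


def get_moe_a2a_backend() -> MoeA2ABackend:
    assert _MOE_A2A_BACKEND is not None, "initialize_moe_config() was not called"
    return _MOE_A2A_BACKEND


def get_moe_runner_backend() -> MoeRunnerBackend:
    assert _MOE_RUNNER_BACKEND is not None, "initialize_moe_config() was not called"
    return _MOE_RUNNER_BACKEND
```

Call sites can then write, for example, `if get_moe_a2a_backend().is_standard(): ...` instead of looking up global_server_args_dict.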

Highlights

  • Consolidated MoE Backend Selection: I've introduced a new --moe-runner-backend command-line argument that centralizes the selection of MoE backend implementations. This replaces the older, more specific flags like --enable-triton-kernel-moe, --enable-flashinfer-cutlass-moe, and --enable-flashinfer-trtllm-moe, simplifying configuration.
  • Centralized MoE Runner Configuration: I've created a MoeRunnerConfig dataclass to encapsulate all MoE runner-related configurations, such as activation type, in-place operations, and routed scaling factors. This improves code organization and makes it easier to pass MoE settings consistently across different functions and layers. A minimal sketch of MoeRunnerConfig and TopKConfig follows this list.
  • Improved MoE Configuration Access: I've refactored MoE-related logic to reduce direct dependencies on the global_server_args_dict. Instead, new utility functions like get_moe_a2a_backend, get_moe_runner_backend, and get_deepep_mode are now used, promoting cleaner code and better separation of concerns.
  • Structured Top-K Configuration: I've introduced a TopKConfig dataclass to specifically manage parameters related to top-k routing in MoE layers. This makes the configuration of expert selection more explicit and maintainable.
  • Pylint Compliance and Type Checking: To address pylint warnings and improve code quality, I've added TopKOutputChecker and DispatchOutputChecker classes. These provide static methods for type-safe validation of output formats, enhancing code robustness.
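
As a rough reference, the following is an illustrative-only sketch of the two dataclasses, assembled from the field names mentioned in this summary; the real definitions in moe_runner.py and topk.py may differ in fields, defaults, and types.

```python
from dataclasses import dataclass
from typing import Optional

import torch


@dataclass
class MoeRunnerConfig:
    # Fields taken from names mentioned in this PR; defaults are assumptions.
    activation: str = "silu"
    inplace: bool = True
    no_combine: bool = False
    apply_router_weight_on_input: bool = False
    routed_scaling_factor: Optional[float] = None
    alpha: Optional[float] = None   # replaces activation_alpha
    limit: Optional[float] = None   # replaces swiglu_limit


@dataclass
class TopKConfig:
    # Expert-selection parameters; exact field set is an assumption.
    top_k: int = 1
    renormalize: bool = True
    use_grouped_topk: bool = False
    num_expert_group: Optional[int] = None
    topk_group: Optional[int] = None
    num_fused_shared_experts: int = 0
    correction_bias: Optional[torch.Tensor] = None
```
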
Changelog
  • benchmark/kernels/fused_moe_triton/tuning_fused_moe_triton.py
    • Imported override_config, MoeRunnerConfig, and TopKConfig.
    • Updated calls to select_experts to utilize the new TopKConfig object.
    • Modified the fused_moe function call to pass a moe_runner_config object instead of an inplace boolean.
  • docs/advanced_features/server_arguments.md
    • Replaced deprecated command-line arguments (--enable-flashinfer-cutlass-moe, --enable-flashinfer-trtllm-moe, --enable-triton-kernel-moe) with a single, unified --moe-runner-backend argument.
    • Updated the description for the --ep-dispatch-algorithm argument.
  • python/sglang/bench_one_batch.py
    • Removed imports for DeepEPMode and MoeA2ABackend.
    • Eliminated parameters related to enable_two_batch_overlap, enable_deepep_moe, and deepep_mode from ForwardBatch initialization.
  • python/sglang/srt/eplb/expert_distribution.py
    • Removed the import of global_server_args_dict.
  • python/sglang/srt/layers/communicator.py
    • Changed import torch.distributed to import torch.
    • Imported get_moe_a2a_backend.
    • Updated the _compute_mlp_mode function to use get_moe_a2a_backend().is_standard() for checking scatter mode.
  • python/sglang/srt/layers/moe/__init__.py
    • Added a new __init__.py file to the moe directory, exposing new MoE-related classes and utility functions for centralized access.
  • python/sglang/srt/layers/moe/ep_moe/layer.py
    • Imported new MoE utility functions: get_deepep_mode, get_moe_a2a_backend, get_moe_runner_backend, and should_use_flashinfer_trtllm_moe.
    • Removed direct imports of DeepEPMode, Fp8MoEMethod, and get_tile_tokens_dim.
    • Refactored __init__ parameters, replacing activation_alpha with alpha, swiglu_limit with limit, and removing tp_size and deepep_mode.
    • Updated internal logic to consistently use self.moe_runner_config.activation and self.moe_runner_config.routed_scaling_factor.
    • Modified deepep_mode assignment to retrieve its value via get_deepep_mode().
    • Updated moe_impl to leverage DispatchOutputChecker for robust format validation.
    • Adjusted get_moe_impl_class to dynamically determine the MoE implementation based on get_moe_a2a_backend() and get_moe_runner_backend().
  • python/sglang/srt/layers/moe/fused_moe_native.py
    • Imported MoeRunnerConfig and StandardTopKOutput.
    • Updated fused_moe_forward_native and moe_forward_native function signatures to accept a moe_runner_config object, centralizing MoE runner parameters.
  • python/sglang/srt/layers/moe/fused_moe_triton/fused_moe.py
    • Imported MoeRunnerConfig and StandardTopKOutput.
    • Updated function signatures to consistently use moe_runner_config and StandardTopKOutput.
    • Replaced activation_alpha with alpha and swiglu_limit with limit for activation parameters.
    • Ensured all relevant logic now accesses MoE runner properties (e.g., inplace, no_combine, activation, apply_router_weight_on_input, routed_scaling_factor, alpha, limit) directly from the moe_runner_config object.
  • python/sglang/srt/layers/moe/fused_moe_triton/layer.py
    • Removed unused imports such as datetime, glob, os, and sys.
    • Imported MoeRunnerConfig, get_moe_runner_backend, and TopKOutputChecker.
    • Initialized self.moe_runner_config directly from constructor arguments, centralizing MoE runner configuration.
    • Eliminated redundant direct attributes like activation_alpha, swiglu_limit, routed_scaling_factor, activation, apply_router_weight_on_input, inplace, and no_combine, as these are now encapsulated within moe_runner_config.
    • Updated checks for use_triton_kernels, use_flashinfer_mxfp4_moe, and enable_flashinfer_cutlass_moe to utilize get_moe_runner_backend().
    • Refactored the forward method to use TopKOutputChecker and moe_runner_config for improved clarity and consistency.
    • Streamlined the __init__ and forward methods of FlashInferFusedMoE and FlashInferFP4MoE by leveraging the new moe_runner_config and TopKOutputChecker.
  • python/sglang/srt/layers/moe/fused_moe_triton/triton_kernels_moe.py
    • Imported MoeRunnerConfig.
    • Updated triton_kernel_moe_forward and triton_kernel_moe_with_bias_forward to accept a moe_runner_config object.
    • Applied TopKOutputChecker for asserting the format of topk_output.
    • Ensured activation, alpha, and limit properties are now accessed via moe_runner_config.
  • python/sglang/srt/layers/moe/moe_runner.py
    • Added a new file defining the MoeRunnerConfig dataclass, which centralizes MoE runner parameters for better organization.
  • python/sglang/srt/layers/moe/token_dispatcher/__init__.py
    • Imported DispatchOutputChecker and StandardDispatchOutput.
    • Added AscendDeepEPLLOutput to the __all__ export list.
  • python/sglang/srt/layers/moe/token_dispatcher/base_dispatcher.py
    • Removed the MoeA2ABackend enum, as it has been relocated to utils.py.
    • Introduced the DispatchOutputChecker dataclass, providing static methods for type-safe validation of DispatchOutput formats.
    • Converted DispatchOutputFormat enum values to uppercase for consistency.
  • python/sglang/srt/layers/moe/token_dispatcher/deepep.py
    • Imported DeepEPMode, get_deepep_config, and is_tbo_enabled from sglang.srt.layers.moe.
    • Updated the format property of DeepEPNormalOutput, DeepEPLLOutput, and AscendDeepEPLLOutput to use uppercase enum values.
    • Modified get_deepep_buffer to use default values for num_max_dispatch_tokens_per_rank and num_experts, and to check is_tbo_enabled().
    • Updated DeepEPConfig to retrieve its configuration using get_deepep_config().
  • python/sglang/srt/layers/moe/token_dispatcher/standard.py
    • Updated the format property of StandardDispatchOutput to use an uppercase enum value.
  • python/sglang/srt/layers/moe/topk.py
    • Introduced the TopKConfig dataclass to encapsulate top-k routing parameters.
    • Added the TopKOutputChecker dataclass with static methods for type-safe validation of TopKOutput formats. A generic sketch of this checker pattern appears after the changelog.
    • Defined BypassedTopKOutput as a new NamedTuple for specific top-k output scenarios.
    • Extended the TopKOutputFormat enum to include BYPASSED.
    • Refactored the TopK class to internally use the new TopKConfig.
    • Updated the select_experts function to accept a TopKConfig object.
    • Modified forward_cuda and forward_npu to leverage TopKConfig and TopKOutputChecker for parameter handling and validation.
  • python/sglang/srt/layers/moe/utils.py
    • Removed the should_use_flashinfer_trtllm_moe function, as its logic is now integrated with get_moe_runner_backend.
    • Redefined the MoeA2ABackend enum for clarity.
    • Introduced a new MoeRunnerBackend enum to categorize different MoE runner types.
    • Implemented global variables and an initialize_moe_config function to centralize MoE configuration management.
    • Added getter functions (get_moe_a2a_backend, get_moe_runner_backend, get_deepep_mode, get_deepep_config, is_tbo_enabled, get_tbo_token_distribution_threshold) for accessing MoE configurations.
    • Updated the should_use_flashinfer_trtllm_moe function to rely on get_moe_runner_backend().
  • python/sglang/srt/layers/quantization/awq.py
    • Imported MoeRunnerConfig.
    • Updated the apply method to accept moe_runner_config and use its activation property.
  • python/sglang/srt/layers/quantization/base_config.py
    • Imported MoeRunnerConfig.
    • Updated the apply method signature to accept moe_runner_config.
  • python/sglang/srt/layers/quantization/blockwise_int8.py
    • Imported MoeRunnerConfig.
    • Updated the apply method signature to accept moe_runner_config and pass it to fused_experts.
  • python/sglang/srt/layers/quantization/compressed_tensors/compressed_tensors_moe.py
    • Imported MoeRunnerConfig.
    • Updated the apply method signature to accept moe_runner_config and pass it to fused_experts.
    • Ensured moe_runner_config.activation is used for activation assertions.
  • python/sglang/srt/layers/quantization/fp4.py
    • Imported MoeRunnerConfig.
    • Refactored MxFp4MoEMethod to directly inherit from FusedMoEMethodBase.
    • Updated the apply method signature to accept moe_runner_config and use its activation property.
  • python/sglang/srt/layers/quantization/fp8.py
    • Imported MoeRunnerConfig.
    • Updated the apply method signature to accept moe_runner_config and utilize its properties.
    • Modified apply_with_router_logits to use moe_runner_config and TopKOutputChecker for parameter handling.
  • python/sglang/srt/layers/quantization/fp8_utils.py
    • Added a comment to clarify that MoE backends should be defined via --moe-runner-backend.
  • python/sglang/srt/layers/quantization/gptq.py
    • Imported MoeRunnerConfig.
    • Updated the apply method signature to accept moe_runner_config and use its activation property.
  • python/sglang/srt/layers/quantization/marlin_utils.py
    • Imported FusedMoE.
    • Updated check_moe_marlin_supports_layer to accept FusedMoE and access MoE runner properties via moe_runner_config.
  • python/sglang/srt/layers/quantization/modelopt_quant.py
    • Imported should_use_flashinfer_trtllm_moe from sglang.srt.layers.moe.
    • Imported FusedMoE and MoeRunnerConfig.
    • Updated the apply method signature to accept moe_runner_config and use its properties.
    • Modified the enable_flashinfer_cutlass_moe property to retrieve its value from get_moe_runner_backend().
  • python/sglang/srt/layers/quantization/moe_wna16.py
    • Imported MoeRunnerConfig.
    • Updated the apply method signature to accept moe_runner_config and use its properties.
    • Modified moe_wna16_weight_loader to correctly use layer.moe_tp_size.
  • python/sglang/srt/layers/quantization/mxfp4.py
    • Imported get_moe_runner_backend and MoeRunnerConfig.
    • Updated the __init__ method to use get_moe_runner_backend() for initializing use_triton_kernels and use_flashinfer.
    • Modified the apply method signature to accept moe_runner_config and utilize its properties.
  • python/sglang/srt/layers/quantization/unquant.py
    • Imported MoeRunnerConfig.
    • Updated the method signatures of apply, forward_cuda, forward_cpu, and forward_npu to accept moe_runner_config and use its properties.
  • python/sglang/srt/layers/quantization/w4afp8.py
    • Updated the TopKOutput import to StandardTopKOutput.
    • Modified the apply method signature to accept StandardTopKOutput.
  • python/sglang/srt/layers/quantization/w8a8_fp8.py
    • Imported MoeRunnerConfig.
    • Updated the TopKOutput import to StandardTopKOutput.
    • Modified the apply method signature to accept StandardTopKOutput and moe_runner_config.
  • python/sglang/srt/layers/quantization/w8a8_int8.py
    • Imported MoeRunnerConfig.
    • Updated the apply method signature to accept moe_runner_config and use its properties.
  • python/sglang/srt/managers/schedule_batch.py
    • Imported is_tbo_enabled.
    • Removed several global server arguments from global_server_args_dict that are now managed by the new MoE configuration system.
    • Updated get_model_worker_batch to use is_tbo_enabled() for two-batch overlap checks.
  • python/sglang/srt/managers/scheduler.py
    • Imported new MoE utility functions: get_deepep_mode, get_moe_a2a_backend, initialize_moe_config, and is_tbo_enabled.
    • Added an init_moe_config method to initialize global MoE configurations.
    • Removed enable_two_batch_overlap, enable_deepep_moe, and deepep_mode parameters from prepare_mlp_sync_batch and prepare_mlp_sync_batch_raw.
    • Updated prepare_mlp_sync_batch_raw to retrieve MoE configurations using the new getter functions.
  • python/sglang/srt/model_executor/model_runner.py
    • Removed imports of DeepEPMode and MoeA2ABackend.
    • Eliminated moe_a2a_backend and deepep_mode from self.model_config.extra_args.
  • python/sglang/srt/models/dbrx.py
    • Imported MoeRunnerConfig and TopK.
    • Initialized self.topk and self.moe_runner_config.
    • Updated the fused_moe call to use topk_output and moe_runner_config.
    • Changed the return type of the forward method to Tuple[torch.Tensor, torch.Tensor].
  • python/sglang/srt/models/deepseek.py
    • Imported MoeRunnerConfig.
    • Updated the fused_moe call to use moe_runner_config.
  • python/sglang/srt/models/deepseek_v2.py
    • Removed imports of get_local_attention_dp_size and should_use_flashinfer_trtllm_moe.
    • Imported get_deepep_mode, get_moe_a2a_backend, and FusedMoE.
    • Eliminated deepep_mode, enable_flashinfer_cutlass_moe, renormalize, use_grouped_topk, num_expert_group, topk_group, and correction_bias from experts initialization.
    • Updated deepep_mode and _enable_deepep_moe assignments to use the new getter functions.
    • Simplified forward_normal_dual_stream and forward_normal by removing conditional topk_output assignments.
    • Modified make_expert_params_mapping to use FusedMoE.
  • python/sglang/srt/models/ernie4.py
    • Imported FusedMoE.
    • Updated make_expert_params_mapping to use FusedMoE.
  • python/sglang/srt/models/glm4_moe.py
    • Imported get_deepep_mode, get_moe_a2a_backend, and FusedMoE.
    • Removed the should_use_flashinfer_trtllm_moe import.
    • Removed the model_forward_maybe_tbo import.
    • Simplified the initialization of self.topk.
    • Eliminated deepep_mode, enable_flashinfer_cutlass_moe, renormalize, use_grouped_topk, num_expert_group, num_fused_shared_experts, topk_group, and correction_bias from experts initialization.
    • Updated _enable_deepep_moe assignment to use the getter function.
    • Simplified forward_normal_dual_stream and forward_normal by removing conditional topk_output assignments.
    • Modified make_expert_params_mapping to use FusedMoE.
  • python/sglang/srt/models/glm4v_moe.py
    • Removed imports related to parallel_state, tensor_model_parallel_all_reduce, get_attention_tp_rank, get_attention_tp_size, and get_local_attention_dp_size.
    • Imported FusedMoE.
    • Removed the initialization of self.dp_size.
    • Updated make_expert_params_mapping to use FusedMoE.
  • python/sglang/srt/models/gpt_oss.py
    • Removed imports of get_local_attention_dp_size and DeepEPMode.
    • Imported get_moe_a2a_backend and FusedMoE.
    • Simplified the initialization of self.topk.
    • Removed enable_flashinfer_cutlass_moe from extra_kwargs.
    • Updated activation_alpha and swiglu_limit to alpha and limit respectively in experts initialization.
    • Removed deepep_mode from experts initialization.
    • Updated the forward method to use get_moe_a2a_backend().
    • Removed the initialization of self.local_dp_size.
    • Modified make_expert_params_mapping_fused to use FusedMoE.
  • python/sglang/srt/models/granitemoe.py
    • Removed tp_size from FusedMoE initialization.
  • python/sglang/srt/models/grok.py
    • Removed tp_size from FusedMoE initialization.
  • python/sglang/srt/models/interns1.py
    • Imported FusedMoE.
    • Updated make_expert_params_mapping to use FusedMoE.
  • python/sglang/srt/models/internvl.py
    • Imported FusedMoE.
    • Updated make_expert_params_mapping to use FusedMoE.
  • python/sglang/srt/models/llama4.py
    • Removed imports of get_local_attention_dp_size.
    • Removed the initialization of self.local_dp_size.
  • python/sglang/srt/models/minicpm3.py
    • Removed the import of global_server_args_dict.
  • python/sglang/srt/models/mixtral.py
    • Removed the import of global_server_args_dict.
    • Removed tp_size from FusedMoE initialization.
  • python/sglang/srt/models/olmoe.py
    • Removed tp_size from FusedMoE initialization.
  • python/sglang/srt/models/qwen2_moe.py
    • Removed imports of dataclass, Enum, and auto.
    • Eliminated imports related to ExpertDistributionRecorder, LogitsProcessorOutput, EPMoE, and get_local_attention_dp_size.
    • Removed enable_flashinfer_cutlass_moe from FusedMoE initialization.
    • Removed the initialization of self.local_dp_size.
  • python/sglang/srt/models/qwen3_moe.py
    • Removed numerous imports related to distributed training, activation functions, attention, linear layers, and batch management.
    • Imported get_moe_a2a_backend and FusedMoE.
    • Eliminated deepep_mode and enable_flashinfer_cutlass_moe from experts initialization.
    • Updated the forward method to use get_moe_a2a_backend().
    • Removed the initialization of self.local_dp_size.
    • Modified make_expert_params_mapping to use FusedMoE.
  • python/sglang/srt/models/step3_vl.py
    • Imported get_moe_a2a_backend.
    • Updated the forward method to use get_moe_a2a_backend().
  • python/sglang/srt/models/xverse_moe.py
    • Imported MoeRunnerConfig and TopK.
    • Initialized self.moe_runner_config and self.topk.
    • Updated the fused_moe call to use topk_output and moe_runner_config.
  • python/sglang/srt/server_args.py
    • Replaced individual enable_flashinfer_cutlass_moe and enable_flashinfer_trtllm_moe flags with a single moe_runner_backend argument, offering a wider range of choices.
    • Removed enable_triton_kernel_moe and enable_flashinfer_mxfp4_moe from the primary arguments.
    • Added enable_flashinfer_cutlass_moe, enable_flashinfer_trtllm_moe, and enable_triton_kernel_moe to the list of deprecated arguments.
    • Implemented logic in __post_init__ to issue warnings for deprecated flags and automatically map them to the new moe_runner_backend.
    • Updated validation logic for flashinfer_cutlass_moe and flashinfer_trtllm_moe to rely on moe_runner_backend.
    • Modified add_cli_args to reflect the changes in arguments.
    • Updated model_specific_adjustments to use moe_runner_backend.
  • python/sglang/srt/two_batch_overlap.py
    • Imported get_tbo_token_distribution_threshold and is_tbo_enabled.
    • Updated _is_two_chunk_split_enabled to use get_tbo_token_distribution_threshold().
    • Modified capture_one_batch_size and MaybeTboDeepEPDispatcher to use is_tbo_enabled().
    • Added dp_padding_mode to the arguments of filter_batch.
  • python/sglang/test/test_block_fp8.py
    • Imported TopKConfig.
    • Updated calls to select_experts to use TopKConfig.
  • python/sglang/test/test_block_fp8_ep.py
    • Imported TopKConfig.
    • Updated calls to select_experts to use TopKConfig.
  • python/sglang/test/test_cutlass_w4a8_moe.py
    • Imported TopKConfig.
    • Updated calls to select_experts to use TopKConfig.
  • python/sglang/test/test_fp4_moe.py
    • Imported TopKConfig.
    • Updated calls to select_experts to use TopKConfig.
  • scripts/ci/ci_install_dependency.sh
    • Added a command to clear the torch compilation cache, improving CI build reliability.
  • test/srt/quant/test_block_int8.py
    • Imported TopKConfig.
    • Updated calls to select_experts to use TopKConfig.
  • test/srt/quant/test_int8_kernel.py
    • Imported TopKConfig.
    • Updated calls to select_experts to use TopKConfig.
  • test/srt/test_fused_moe.py
    • Imported TopKConfig.
    • Updated calls to select_experts to use TopKConfig.
  • test/srt/test_triton_moe_channel_fp8_kernel.py
    • Imported TopKConfig.
    • Updated calls to select_experts to use TopKConfig.
  • test/srt/test_triton_moe_wna16.py
    • Imported TopKConfig.
    • Updated calls to select_experts to use TopKConfig.
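
To make the checker idea above concrete, here is a generic, self-contained sketch of the pattern: an output-format enum, a NamedTuple output type, and a checker exposing static predicates. Only TopKOutputChecker, TopKOutputFormat, BYPASSED, and StandardTopKOutput are named in this PR; the method names format_is_standard / format_is_bypassed and the output fields shown are assumptions.

```python
from enum import Enum, auto
from typing import NamedTuple

import torch


class TopKOutputFormat(Enum):
    STANDARD = auto()
    BYPASSED = auto()


class StandardTopKOutput(NamedTuple):
    topk_weights: torch.Tensor
    topk_ids: torch.Tensor

    @property
    def format(self) -> TopKOutputFormat:
        return TopKOutputFormat.STANDARD


class TopKOutputChecker:
    """Static predicates let call sites assert the expected output format."""

    @staticmethod
    def format_is_standard(topk_output) -> bool:
        return topk_output.format == TopKOutputFormat.STANDARD

    @staticmethod
    def format_is_bypassed(topk_output) -> bool:
        return topk_output.format == TopKOutputFormat.BYPASSED


# Typical call-site usage instead of isinstance checks:
# assert TopKOutputChecker.format_is_standard(topk_output)
```
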
Activity
  • The bot indicated that it had reached its daily quota limit.
  • The author, ch-wan, requested a gemini review.
  • The author, ch-wan, requested a gemini summary.
  • A bot review comment was made on python/sglang/srt/layers/moe/topk.py, advising against in-place modification of self.topk_config and suggesting logging warnings instead. A minimal illustration of that pattern follows.
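
The snippet below is a generic, hypothetical illustration of that reviewer suggestion (take a frozen per-call copy via dataclasses.replace and log a warning instead of mutating the shared config). It is not the code from this PR, and the TopKConfig fields shown are placeholders.

```python
import dataclasses
import logging

logger = logging.getLogger(__name__)


@dataclasses.dataclass(frozen=True)
class TopKConfig:
    top_k: int
    renormalize: bool = True


def forward_with_override(topk_config: TopKConfig, forced_top_k: int) -> TopKConfig:
    # Warn about the override rather than silently changing shared state.
    if topk_config.top_k != forced_top_k:
        logger.warning(
            "Overriding top_k=%d with %d for this forward pass only",
            topk_config.top_k, forced_top_k,
        )
    # A frozen copy leaves the shared config object untouched.
    return dataclasses.replace(topk_config, top_k=forced_top_k)
```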

@ch-wan ch-wan force-pushed the cheng/refactor/ep-framework branch from 003ccbf to cb251b1 on August 12, 2025 at 23:55
@ch-wan ch-wan merged commit 2958951 into main Aug 15, 2025
17 of 62 checks passed
@ch-wan ch-wan deleted the cheng/refactor/ep-framework branch August 15, 2025 04:14
@ch-wan ch-wan restored the cheng/refactor/ep-framework branch August 15, 2025 04:14
@ch-wan ch-wan deleted the cheng/refactor/ep-framework branch August 15, 2025 04:15
narutolhy pushed a commit to narutolhy/sglang that referenced this pull request Aug 17, 2025
elfiegg added a commit to elfiegg/sglang that referenced this pull request Aug 18, 2025
MahmoudAshraf97 pushed a commit to MahmoudAshraf97/sglang that referenced this pull request Sep 8, 2025
