[6/N] MoE Refactor: Cleanup MoE-related configs #8849
Conversation
Code Review
This pull request is a large-scale refactoring of MoE-related configurations. It introduces MoeRunnerConfig and TopKConfig to encapsulate parameters, adds a new --moe-runner-backend argument to unify several flags, and centralizes configuration logic to avoid direct use of global state. These changes significantly improve code clarity, maintainability, and organization. The implementation appears correct and consistent with the stated goals. I've found one potential issue with in-place modification of a configuration object, which could lead to unexpected behavior.
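The in-place modification concern is easiest to see with a mutable config object shared across layers. Below is a minimal sketch of how such a config can be made safe by freezing it and copying on change; the field names are assumptions for illustration, and the actual `MoeRunnerConfig` in this PR may look different:

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class MoeRunnerConfig:
    """Hypothetical subset of MoE runner settings bundled into one object."""

    runner_backend: str = "triton"
    top_k: int = 2
    inplace: bool = False


def with_backend(cfg: MoeRunnerConfig, backend: str) -> MoeRunnerConfig:
    # Freezing the dataclass forces callers to create a modified copy
    # instead of mutating a config that other layers may still hold.
    return replace(cfg, runner_backend=backend)


base = MoeRunnerConfig()
flashinfer_cfg = with_backend(base, "flashinfer_trtllm")
assert base.runner_backend == "triton"  # the original config is unchanged
```

Passing such an object explicitly into MoE layers, rather than having each layer read global server state, is the same idea the review credits the PR with: configuration flows in one direction and is easy to audit.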
Summary of Changes

This pull request significantly refactors the Mixture-of-Experts (MoE) related configurations and their usage throughout the codebase. The primary goal is to centralize MoE runner settings into a new `MoeRunnerConfig`.

Highlights
Changelog
Activity
The …factor/ep-framework branch was force-pushed from 003ccbf to cb251b1.
Motivation
- Introduce `--moe-runner-backend` and deprecate `--enable-triton-kernel-moe`, `--enable-flashinfer-cutlass-moe`, and `--enable-flashinfer-trtllm-moe` (see the sketch under Modifications below).
- Introduce `TopKOutputChecker` and `DispatchOutputChecker` to make pylint happy.
- Avoid direct use of `global_server_args` in MoE-related logic.
- Introduce `MoeRunnerConfig` to wrap up MoE runner configs.

Modifications
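As a rough illustration of the flag unification described in the motivation, here is one way a unified backend flag can subsume deprecated boolean flags. This is a sketch only: the choice names and the fallback mapping are assumptions, and the real parsing lives in sglang's server arguments.

```python
import argparse

parser = argparse.ArgumentParser()
# New unified flag; the choice list here is an assumption for illustration.
parser.add_argument(
    "--moe-runner-backend",
    choices=["auto", "triton_kernel", "flashinfer_cutlass", "flashinfer_trtllm"],
    default="auto",
)
# Deprecated boolean flags, kept temporarily for backward compatibility.
parser.add_argument("--enable-triton-kernel-moe", action="store_true")
parser.add_argument("--enable-flashinfer-cutlass-moe", action="store_true")
parser.add_argument("--enable-flashinfer-trtllm-moe", action="store_true")

args = parser.parse_args(["--enable-flashinfer-trtllm-moe"])

# Map legacy flags onto the unified backend selection when it is unset.
if args.moe_runner_backend == "auto":
    if args.enable_triton_kernel_moe:
        args.moe_runner_backend = "triton_kernel"
    elif args.enable_flashinfer_cutlass_moe:
        args.moe_runner_backend = "flashinfer_cutlass"
    elif args.enable_flashinfer_trtllm_moe:
        args.moe_runner_backend = "flashinfer_trtllm"

print(args.moe_runner_backend)  # -> flashinfer_trtllm
```

One flag with an explicit choice set avoids the ambiguity of several independent booleans, where more than one could be enabled at once.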
Accuracy Test
Benchmark & Profiling
Checklist