One branch that contains EPLB + Two Batch Overlap + dependencies #5524
fzyzcjy wants to merge 3261 commits into sgl-project:main
Conversation
force-pushed from 51633b5 to b69c117
I tested this PR with DeepEP + EPLB and found that each rank only tracks the expert load on its local GPU, with no cross-rank communication/summation happening at all. The saved expert distribution JSON file shows per-layer logical counts that reflect only the local rank. This suggests the load-balancing logic is not properly aggregating expert usage across all ranks. Could you please clarify or fix this behavior?
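For reference, the missing aggregation could be as simple as an all-reduce of the per-rank counts before the JSON dump. Below is a minimal sketch of that idea, assuming `torch.distributed` is already initialized; the tensor name `local_expert_counts` and its shape are illustrative assumptions, not SGLang's actual recorder:

```python
import torch
import torch.distributed as dist


def aggregate_expert_counts(local_expert_counts: torch.Tensor) -> torch.Tensor:
    """Sum per-rank expert hit counts so every rank sees the global load.

    local_expert_counts: a [num_layers, num_logical_experts] int64 tensor
    that each rank fills from its own forward passes (hypothetical layout).
    """
    global_counts = local_expert_counts.clone()
    # SUM across all ranks; afterwards every rank holds identical totals,
    # so whichever rank writes the JSON records the cluster-wide load
    # instead of just its local GPU's share.
    dist.all_reduce(global_counts, op=dist.ReduceOp.SUM)
    return global_counts
```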
May I ask what the command is to reproduce Case 1 with 3P+9D?
I found an issue both in:

```python
self.expert_distribution_communicator = _Communicator(
    self.send_to_scheduler, server_args.dp_size
)
```

Could you please double-check if …
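For context, `_Communicator` follows a fan-in pattern: it forwards one request and then waits for a fixed number of responses. Below is a simplified sketch of that pattern (hypothetical names, not SGLang's exact implementation) showing why the fan-in argument, `server_args.dp_size` above, must match the number of ranks that actually reply: too large and the await hangs, too small and it resolves before all results arrive.

```python
import asyncio


class FanInCommunicator:
    def __init__(self, send_fn, fan_in: int):
        self._send_fn = send_fn  # broadcasts one request to all workers
        self._fan_in = fan_in    # expected number of responses
        self._results = []
        self._done = asyncio.Event()

    def handle_response(self, result):
        # Called once per responding worker (e.g. per scheduler rank).
        self._results.append(result)
        if len(self._results) == self._fan_in:
            self._done.set()

    async def __call__(self, request):
        self._results.clear()
        self._done.clear()
        self._send_fn(request)
        await self._done.wait()  # never fires if fan_in is set too high
        return list(self._results)
```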
Hi, could you please discuss this in the issue instead? Since this PR contains many commits, comments here get hidden.
Closing this since everything has been merged to master.
Description
This branch merges various other branches and PRs, including mine, @ch-wan's, and others'. It is not meant to be merged itself (please merge the individual PRs instead); rather, it exists so that people can try these features out. Indeed, it works well and fast when I test it.
Below (folded) are some pretty early experiments:
Details
Experiment 1: PD + EPLB + TBO (two batch overlap)
gsm8k repeated:
Experiment 2: baseline vs baseline+EPLB vs baseline+EPLB+TBO
Remarks
2025.04.25 Update
I forgot to paste the latest results, which were produced earlier, so here are some. You can reproduce them using this branch of the code.
Case 1: Direct decode
Case 2: Simulated MTP decode
Case 3: Prefill