Add endpoints to dump selected expert ids #4435
Conversation
Note that this expert id recording involves copying tensors from GPU to CPU during model execution, so it will affect throughput and latency significantly. I will add an argument to turn the recording off by default.
Hi @yuhsaun-t, thank you for your contribution. Could you please provide more details about the output?
Hello @ch-wan, sure, here is an attached output JSON from the dump. Right now there is one JSON file for each GPU rank, but the content is the same; I can update the code to dump only on rank 0. Yes, I will update it to use the non-blocking copy.
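A per-rank dump like the one described might look roughly like the sketch below. The function name, file-naming scheme, and record layout are all illustrative assumptions, not the PR's actual code:

```python
import json
from pathlib import Path

def dump_expert_ids(records, rank, out_dir):
    """Write one JSON file per GPU rank, as described above.

    `records` is assumed to map layer ids to lists of selected
    expert ids; the schema here is a guess for illustration.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # One file per rank, e.g. expert_ids_rank0.json
    path = out_dir / f"expert_ids_rank{rank}.json"
    path.write_text(json.dumps(records, indent=2))
    return path
```

Since every rank writes the same content, dumping only on rank 0 (as suggested) would just mean guarding the call with a rank check.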
This commit lets users turn recording on and off freely, so that by default it does not affect performance.
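The toggle described here could be sketched in pure Python as follows. The class and method names are illustrative, not the PR's actual API; the key point is that recording is a no-op unless explicitly enabled:

```python
from collections import defaultdict

class ExpertIdRecorder:
    """Hypothetical per-rank recorder of selected expert ids.

    Disabled by default, so the GPU-to-CPU copies that recording
    implies add no overhead to normal serving.
    """

    def __init__(self):
        self.enabled = False
        # layer id -> list of per-token top-k expert-id lists
        self.records = defaultdict(list)
        self._current_layer = None

    def start(self):
        self.enabled = True

    def stop(self):
        self.enabled = False

    def set_layer(self, layer_id):
        self._current_layer = layer_id

    def record(self, topk_ids):
        # No-op unless recording was explicitly turned on.
        if not self.enabled:
            return
        self.records[self._current_layer].append(list(topk_ids))

recorder = ExpertIdRecorder()
recorder.set_layer(0)
recorder.record([1, 5])   # dropped: recording is off by default
recorder.start()
recorder.record([2, 7])   # kept
```

In the real PR the on/off switch would be driven by the new HTTP endpoints rather than direct method calls.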
Hello @ch-wan, I have updated the PR based on the reviews and added unit tests and docs. Could you review the PR again? Thanks!
@yuhsaun-t Thank you very much for your great effort! This PR would be very useful for analyzing experts' dynamic workloads during MoE serving. I have added some comments. Could you please take a look?
ch-wan
left a comment
Another question comes to mind as I double-check this PR. When DP is enabled, the recorders on different workers will collect different expert distributions. We may need an all-reduce to synchronize their results, so that only the master worker dumps them.
The current implementation dumps one CSV file per rank. I think we can keep it this way so that the server does not have to synchronize on the fly, which preserves performance. We can post-process the dumped CSV files later and aggregate them into one. Does that sound good to you?
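The offline aggregation proposed here could be as simple as the following sketch. The CSV schema shown (a `layer_id,expert_id,count` header) is an assumption for illustration; the PR's actual column layout may differ:

```python
import csv
import io
from collections import Counter

def aggregate_rank_csvs(csv_texts):
    """Sum expert-selection counts across per-rank CSV dumps.

    Each input is the text of one rank's CSV file; the result maps
    (layer_id, expert_id) to the total count over all ranks.
    """
    totals = Counter()
    for text in csv_texts:
        for row in csv.DictReader(io.StringIO(text)):
            key = (int(row["layer_id"]), int(row["expert_id"]))
            totals[key] += int(row["count"])
    return totals

# Two hypothetical per-rank dumps being merged offline:
rank0 = "layer_id,expert_id,count\n0,3,10\n0,5,2\n"
rank1 = "layer_id,expert_id,count\n0,3,4\n1,7,1\n"
merged = aggregate_rank_csvs([rank0, rank1])
```

Because the merge is a plain sum per (layer, expert) pair, it can run entirely offline with no server-side synchronization.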
Thank you for your great effort! I have approved the change.
Motivation
When optimizing the performance of MoE models, understanding the expert id distribution helps us identify performance bottlenecks and come up with a plan to fix them. Such information can be captured in python/sglang/srt/layers/moe/topk.py.
Modifications
- python/sglang/srt/managers/utils.py: record the layer id, expert id, and topk id in a data structure.
- python/sglang/srt/models/deepseek_v2.py: record the layer id into the data structure. (The layer id recording is optional and can be removed.)
- python/sglang/srt/layers/moe/topk.py: record the expert id and topk id into the data structure.
- python/sglang/srt/entrypoints/http_server.py: dump the captured information. All other changes under python/sglang/srt/managers are related to the two endpoints added.
Checklist