Faster overlap mode scheduler #1738
@merrymercy Has this been tested on larger models? I tried the FP8 version of DeepSeek-V2.5, but it doesn't seem to show much improvement.
@merrymercy Have you tested the overlap mode scheduler with requests arriving at a fixed request rate, rather than with all requests sent at the beginning?
@ykcombat Did you try it with the latest main branch? If the error is still there, please open a new issue with instructions to reproduce it. We will fix it very soon if we can reproduce it.
@merrymercy Thanks for your quick reply! I tried it with the latest main branch, but the error is still there. I have opened a new issue at #2312.
This PR improves the order of kernel launches and result fetching. The overlap scheduler now delivers about a 10% throughput improvement even when the radix cache is turned off. With the radix cache turned on, we can expect a larger speedup.
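To illustrate why reordering launches and result fetching can help, here is a toy cost model, not code from this PR. It assumes each batch needs a fixed amount of CPU scheduling time and a fixed amount of GPU time, and that in overlap mode the CPU can prepare the next batch while the GPU runs the current one. The function names and the timings are hypothetical, and real dependencies between consecutive batches are ignored.

```python
def serial_time(n_batches: int, cpu_ms: float, gpu_ms: float) -> float:
    """Toy model of normal mode: schedule, launch, wait for the result, repeat."""
    return n_batches * (cpu_ms + gpu_ms)


def overlapped_time(n_batches: int, cpu_ms: float, gpu_ms: float) -> float:
    """Toy model of overlap mode as a two-stage pipeline.

    The first batch must be scheduled before any GPU work starts, the last
    batch must finish on the GPU, and in between the slower stage sets the pace.
    """
    if n_batches == 0:
        return 0.0
    return cpu_ms + (n_batches - 1) * max(cpu_ms, gpu_ms) + gpu_ms


# Hypothetical timings: 1 ms of CPU scheduling and 9 ms of GPU work per batch.
print(serial_time(100, 1, 9))      # 1000
print(overlapped_time(100, 1, 9))  # 901
```

Under these assumed timings the CPU overhead is almost fully hidden behind GPU work, which is the kind of gain the overlap scheduler targets. The actual benefit depends on how large the scheduling overhead is relative to the GPU time per batch.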
Benchmark results
Overlap mode: 51.03 req/s
Normal mode: 46.06 req/s
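As a quick check, the relative gain implied by the two figures above can be computed directly. The variable names are illustrative only.

```python
overlap_rps = 51.03  # overlap mode throughput, req/s
normal_rps = 46.06   # normal mode throughput, req/s

# Relative throughput improvement of overlap mode over normal mode.
speedup_pct = (overlap_rps / normal_rps - 1) * 100
print(f"{speedup_pct:.1f}% higher throughput")  # 10.8% higher throughput
```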
Notes