Reimplement VRAM buffering in TCP transport #702
ShangmingCai merged 5 commits into kvcache-ai:main
    char *dram_buffer = addr + total_transferred_bytes_;

    #ifdef USE_CUDA
    if (isCudaMemory(addr)) {
This check is invoked once for every transfer. Why not move it to the memory registration phase?
The memory type check is simpler than looking up the memory registration table.
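To make the trade-off in this thread concrete, here is a minimal sketch of the two options. Everything in it is an assumption for illustration: `isCudaMemorySketch` is a hypothetical stand-in for the PR's `isCudaMemory` (whose real body is not shown here), and `RegistrationTable`, `Region`, and `MemKind` are invented names, not Mooncake types. The CUDA calls are only compiled when `USE_CUDA` is defined.

```cpp
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <map>
#ifdef USE_CUDA
#include <cuda_runtime.h>
#endif

// Option A: ask the CUDA runtime about the pointer on every transfer.
// Hypothetical helper; the PR's actual isCudaMemory may differ.
inline bool isCudaMemorySketch(const void *addr) {
#ifdef USE_CUDA
    cudaPointerAttributes attr;
    if (cudaPointerGetAttributes(&attr, addr) != cudaSuccess) {
        cudaGetLastError();  // reset the runtime's last-error state
        return false;
    }
    return attr.type == cudaMemoryTypeDevice;
#else
    (void)addr;  // built without CUDA: every address is treated as DRAM
    return false;
#endif
}

// Option B: record the memory kind once at registration time and look it
// up per transfer. All names below are hypothetical.
enum class MemKind { Dram, Vram };

struct Region {
    std::size_t length;
    MemKind kind;
};

class RegistrationTable {
public:
    void add(const void *base, std::size_t length, MemKind kind) {
        regions_[reinterpret_cast<std::uintptr_t>(base)] = Region{length, kind};
    }

    // Returns true and sets `kind` if addr lies inside a registered region.
    bool lookup(const void *addr, MemKind &kind) const {
        const auto key = reinterpret_cast<std::uintptr_t>(addr);
        auto it = regions_.upper_bound(key);  // first region starting after addr
        if (it == regions_.begin()) return false;
        --it;  // last region starting at or before addr
        if (key - it->first >= it->second.length) return false;
        kind = it->second.kind;
        return true;
    }

private:
    std::map<std::uintptr_t, Region> regions_;
};
```

Option B avoids a runtime query per transfer but needs an ordered lookup plus whatever locking the table requires, which is the extra complexity the reply above refers to.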
stmatengss left a comment
Only one concern. The rest looks good to me.
@alogfans @ShangmingCai I have completed thorough testing on the RTX 4090, and the output is correct with no issues under stress testing. The previous coredump problem I encountered was resolved after changing the sglang pagesize parameter from 1 to 32. If it is still set to 1, the error persists.
@ZhenshengWu Thanks for the info! I guess we can merge this now and work out later where (maybe in sglang) the coredump should be fixed.
I tried to use TCP as the Mooncake backend to do PD disaggregation with sglang, but I still got some errors. Here are my environment, start scripts, and logs.
Here is my env:
Start script for P:
D node0:
D node1:
Here are the logs.
Should I just upgrade the sglang version?
There are no code changes for this feature. The error messages are here. Could you take a look? @alogfans
Did you export
Yes, this error only occurs after I export MC_FORCE_TCP=1
You can generate more Mooncake logs by setting
The logs remain the same after I
Did you solve the problem? I have the same issue.
* Re-implement vram support
* Test logging
* Remove CUDA logging line
* Add comments
* Avoid memcpy if addr is dram
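The last commit, "Avoid memcpy if addr is dram", suggests a send path along the following lines. This is only a sketch under stated assumptions, not the merged code: `sendChunk`, `SendFn`, `collect`, and `isDeviceAddr` are hypothetical names, the `send` callback stands in for the real socket write, and the staging buffer is allocated per call purely for brevity.

```cpp
#include <cstddef>
#include <string>
#include <vector>
#ifdef USE_CUDA
#include <cuda_runtime.h>
#endif

// Hypothetical sink standing in for the socket write.
using SendFn = bool (*)(const char *data, std::size_t len, void *ctx);

#ifdef USE_CUDA
// Hypothetical device-pointer check; the PR's isCudaMemory may differ.
inline bool isDeviceAddr(const void *addr) {
    cudaPointerAttributes attr;
    if (cudaPointerGetAttributes(&attr, addr) != cudaSuccess) {
        cudaGetLastError();  // reset the runtime's last-error state
        return false;
    }
    return attr.type == cudaMemoryTypeDevice;
}
#endif

// Sends `len` bytes starting at addr + offset. VRAM is first copied into a
// host staging buffer; DRAM is handed to the sink directly, with no memcpy.
bool sendChunk(const char *addr, std::size_t offset, std::size_t len,
               SendFn send, void *ctx) {
    const char *src = addr + offset;
#ifdef USE_CUDA
    if (isDeviceAddr(addr)) {
        std::vector<char> staging(len);  // host-side bounce buffer
        if (cudaMemcpy(staging.data(), src, len, cudaMemcpyDeviceToHost) !=
            cudaSuccess) {
            return false;
        }
        return send(staging.data(), len, ctx);
    }
#endif
    return send(src, len, ctx);  // DRAM path: no intermediate copy
}

// Example sink that appends the bytes to a std::string passed via ctx.
bool collect(const char *data, std::size_t len, void *ctx) {
    static_cast<std::string *>(ctx)->append(data, len);
    return true;
}
```

For example, calling `sendChunk(msg, 6, 5, collect, &out)` on a host string forwards five bytes from offset 6 to the sink without touching the staging path.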