perf(mooncakestore connector): optimize get/put with zero-copy operations#1269
perf(mooncakestore connector): optimize get/put with zero-copy operations#1269maobaolong merged 2 commits intoLMCache:devfrom
Conversation
|
Are you available to review this? I've modified it to the connector level based on the feedback, which introduces a slight performance overhead, but overall it still delivers substantial performance improvements. Hope we can merge either of these optimizations soon. Thanks! |
maobaolong
left a comment
There was a problem hiding this comment.
@xiaguan Thanks for this PR, left some comments.
| """ | ||
| raise NotImplementedError | ||
|
|
||
| async def batched_get( |
There was a problem hiding this comment.
This seems duplicated with L209
…ions Signed-off-by: Jinyang Su <751080330@qq.com>
f7adf01 to
7f759e0
Compare
|
@maobaolong Thanks for your review. All issues have been addressed. Could you please take another look? I've only modified the |
|
@xiaguan Thanks for help me and @chunxiaozheng to go through the code by tencent metting, it really help us to know well to this PR quickly. Will give you some fead back this week. First of all this PR really supply some performance improvement points by batched operation and zero copy. |
maobaolong
left a comment
There was a problem hiding this comment.
@xiaguan This PR look good overal left some exception handling related comments inline, otherwise, it LGTM. thank for your improvement.
|
@xiaguan After a discussion with @chunxiaozheng offline, I aware that we should not raise the exception, return the mem_object we read or None is enough, because there is another work is right doing parallel by others. For this PR, it is ok for me to keep it not worse than before. |
…mooncake_connector Signed-off-by: Jinyang Su <751080330@qq.com>
1fe203a to
74d7c60
Compare
|
Thank you both for your thorough reviews - all comments have been resolved. |
|
Could you please explain why the method batched_submit_put_task() was not included in this commit? I believe it would also bring performance improvements. |
Write requests are executed asynchronously in the background and aren't on the critical path. There might be some benefit to this approach - feel free to give it a try. |
|
Nice |
…tions (LMCache#1269) perf(mooncakestore connector): optimize get/put with zero-copy operations Signed-off-by: Jinyang Su <751080330@qq.com>
…tions (LMCache#1269) perf(mooncakestore connector): optimize get/put with zero-copy operations Signed-off-by: Jinyang Su <751080330@qq.com>
…tions (LMCache#1269) perf(mooncakestore connector): optimize get/put with zero-copy operations Signed-off-by: Jinyang Su <751080330@qq.com>
…tions (LMCache#1269) perf(mooncakestore connector): optimize get/put with zero-copy operations Signed-off-by: Jinyang Su <751080330@qq.com>
another version of #988
Although I've run multiple benchmark, the performance doesn't match that of PR #988. This might be because the remote connector requires additional code paths to handle operations. However, it still provides significant performance improvements.
optimized connector
compare to result in #988, throuput 77494 tok/s, mean ttft: 925ms, media ttft: 947ms