[Preview] Mooncake performance optimization #934
Closed
xiaguan wants to merge 4 commits into LMCache:dev from
Duplicate of #1269; closing.
Overview
This PR introduces significant performance optimizations for Mooncake integration with LMCache, focusing on three key areas:
🚀 Performance Optimizations
1. Scheduler Query Performance Enhancement
2. Zero-Copy Implementation
   - get_into and put_from APIs for zero-copy data transfer
3. Batch Interface Implementation
   - batch_get and batch_get_into for parallel data retrieval
   - batch_put and batch_put_from for efficient data storage
🔧 Technical Changes
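To make the difference between the copying and zero-copy styles listed above concrete, here is a minimal sketch in Python. It does not use the Mooncake client: `ToyStore` is a hypothetical in-memory stand-in, and the signatures of `put_from`, `get_into`, and `batch_get_into` below are assumptions chosen for illustration, not the signatures introduced by this PR. The point is only the calling convention: the caller owns a preallocated buffer, the store writes into it, and a batch call fans several such reads out in parallel.

```python
from concurrent.futures import ThreadPoolExecutor


class ToyStore:
    """Hypothetical in-memory store; NOT the Mooncake API."""

    def __init__(self):
        self._data = {}

    def put_from(self, key, buffer, size):
        # Read straight from the caller's buffer through a memoryview,
        # so the only copy made is the one into the store itself.
        self._data[key] = bytearray(memoryview(buffer)[:size])

    def get(self, key):
        # Copying read: allocates a new bytes object on every call.
        return bytes(self._data[key])

    def get_into(self, key, buffer):
        # Buffer-filling read: write into caller-owned memory and
        # return the number of bytes written.
        value = self._data[key]
        n = len(value)
        if n > len(buffer):
            raise ValueError("destination buffer too small")
        memoryview(buffer)[:n] = value
        return n

    def batch_get_into(self, keys, buffers):
        # Issue the individual reads in parallel; results keep key order.
        with ThreadPoolExecutor(max_workers=4) as pool:
            return list(pool.map(self.get_into, keys, buffers))


store = ToyStore()
src = bytearray(b"kv-cache-chunk")
store.put_from("chunk0", src, len(src))
store.put_from("chunk1", src, 2)

# Buffers are allocated once by the caller and can be reused across calls.
dst0 = bytearray(32)
dst1 = bytearray(32)
sizes = store.batch_get_into(["chunk0", "chunk1"], [dst0, dst1])
```

In this toy version the saving is just one avoided allocation per read. In a real transfer engine the destination would typically be a registered or pinned buffer, which is where a buffer-filling interface would be expected to pay off.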
📊 Performance Impact
lmcache config
mooncake config (Mooncake main branch installed)
dev branch, Mooncake, 8192 input tokens, 128 output tokens, 50 prompts, all cache hits, request rate inf
This branch, same bench
Note
This PR is a preview and will NOT be merged directly. We will collaborate with the LMCache team to gradually decompose and integrate these changes into the repository. This preview is provided for those who want to test Mooncake performance optimizations early.
Early adopters who want to test these optimizations should use this branch with caution and report the performance changes they observe.