[Feat] New radix cache backend: pegaflow #17221
Summary of Changes
Hello @jimmy-evo, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request integrates Pegaflow, a new KV cache backend, into the system to improve memory management and performance for large language models. Alongside this feature, it refines the tracing mechanism so that external trace contexts are propagated more effectively, improving observability. It also adds a check for local model file integrity to prevent runtime errors caused by incomplete Hugging Face model downloads.
Highlights
b79bf7f to cd4941b
Code Review
This pull request introduces pegaflow as a new radix cache backend, which can be enabled with the --enable-pegaflow flag. The implementation looks solid. Additionally, the PR includes several other improvements: it refactors trace context propagation for better clarity by using external_trace_header, and enhances the model loading process by adding checks for incomplete local snapshots, which improves robustness. The configuration for setuptools_scm is also updated.
I have one minor suggestion regarding a potential typo in the pegaflow import to improve naming consistency. Overall, these are great additions to the project.
I am having trouble creating individual review comments. Click here to see my feedback.
python/sglang/srt/managers/scheduler.py (693-695)
There seems to be a typo in the import path and class name. The feature is named "pegaflow", but here it's written as "peagflow". For consistency, I suggest renaming peagflow_radix_cache to pegaflow_radix_cache and PeagflowRadixCache to PegaflowRadixCache in the pegaflow library, and updating the import here accordingly.
from pegaflow.sglang.pegaflow_radix_cache import PegaflowRadixCache
self.tree_cache = PegaflowRadixCache(
Could you please paste the performance benchmark comparison results?
/gemini summary
Warning: You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!
@hzh0425
Hi from novita.ai team 👋
A new KV cache backend has been adapted: Pegaflow
PegaFlow centralizes machine-level KV cache management into a standalone Rust process. Inference engines map their KV cache to PegaFlow via CUDA IPC, enabling D2H/H2D transfers to occur in a separate process while communicating through gRPC.
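To make the split described above easier to picture, here is a toy sketch, not PegaFlow's actual API. It assumes a hypothetical `ToyCacheServer` that stands in for the standalone cache process, and it uses Python's `multiprocessing.shared_memory` as a CPU-only analogy for CUDA IPC: the "engine" owns a buffer, the "server" attaches to it by name and copies blocks out of it (the D2H direction) and back into it (the H2D direction). The gRPC control channel is replaced by plain method calls, and the block size, block hashes, and method names are all illustrative assumptions.

```python
from multiprocessing import shared_memory

BLOCK_BYTES = 16  # toy block size (assumption); real KV blocks are far larger


class ToyCacheServer:
    """Hypothetical stand-in for the standalone cache process."""

    def __init__(self):
        self.host_store = {}  # block hash -> saved bytes (the "host" tier)
        self.shm = None

    def register(self, shm_name):
        # Attach to the engine's buffer by name (stand-in for opening a CUDA IPC handle).
        self.shm = shared_memory.SharedMemory(name=shm_name)

    def save(self, block_hash, block_idx):
        # "D2H": copy one block out of the shared buffer into the host store.
        off = block_idx * BLOCK_BYTES
        self.host_store[block_hash] = bytes(self.shm.buf[off:off + BLOCK_BYTES])

    def load(self, block_hash, block_idx):
        # "H2D": copy a saved block back in; return False on a cache miss.
        data = self.host_store.get(block_hash)
        if data is None:
            return False
        off = block_idx * BLOCK_BYTES
        self.shm.buf[off:off + BLOCK_BYTES] = data
        return True

    def close(self):
        if self.shm is not None:
            self.shm.close()


def demo():
    # Engine side: allocate a 4-block "KV cache".
    engine = shared_memory.SharedMemory(create=True, size=4 * BLOCK_BYTES)
    server = ToyCacheServer()
    try:
        server.register(engine.name)
        payload = bytes(range(BLOCK_BYTES))
        engine.buf[2 * BLOCK_BYTES:3 * BLOCK_BYTES] = payload  # fill block 2
        server.save("prefix-abc", 2)
        # Simulate the engine flushing its local cache by zeroing the block.
        engine.buf[2 * BLOCK_BYTES:3 * BLOCK_BYTES] = bytes(BLOCK_BYTES)
        hit = server.load("prefix-abc", 2)
        restored = bytes(engine.buf[2 * BLOCK_BYTES:3 * BLOCK_BYTES])
        miss = server.load("prefix-unknown", 0)
        return hit, restored == payload, miss
    finally:
        server.close()
        engine.close()
        engine.unlink()
```

Calling `demo()` saves a block, wipes the engine's copy, and restores it from the server's store. In this toy the copies happen in the same process; the point of the real design is that they run in a separate process, so the engine is not the one doing the transfers.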
Key Benefits
Modifications
Enable with the argument --enable-pegaflow
Accuracy Tests
Result of the sglang MMLU eval run:
Device: H200, TP8, DeepSeek-V3.2
Benchmark
H20-3e TP8
No radix cache
benchmark script:
input 4096
output 128
result:
With pegaflow, using 500 GB of memory
After flushing the L1 cache
HiCache, L2 only
Requires at least 640 GB of memory
populated:
Acc mmlu
First time
After flush_cache
Checklist
Review Process
/tag-run-ci-label, /rerun-failed-ci, /tag-and-rerun-ci