GentleCold

🎯

Focusing

Simple

9 followers · 12 following

ECNU
Shanghai, China
UTC +08:00
gentlecold.top

GentleCold/README.md

Hi 👋, I'm GentleCold

🎓 Pursuing a master's degree at ECNU.
🧐 Interested in LLM inference acceleration and KV cache systems.
⚡️ Working on GPU/SSD offloading and distributed cache sharing.
🐧 Using Arch Linux btw.

Selected Work 🎯

DaseR: RAG-native KV cache service for LLM inference.
pegaflow: high-performance KV cache storage with GPU offloading, SSD caching, and RDMA-based sharing.
LMCache: exploring KV cache reuse and offloading for LLM serving.
nano-vllm: learning and experimenting with compact vLLM-style inference systems.

Pinned

dotfiles Public

Real passion starts with Linux.

Lua 2

DaseR Public

RAG-native KV cache service for LLM inference.

Python 2

pegaflow Public

Forked from novitalabs/pegaflow

High-performance KV cache storage for LLM inference: GPU offloading, SSD caching, and cross-node sharing via RDMA. Works with vLLM and SGLang.

Rust

LMCache Public

Forked from LMCache/LMCache

Supercharge Your LLM with the Fastest KV Cache Layer

Python

NumaTBB Public

A thread building block based on NUMA and GPU.

C++

multimodal_sentiment_analysis Public

Implementation of a multimodal sentiment analysis model.

Jupyter Notebook 12