GentleCold/README.md

Hi 👋, I'm GentleCold

  • 🎓 Pursuing a master's degree at ECNU.
  • 🧐 Interested in LLM inference acceleration and KV cache systems.
  • ⚡️ Working on GPU/SSD offloading and distributed cache sharing.
  • 🐧 Using Arch Linux btw.

Selected Work 🎯

  • DaseR: RAG-native KV cache service for LLM inference.
  • pegaflow: high-performance KV cache storage with GPU offloading, SSD caching, and RDMA-based sharing.
  • LMCache: exploring KV cache reuse and offloading for LLM serving.
  • nano-vllm: learning and experimenting with compact vLLM-style inference systems.

GentleCold's GitHub activity graph

Pinned

  1. dotfiles Public

    Real passion starts with Linux.

    Lua 2

  2. DaseR Public

    RAG-native KV cache service for LLM inference.

    Python 2

  3. pegaflow Public

    Forked from novitalabs/pegaflow

    High-performance KV cache storage for LLM inference — GPU offloading, SSD caching, and cross-node sharing via RDMA. Works with vLLM and SGLang.

    Rust

  4. LMCache Public

    Forked from LMCache/LMCache

    Supercharge Your LLM with the Fastest KV Cache Layer

    Python

  5. NumaTBB Public

    A thread building block based on NUMA and GPU.

    C++

  6. multimodal_sentiment_analysis Public

    Implementation of a multimodal sentiment analysis model.

    Jupyter Notebook 12