Releases: LMCache/LMCache

Nightly 2026-05-09 · CUDA 13.0

09 May 10:06
f35456e

Pre-release

Nightly CUDA 13.0 wheels built from dev on 2026-05-09.

uv pip install lmcache --pre \
  --extra-index-url https://download.pytorch.org/whl/cu130 \
  --find-links https://github.com/LMCache/LMCache/releases/expanded_assets/nightly-cu13 \
  --index-strategy unsafe-best-match

Nightly 2026-05-09 · CUDA 12.9

09 May 09:30
f35456e

Pre-release

Nightly CUDA 12.9 wheels built from dev on 2026-05-09.

uv pip install lmcache --pre \
  --extra-index-url https://download.pytorch.org/whl/cu129 \
  --find-links https://github.com/LMCache/LMCache/releases/expanded_assets/nightly \
  --index-strategy unsafe-best-match
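
The two nightly commands differ only in the PyTorch index suffix and the release tag that hosts the wheels. As a minimal sketch, the same argument list could be assembled in Python as follows. The names `NIGHTLY_BUILDS` and `nightly_install_args` are illustrative and not part of LMCache; the values are copied from the commands above.

```python
import shlex

# CUDA build -> (PyTorch wheel index suffix, GitHub release tag holding the wheels).
# Values are copied from the nightly commands above; the mapping itself is illustrative.
NIGHTLY_BUILDS = {
    "13.0": ("cu130", "nightly-cu13"),
    "12.9": ("cu129", "nightly"),
}

ASSETS_URL = "https://github.com/LMCache/LMCache/releases/expanded_assets/"
TORCH_INDEX_URL = "https://download.pytorch.org/whl/"


def nightly_install_args(cuda: str) -> list[str]:
    """Return the `uv pip install` argument list for a nightly CUDA build."""
    if cuda not in NIGHTLY_BUILDS:
        raise ValueError(f"no nightly wheel listed for CUDA {cuda}")
    torch_suffix, tag = NIGHTLY_BUILDS[cuda]
    return [
        "uv", "pip", "install", "lmcache", "--pre",
        "--extra-index-url", TORCH_INDEX_URL + torch_suffix,
        "--find-links", ASSETS_URL + tag,
        "--index-strategy", "unsafe-best-match",
    ]


if __name__ == "__main__":
    # Print the command instead of running it, so it can be reviewed first.
    print(shlex.join(nightly_install_args("13.0")))
```

Printing the command rather than executing it keeps the sketch free of side effects; the list can be passed to `subprocess.run` once it has been checked.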

Release v0.4.4 · CUDA 13.0

23 Apr 06:25
b605369

CUDA 13.0 wheel for LMCache v0.4.4.

uv pip install lmcache==v0.4.4 \
  --extra-index-url https://download.pytorch.org/whl/cu130 \
  --find-links https://github.com/LMCache/LMCache/releases/expanded_assets/v0.4.4-cu13 \
  --index-strategy unsafe-best-match
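
After a pinned install like the one above, it can be useful to confirm which version actually landed in the environment. The following is a minimal sketch that uses only the standard-library `importlib.metadata`; the helper name `installed_version` is illustrative, and the distribution name `lmcache` is assumed to match the name used in the install command.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "lmcache") -> Optional[str]:
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        # The distribution is not installed in the current environment.
        return None


if __name__ == "__main__":
    found = installed_version()
    if found is None:
        print("lmcache is not installed in this environment")
    else:
        print(f"lmcache {found}")
```

Reading the version from package metadata avoids importing the package itself, so the check does not load torch or any compiled extension.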

v0.4.4

22 Apr 22:55
6fbec46

What's Changed

  • Refactor remote plugin to accept multiply connector by @maobaolong in #2666
  • [MP]feat: support different kv cache shape and dtype across layers by @liuyumoye in #2926
  • [Chore][CI]: K3 base CI image 12.9 CUDA by @sammshen in #2975
  • fix: use pin=False in _allocate_and_put to prevent pd_buffer leak by @ningziwen in #2847
  • feat(disk): support multi-path local disk backend for multi-device I/O by @glimchb in #2801
  • [Chore][CI] Upgrade CI base image to CUDA 13.0 by @sammshen in #2981
  • [doc] document long-doc-permutator workload in cli bench by @deng451e in #2963
  • [MP][Bugfix] Fix deadlock caused by cuda launch host func by @ApostaC in #2952
  • [BugFix]: Fix typo bug by @princepride in #2980
  • [CI] Pin cu128 nightly wheel for blend ci test by @deng451e in #2987
  • [MP][optimize] optimize save when mla enabled by @chunxiaozheng in #2935
  • [hotfix] fix prometheus version for UT failure by @ApostaC in #3000
  • Update LMCache Office Hours to Wednesday by @nijaba in #2990
  • [fix] Limit proxy in-flight requests to prevent PD buffer deadlock by @deng451e in #2957
  • [MP] Lazy start heartbeat thread when first req coming by @maobaolong in #2943
  • [Operator] Add L2 RESP (Redis/Valkey) adapter support by @royyhuang in #2967
  • [Feat][RawBlock] Add TP>1 support and compact batched retrieval path by @DongDongJu in #2948
  • [MP] Introduce a simple way to register_gauge metrics. by @maobaolong in #2906
  • [Build] Add lmcache-cli lightweight wheel by @deng451e in #2959
  • Copy a snapshot of lmcache_mp_connector.py for vllm 0.18.0 by @maobaolong in #2887
  • [MP] Add a new argument to specify whether retain_in_l1 by @maobaolong in #2813
  • [Chore][CI] Skip k3 builds when only docs/trivial files changed by @sammshen in #2993
  • [ops][refactor] Add full list of Python fallbacks to run without compiled CUDA extensions by @hlin99 in #2591
  • [Feat] L0 Subscriber by @Oasis-Git in #2974
  • refactor: extract PathSharder module for shared multi-path selection by @glimchb in #2982
  • refactor(mp): replace job_id with request_id in query_prefetch_status by @yoo-kumaneko in #2996
  • [MP] Support lazy import built-in l2 adapter by @maobaolong in #2905
  • [MP][Optimize] Skip locked keys during LRU eviction to improve eviction efficiency by @chunxiaozheng in #2978
  • fix: add controller config validation and clear error messages (#2907) by @ianliuy in #3003
  • feat: add chunk hashes logger to MP server for offline data analysis by @yoo-kumaneko in #2928
  • [Chore][CI]: K3 MP output token quantity tolerance by @sammshen in #3030
  • feat(tools): add LRU cache simulator for lookup-hash JSONL logs by @yoo-kumaneko in #3021
  • [Feat] L1 Subscriber by @Oasis-Git in #2986
  • [Feat] Add cache_salt parameter to MP adapter interfaces by @royyhuang in #3029
  • [Feat] Add is_user_level property and cache_salt param to EvictionPolicy by @royyhuang in #3032
  • [Feat][DAX] Optimize staged batched restore path and document modification by @DongDongJu in #2904
  • [Chore] Remove v0 code by @sammshen in #2968
  • [Chore] add coding standard and PR review instructions by @ApostaC in #3039
  • [Observability] Per-request root OTel span and SpanRegistry for MP server tracing by @deng451e in #3033
  • feat(pd_backend): add pd_skip_proxy_notification to skip ZMQ proxy notification by @ningziwen in #2874
  • [Bugfix] fix some memory leak in cache_engine and eic connector by @liubj77 in #2544
  • [Hotfix][CI] Unblock CI: pandas auto-heal + CUDA 12 build toolchain by @sammshen in #3055
  • [Hotfix][CI] Pin vLLM nightly to cu130 index to match CUDA 13 base image by @ApostaC in #3061
  • [Docs] Mirror lmcache/ layout in docs/design/ for discoverability by @ApostaC in #3040
  • Add scheduler instance_id and model_name to L0 KV lifecycle tracking by @Oasis-Git in #3043
  • chore: expose package version via __init__.py by @hlin99 in #3034
  • Fix: Safely handle layerwise cache shape dimensions in remote backend by @hlin99 in #2751
  • [Core] Add persistence interfaces and nixl persistence by @YaoJiayi in #2938
  • [Misc] Reduce the logs generated by lazy memory allocator by @ApostaC in #3068
  • [MP][Feat] Add cache_salt to ObjectKey for cache isolation by @royyhuang in #3042
  • [ROCm] Make bare-host ROCm install self-sufficient by @Shaoting-Feng in #3070
  • [MP] Add tracing functionality for storage manager by @ApostaC in #3063
  • [MP][optimize] unified touch all keys in end session request by @chunxiaozheng in #3020
  • [step3] remove unnecessary code in mp adapter by @chunxiaozheng in #2994
  • fix(mp): correct store cached requests in lmcache_mp_connector by @maobaolong in #3012
  • [refactor]: Replace use_cufile with use_gds/gds_backend config flags by @glimchb in #2858
  • [CI] Add cu13.0 wheel + container builds and nightly wheel releases by @deng451e in #3069
  • [CI] Run the same test set on AMD as on NVIDIA by @Shaoting-Feng in #3071
  • [ROCm][MP] Fix HIP invalid-argument on lazy host buffer past 2 GB by @Shaoting-Feng in #3079
  • [CLI] Refactor query command by @deng451e in #2995
  • [CI] add missing egress endpoints to nightly Docker build by @deng451e in #3087
  • [Hotfix][CI] Fail-fast when vLLM CLI import chain is broken post-install by @sammshen in #3093
  • [CLI][fix] lazy torch import in __init__.py to unblock CLI-only installs by @deng451e in #3086
  • [CLI] Introduce lmcache trace CLI by @ApostaC in #3075
  • [Chore][Docs]: daily drift check — multi-process mode by @ApostaC in #3076
  • [Fix][CI] fix nightly wheel versioning and build reliability by @deng451e in #3097
  • [Hotfix][CI] Replace vllm __main__.py patch with sitecustomize.py by @sammshen in #3100
  • [CI] fix blend-server venv by @deng451e in #3099
  • [MP] Introduce MP runtime plugin framework by @maobaolong in #2956

Full Changelog: v0.4.3...v0.4.4

v0.4.3

06 Apr 23:46
7f32611

What's Changed

  • [MP] fix: add thread safety to Session for concurrent TP worker access by @maobaolong in #2807
  • [CLI] Implement initial framework of LMCache CLI by @KuntaiDu in #2775
  • [MP][Observability][1/3] EventBus core infrastructure + OpenTelemetry dependency by @royyhuang in #2792
  • [MP]: Support delay start heartbeat thread to avoid unhealthy while start vllm for a huge module warmup. by @maobaolong in #2798
  • fix: add None check before stream synchronization by @hlin99 in #2810
  • [Core] Add VRAM_SEG support for NIXL OBJ plugin by @jgoldsch12 in #2640
  • [CI]: create fallback for flaky nightly index by @sammshen in #2809
  • [CI]: add full tag selectively by @sammshen in #2820
  • fix: replace global lock with per-device transfer_lock to prevent deadlock by @maobaolong in #2816
  • Refactor KV cache shape/dtype extraction for robustness by @hlin99 in #2537
  • Support non-contiguous alloc in MemoryAllocator by @chunxiaozheng in #2767
  • [MP][Observability][2/3] Migrate L1 + SM to EventBus + OTel, remove old Prometheus pipeline by @royyhuang in #2794
  • [MP][Bugfix] fixing race condition for zmq output notifier by @ApostaC in #2808
  • [ci]: agent reviewer prompt engineering by @sammshen in #2800
  • [refactor]: clean up the messy LMCacheManager by @sammshen in #2683
  • [Platform]: Add Intel Gaudi (HPU) Support by @hlin99 in #2822
  • [CLI] Implement lmcache describe kvcache subcommand by @royyhuang in #2825
  • [MP][Feat] Query lookup-phase status for MP mode by @ApostaC in #2818
  • Add Device-DAX (/dev/dax) storage backend for KV cache (follow-up to #2714) by @jayhpark530 in #2788
  • [Temp CI Patch]: torch version for UT by @sammshen in #2856
  • [CI] Add GitHub Action to auto-sync torch version with vLLM by @deng451e in #2796
  • [MP][Feat] support worker-affinity in the MQ thread pool by @ApostaC in #2842
  • Introduce native fs connector by @maobaolong in #2779
  • [CLI] Implement lmcache ping subcommand by @Oasis-Git in #2859
  • [MP] Fault Tolerance CI by @Oasis-Git in #2764
  • feat: improve ValkeyConnector with cluster mode, TLS, and GLIDE optimizations by @omerrubi-amzn in #2790
  • fix: auto-generate lmcache_instance_id when value is None by @can-sun in #2732
  • [CI]: use job-level path filtering so skipped tests pass required checks by @royyhuang in #2855
  • [MP] Print inference request id to help identify which vllm request the current log belongs to by @maobaolong in #2812
  • [HW: XPU] Enable Layerwise XPU Connector by @slokesha in #2611
  • [CLI] lmcache query engine subcommand by @deng451e in #2846
  • [CLI]: Server command by @sammshen in #2836
  • [LMCache CLI] Design and implementation of lmcache kvcache by @KuntaiDu in #2827
  • [Bugfix]: Fix pin count balancing in PD Disaggregation mode by @lisiG9 in #2786
  • [Core] [GDS] Improve GDS backend error handling and retry logic by @oferki in #2675
  • [CLI][Doc] Edit the doc for LMCache CLI by @KuntaiDu in #2870
  • Add hipFile support for AIS (AMD Infinity Storage) storage by @glimchb in #2799
  • [CI]: Fix the LMCache random throughput being higher than native vllm by @sammshen in #2864
  • [3/N][Feat]Persist metadata on device and fix raw-device benchmark setup by @DongDongJu in #2614
  • [Core]: Support HND KV Format by @sammshen in #2826
  • [Chore][Docs] Fix mp docs for store policy: skip_l1 by @ApostaC in #2869
  • [MP][Core] Block id based kernel for MP mode by @ApostaC in #2838
  • [CLI] update cli lmcache query engine by @deng451e in #2871
  • [MP] Improve the stability for controllers and improve log clarity by @ApostaC in #2883
  • [Chore][Docs] Stale MP CLI and Flags by @sammshen in #2882
  • [Fix][Operator] Add privileged mode and nvidia runtime for GPU visibility by @royyhuang in #2749
  • [Chore][CI]: chmod +x scripts in k3 test entrypoints by @sammshen in #2886
  • feat(gds): add multipath KV-cache offloading support by @glimchb in #2817
  • fix: add missing lock protection for LRU cache policy by @SYaoJun in #2860
  • [MP][Observability][3/3] Migrate MP server telemetry to EventBus, unify config by @royyhuang in #2806
  • [doc] update installation compatibility doc by @deng451e in #2868
  • [Build] add SM120 for wheel build by @deng451e in #2873
  • [1/2] L2 CI: End to End Performance by @Oasis-Git in #2884
  • [fix] add missing request type in blend server by @deng451e in #2894
  • type: Add missing return type annotations to storage backend methods by @SYaoJun in #2829
  • [CLI] Implementation of lmcache bench engine by @ApostaC in #2889
  • feat(gds): enable parallel I/O thread pool for all cuFile filesystems by @glimchb in #2802
  • [DSA] support DSA in Mooncake connector by @chunxiaozheng in #2897
  • [Core] Add L2 eviction in mp mode by @YaoJiayi in #2824
  • [Bugfix] fix the invalid image path by @SYaoJun in #2899
  • [Chore][CI] Split k3 multiprocess tests into parallel pipeline steps by @sammshen in #2914
  • Support l2 adapter check and improve basic_check tool by @maobaolong in #2895
  • [Chore][CI/Docs]: Switch all the documentation and CI over to lmache cli by @sammshen in #2917
  • [CI] Add CI test for CB by @deng451e in #2900
  • [2/2] L2 CI: Telemetry Test by @Oasis-Git in #2913
  • [Core] Add eviction for CB by @YaoJiayi in #2893
  • Refactor: Generalize utils.py for all devices by lifting the CUDA limitation by @hlin99 in #2848
  • Add argument --prefetch-max-in-flight to fix hardcode by @maobaolong in #2789
  • [MP] Refactor l2 plugin framework to support dynamic load third-party native l2 connector by @maobaolong in #2851
  • fix: relax worker port count assertion by @can-sun in #2867
  • [Bugfix]: patch save_decode_cache by @sammshen in #2929
  • vllm block event by @Oasis-Git in #2930
  • [Feat]: Add eviction to L2 Native Backend by @sammshen in #2939
  • [Connector] Maru: zero-copy KV cache sharing via CXL shared memory by @jooho-XCENA in #2705
  • [MP] Fix UT after merge #2851 by @maobaolong in #2931
  • [Bugfix]: fix get_num_heads for MLA format by @sammshen in #2941
  • [MP] Introduce l2 mooncake adapter by @maobaolong in #2911
  • [CLI]Add long-doc-permutator CLI bench workload by @deng451e in #2937
  • feat(gds): add gds_path_sharding config for multi-path strategy by @glimchb in #2922
  • [Security][Remote Connector]: Add env var auth config for RESP by @sammshen in #2949
  • Refactor: Align pd_buffer_size to chunk size in PD backend by @hlin99 in #2694
  • [Chore] Add CODEOWNERS for automated PR review assignments by @sammshen in #2950
  • [Chore][CI]: Change dst for K3 nightly comprehensive results by @sammshen in #2958

v0.4.2

17 Mar 22:59
9d41318

What's Changed

  • fix(l1_manager): propagate extra_count through prefetch path to prevent premature eviction by @liuyumoye in #2725
  • [vllm adapter] num_lmcache_cached_tokens by @aeon-x in #2670
  • [ci]: add gpu monitoring by @sammshen in #2718
  • [CI][Hotfix][Chore] remove the repetitive definition of report_status by @ApostaC in #2745
  • [Perf] [GDS] Performance improvements to GDS backend by @oferki in #2637
  • Fault Tolerance Check by @Oasis-Git in #2692
  • [Misc] Remove Hash from IPCCacheEngineKey by @Oasis-Git in #2700
  • [MP][optimize] optimize evict in lru policy by @chunxiaozheng in #2740
  • [RFC] Design of LMCache CLI by @KuntaiDu in #2748
  • [MP][Bugfix] introducing new l1 listener to prevent re-storing prefetched object by @ApostaC in #2744
  • Add filesystem-backed L2 adapter with auto-discovery plugin mechanism by @maobaolong in #2704
  • fix(server): guard finish_read_prefetched behind retrieve_succeeded flag by @maobaolong in #2736
  • Fix[config]: replace store_true with BooleanOptionalAction for --l1-use-lazy by @liuyumoye in #2761
  • [Correctness]: Fix the overlapping race condition for non-MP as well by @sammshen in #2706
  • [Southbound]: Create a Native Protocol for MP and non-MP by @sammshen in #2642
  • [ci]: fix k3 comprehensive test nightly baseline retrieval by @sammshen in #2753
  • [Perf] Add stream priority in gpu context by @YaoJiayi in #2728
  • [Doc] Add doc for LMCache MP mode operator by @royyhuang in #2731
  • [Docs][Operator] Fix observability metric descriptions by @royyhuang in #2746
  • [MP][Feat] Support dedicated thread pool for MP callbacks by @ApostaC in #2763
  • [MP][UX][L2] Support configuring L2 store/prefetch policy via command line by @ApostaC in #2773
  • Fix regression: restore config validate() call in config.py by @hlin99 in #2690
  • [MP] Support buffer only mode for MP mode by @maobaolong in #2760
  • Plugin L2 Adapter Framework for MP Mode by @maobaolong in #2715
  • [MP][Bugfix] fix free error when memory_objs is empty by @chunxiaozheng in #2768
  • update torch version aligned with vllm by @deng451e in #2782
  • Support database option at Valkey connector by @bluayer in #2307
  • feat(kv_cache): enable asymmetric store/retrieve storages in PD backend by @hlin99 in #2509

Full Changelog: v0.4.1...v0.4.2

v0.4.1

11 Mar 02:55
dfc914c

What's Changed

  • [MP][Bugfix] fix vllm-side lookup logical issue and cuda stream deadlock problem by @ApostaC in #2733

Full Changelog: v0.4.0...v0.4.1

v0.4.0

10 Mar 23:16
a13ad66

Major Milestones

v0.4.0 marks the maturation of the new Multiprocess (MP) mode and LMCache's shift toward it.

What's Changed

  • [feat] add free_locks api to MP mode by @royyhuang in #2656
  • [Add] L2 Prefetch Controller and StorageManager integration by @ApostaC in #2667
  • K3 CI Refactor by @sammshen in #2663
  • Tell agent to write documentations by @KuntaiDu in #2655
  • Refactor new_block_ids handling for robustness by @hlin99 in #2536
  • [CI] Fix mypy errors by @hickeyma in #2672
  • fix(lmcache): fix KV cache hash inconsistency due to None in extra_keys by @JianDan0212 in #1897
  • Augmenting contributing.md by @KuntaiDu in #2654
  • Bump actions/download-artifact from 6.0.0 to 7.0.0 by @dependabot[bot] in #2397
  • [MP][UX] Unified config + argparse for multiprocess mode by @ApostaC in #2695
  • [CI]: 5 day maximum for Comprehensive Test flexibility by @sammshen in #2676
  • [Correctness]: Avoid overwriting APC overlap by @sammshen in #2671
  • [MP][Observability] Add telemetry subsystem for multiprocess mode by @ApostaC in #2696
  • [1/N] Support NIXL-based L2 storage in MP mode by @YaoJiayi in #2664
  • [Feat] LMCache MP mode k8s operator by @royyhuang in #2701
  • [MP][Telemetry] Hot-fix to enable the telemetry logging for store by @ApostaC in #2707
  • [Bugfix] Fix memory leak in asynchronous mode by @deng451e in #2559
  • [Misc] Improve nixl perf in lmcache mp by @YaoJiayi in #2711
  • [MP][Core] Update the workflow for lookup to avoid busy loop by @ApostaC in #2710
  • Fix to support mla multiple tp failed to read issue by @maobaolong in #2697
  • Refactor lookup client/server and abstract rpc layer. by @maobaolong in #2609
  • [Core] Add blend_server_v2 by @YaoJiayi in #2677
  • [Bugfix] fix crash in wait_for_save when retrieve fail from lmcache_engine by @liubj77 in #2516
  • [Chore][Docs] Update docs for MP mode by @ApostaC in #2708
  • [Misc] Fix failing unit test in blend server by @YaoJiayi in #2717
  • [MP][Debuggability] Introduce status report subsystem for MP-mode by @ApostaC in #2699
  • [MP][Hotfix] add default implementation for report_status by @ApostaC in #2723
  • [MP] Support MP Server restart by @maobaolong in #2713
  • Revert "[MP] Support MP Server restart (#2713)" by @ApostaC in #2729
  • [MP][UX][Docs] Enhance http server and its docs for MP mode by @ApostaC in #2722
  • [MP] Update the MP docs and pass telemetry config into http_server by @ApostaC in #2730

Full Changelog: v0.3.15...v0.4.0

Operator Nightly Latest (nightly-20260510-d945fbb)

07 Mar 00:18
f636089

Automated nightly operator build from dev branch.

Image: lmcache/lmcache-operator:nightly-20260510-d945fbb

kubectl apply -f https://github.com/LMCache/LMCache/releases/download/operator-nightly-latest/install.yaml

v0.3.15

02 Mar 22:08
9dc3ab7

Full Changelog: v0.3.14...v0.3.15