
feat(tools): add LRU cache simulator for lookup-hash JSONL logs #3021

Merged
ApostaC merged 16 commits into LMCache:dev from yoo-kumaneko:feature/cache-simulator
Apr 15, 2026

Conversation

@yoo-kumaneko (Contributor) commented Apr 13, 2026

Adds lmcache/tools/cache_simulator/ with four modules:

  • lru_cache.py — LRUCacheFast (O(1)) and LRUCache (O(log n) with
    position tracking) backed by OrderedDict / SortedList
  • simulator.py — load_lookup_events(), simulate(), print_statistics(),
    plot_statistics(), and a CLI
  • plot_hit_rate.py — capacity sweep over log-spaced GiB range + matplotlib
    plot
  • README.md — user and developer documentation
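The O(1) LRU variant's core mechanism can be sketched with an `OrderedDict` (a minimal illustration of the idea, not the actual `LRUCacheFast` API; class and method names here are assumptions):

```python
from collections import OrderedDict


class TinyLRU:
    """Illustrative O(1) LRU: OrderedDict insertion order tracks recency."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._store: OrderedDict = OrderedDict()

    def get(self, key):
        if key not in self._store:
            return None
        self._store.move_to_end(key)  # mark as most recently used
        return self._store[key]

    def put(self, key, value) -> None:
        if key in self._store:
            self._store.move_to_end(key)
        self._store[key] = value
        if len(self._store) > self.capacity:
            self._store.popitem(last=False)  # evict least recently used


cache = TinyLRU(2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")        # touching "a" makes it most recent
cache.put("c", 3)     # capacity exceeded: evicts "b", not "a"
print(cache.get("b"))  # None
print(cache.get("a"))  # 1
```

The O(log n) variant trades this constant-time bookkeeping for a `SortedList` that can additionally answer "what position does this chunk occupy in the recency order", which feeds the cache-position panel in the statistics plot.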

Quick Start

# 1. Collect logs from a live server
lmcache server --lookup-hash-log-dir /data/lmcache/lookup_hashes ...

# 2. Simulate at a fixed capacity — prints text report and saves a PNG chart
python3 -m lmcache.tools.cache_simulator.simulator \
    -i /data/lmcache/lookup_hashes \
    --cache-capacity-gib 64 \
    -o stats.png

# 3. Sweep across capacities to find the right cache size
python3 -m lmcache.tools.cache_simulator.plot_hit_rate \
    -i /data/lmcache/lookup_hashes \
    --min-capacity-gib 1 \
    --max-capacity-gib 512 \
    --points 30 \
    -o sweep.png

The primary metric is token cache hit rate:

hit_tokens / total_tokens

where hit_tokens = hit_prefix_chunks × chunk_size. Tail tokens (seq_len mod chunk_size) are always counted as misses. Cache capacity is expressed in bytes; the CLI accepts GiB and auto-computes bytes-per-chunk from shapes/dtypes in the first event.
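The metric above can be sketched in a few lines (a simplified stand-in: the real JSONL records carry more fields than the `(hit_prefix_chunks, seq_len)` pairs assumed here):

```python
def token_hit_rate(events, chunk_size):
    """Token-level hit rate; tail tokens (seq_len % chunk_size) never count as hits."""
    hit_tokens = sum(hits * chunk_size for hits, _ in events)
    total_tokens = sum(seq_len for _, seq_len in events)
    return hit_tokens / total_tokens if total_tokens else 0.0


# chunk_size=256: request 1 hits 3 full chunks out of 1000 tokens,
# request 2 hits 1 full chunk out of 300 tokens -> (768 + 256) / 1300
rate = token_hit_rate([(3, 1000), (1, 300)], 256)
print(round(rate, 4))  # 0.7877
```

Note that even a request whose prefix is fully cached cannot reach a 100% hit rate unless its length is an exact multiple of `chunk_size`, since the tail remainder is always a miss.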

Running simulator.py prints a full text report and saves a 7-panel statistics PNG (per-request hit rate, hit prefix length, chunk reuse count, rolling hit rate, input length, global span, cache position).

Example diagrams (images omitted): hit_rate_vs_capacity sweep and per-request statistics panels.

You can use the following chunk hashes data to plot the diagrams above.
https://drive.google.com/file/d/18jxUlI_J9sT_Mis3nft0nYEudwvJJ9UI/view?usp=drive_link


Note

Medium Risk
Introduces new CLI entrypoints and new optional runtime dependencies (e.g., matplotlib, sortedcontainers, transformers) that may affect packaging/CLI startup if dependency sets are misconfigured. Core server/runtime logic is otherwise untouched, so functional risk is mostly limited to the new tool surface area.

Overview
Adds a new lmcache tool command group, including lmcache tool cache-simulator {simulate,sweep,gen-dataset} for offline analysis of lookup-hash JSONL logs.

Introduces a cache-simulator implementation under lmcache/tools/cache_simulator/ with LRU cache models, log loading, token hit-rate simulation (including prefix-hit semantics and tail-token misses), reporting/plotting, and a dataset generator for vllm bench serve.

Updates CLI dependency set (requirements/cli.txt) to include matplotlib, and adds a focused test suite covering the LRU caches, event loading, and simulation correctness.

Reviewed by Cursor Bugbot for commit d7adca9.


Co-authored-by: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: crclq2018 <crclq2018@gmail.com>
Signed-off-by: rigginschen <rigginschen@tencent.com>

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces a cache simulator tool for LMCache, featuring LRU cache implementations, a simulation engine, and utilities for plotting token hit rates against cache capacity. The review feedback highlights several compliance and quality issues: the lack of unit or integration tests for the new feature, missing docstrings for public CLI entry points in violation of the project style guide, and a recommendation to use isinstance for better type safety in the simulation logic.

Comment thread lmcache/tools/cache_simulator/simulator.py
Comment thread lmcache/tools/cache_simulator/simulator.py Outdated
Comment thread lmcache/tools/cache_simulator/simulator.py Outdated
Comment thread lmcache/tools/cache_simulator/plot_hit_rate.py Outdated
Comment thread lmcache/tools/cache_simulator/simulator.py
Comment thread lmcache/tools/cache_simulator/simulator.py
Contributor

@ApostaC ApostaC left a comment


LGTM! It would be more helpful to put a few screenshots of the results into the README.md to give users a better understanding of the expected outcome.

Contributor


Thanks for the contribution! This is very useful!
Usage-wise, can you put it under lmcache cli, so that we can run
lmcache tool cache_simulator
instead of using
python -m lmcache.tools.cache_simulator.simulator?

Contributor Author

@yoo-kumaneko yoo-kumaneko Apr 14, 2026


Done. Now we can use it like this:

# 1. Collect logs from a live server (see Step 1 below)
lmcache server --lookup-hash-log-dir /data/lmcache/lookup_hashes ...

# 2. Simulate at a fixed capacity — prints text report and saves a PNG chart
lmcache tool cache-simulator simulate \
    -i /data/lmcache/lookup_hashes \
    --cache-capacity-gib 64 \
    -o stats.png

# 3. Sweep across capacities to find the right cache size
lmcache tool cache-simulator sweep \
    -i /data/lmcache/lookup_hashes \
    --min-capacity-gib 1 \
    --max-capacity-gib 512 \
    --points 30 \
    -o sweep.png

Integrates the cache simulator into the lmcache CLI so users can run:

  lmcache tool cache-simulator simulate -i <logs> --cache-capacity-gib 64
  lmcache tool cache-simulator sweep    -i <logs> --min-capacity-gib 1 --max-capacity-gib 512

`simulate` replays lookup-hash JSONL logs at a fixed cache capacity,
prints a text report, and saves a 7-panel statistics PNG.
`sweep` scans a log-spaced range of capacities and saves a hit-rate
vs capacity PNG.

The python -m lmcache.tools.cache_simulator.{simulator,plot_hit_rate}
entry points continue to work unchanged.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: crclq2018 <crclq2018@gmail.com>
Signed-off-by: kumaneko <crclq2018@gmail.com>
Comment thread lmcache/cli/commands/tool/__init__.py Outdated
…cation

Adds four public functions that serve as the single source of truth for
CLI flag definitions and execution logic:

  simulator.py:       add_simulate_arguments(parser), run_simulate(args)
  plot_hit_rate.py:   add_sweep_arguments(parser),    run_sweep(args)

ToolCommand now calls these instead of duplicating the flags itself.
Adding or removing a flag in the simulator modules automatically takes
effect in both `python -m ...` and `lmcache tool cache-simulator ...`.
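The single-source-of-truth pattern looks roughly like this (a sketch: the `add_simulate_arguments`/`run_simulate` names follow the commit text, but the flag defaults and function bodies are assumptions):

```python
import argparse


def add_simulate_arguments(parser: argparse.ArgumentParser) -> None:
    """Flag definitions live in one place, shared by every entry point."""
    parser.add_argument("-i", "--input-dir", required=True)
    parser.add_argument("--cache-capacity-gib", type=float, default=64.0)


def run_simulate(args: argparse.Namespace) -> str:
    """Execution logic, also shared; returns a summary for illustration."""
    return f"simulate {args.input_dir} @ {args.cache_capacity_gib} GiB"


# Any front end (python -m ... or lmcache tool ...) builds on the same pair:
parser = argparse.ArgumentParser(prog="simulate")
add_simulate_arguments(parser)
args = parser.parse_args(["-i", "/tmp/logs", "--cache-capacity-gib", "32"])
print(run_simulate(args))  # simulate /tmp/logs @ 32.0 GiB
```

Because both entry points call the same two functions, a flag added to `add_simulate_arguments` appears everywhere with no duplication to drift out of sync.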

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: crclq2018 <crclq2018@gmail.com>
Signed-off-by: kumaneko <crclq2018@gmail.com>
Comment thread lmcache/tools/cache_simulator/simulator.py Outdated
Comment thread lmcache/tools/cache_simulator/simulator.py
ToolCommand in __init__.py is now a thin dispatcher (~60 lines).
Cache-simulator wiring moves to tool/cache_simulator.py.

Adding a future tool requires only:
  1. Create tool/<new_tool>.py with a register() function
  2. Import it in __init__.py and call new_tool.register(inner)
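The two steps above can be sketched with argparse subparsers (the `register()` name comes from the commit text; the handler wiring is an assumption about how dispatch might look):

```python
import argparse


# Step 1: each tool module exposes a register() that wires its own subparser.
def register(subparsers) -> None:
    p = subparsers.add_parser("cache-simulator")
    p.set_defaults(handler=lambda args: "cache-simulator selected")


# Step 2: the thin dispatcher imports the module and calls register(inner).
parser = argparse.ArgumentParser(prog="lmcache tool")
inner = parser.add_subparsers(dest="tool")
register(inner)  # one register(inner) call per tool module

args = parser.parse_args(["cache-simulator"])
print(args.handler(args))  # cache-simulator selected
```

The dispatcher never needs to know any tool's flags; it only aggregates `register()` calls.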

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: crclq2018 <crclq2018@gmail.com>
Signed-off-by: kumaneko <crclq2018@gmail.com>
Comment thread lmcache/cli/commands/tool/cache_simulator.py
yoo-kumaneko and others added 3 commits April 14, 2026 14:58
Adds a "CLI integration" subsection under "For Developers" that:
- shows the tool/ package layout alongside the simulator package
- explains that add_*_arguments/run_* are the single source of truth
- tells developers where to edit when adding a flag vs a new action

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: crclq2018 <crclq2018@gmail.com>
Signed-off-by: kumaneko <crclq2018@gmail.com>
Replaces all python3 -m ... invocations in Quick Start, Step 2, Step 3,
and CLI Reference with lmcache tool cache-simulator simulate/sweep.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: crclq2018 <crclq2018@gmail.com>
Signed-off-by: kumaneko <crclq2018@gmail.com>
…ulator

* simulator.py: replace hasattr(cache, "position") + type: ignore with
  isinstance(cache, LRUCache) for proper type narrowing

* tests/tools/test_cache_simulator.py: 26 unit tests covering
  LRUCacheFast, LRUCache, compute_kv_bytes_per_chunk,
  load_lookup_events, and simulate (including prefix semantics,
  tail-token misses, eviction, and fast vs normal mode parity)

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: crclq2018 <crclq2018@gmail.com>
Signed-off-by: kumaneko <crclq2018@gmail.com>
@yoo-kumaneko yoo-kumaneko requested a review from hickeyma as a code owner April 14, 2026 07:19
@yoo-kumaneko yoo-kumaneko requested a review from KuntaiDu April 14, 2026 07:22
Comment thread lmcache/tools/cache_simulator/__init__.py
yoo-kumaneko and others added 3 commits April 14, 2026 15:33
The cache simulator tool (lmcache tool cache-simulator) uses
matplotlib for generating PNG charts. Declaring it in cli.txt
ensures it is always available when the lmcache CLI is installed,
avoiding ImportError on any lmcache invocation.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: crclq2018 <crclq2018@gmail.com>
Signed-off-by: kumaneko <crclq2018@gmail.com>
Without this file setuptools.find_packages() does not discover
lmcache.tools or its sub-packages, causing ImportError on a
standard (non-editable) pip install.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: crclq2018 <crclq2018@gmail.com>
Signed-off-by: kumaneko <crclq2018@gmail.com>
Adds simulate_example.png and sweep_example.png under docs/ and
references them in Step 2 and Step 3 of the README, as requested
by ApostaC in the PR review.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: crclq2018 <crclq2018@gmail.com>
Signed-off-by: kumaneko <crclq2018@gmail.com>
@yoo-kumaneko
Contributor Author

Screenshots added to README.md

Comment thread lmcache/tools/cache_simulator/lru_cache.py
yoo-kumaneko and others added 2 commits April 14, 2026 17:03
…serve dataset

Add gen_bench_dataset.py which converts LMCache lookup-hash JSONL logs
into a vllm bench serve custom dataset (JSONL with "prompt" and
"output_tokens" fields).  The conversion preserves prefix-sharing
structure: requests that shared a chunk hash in the original logs will
share the same token prefix in the synthetic prompts, so LMCache prefix
caching sees the same hit/miss pattern during replay.

Algorithm: build a stable safe vocabulary from the tokenizer (tokens
that round-trip through encode/decode cleanly), then deterministically
map each chunk hash to chunk_size token IDs via SHA-256 seeded RNG.
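The hash-to-tokens step of the algorithm can be sketched as follows (function name and the 8-byte seed truncation are assumptions; the real generator builds `vocab` from the tokenizer's round-trip-safe tokens):

```python
import hashlib
import random


def chunk_hash_to_tokens(chunk_hash: str, chunk_size: int, vocab: list) -> list:
    """Deterministically map a chunk hash to chunk_size token IDs.

    Seeding an RNG with SHA-256(chunk_hash) makes the mapping stable across
    runs, so requests sharing a chunk hash always get identical token
    prefixes, preserving the original hit/miss pattern on replay.
    """
    seed = int.from_bytes(hashlib.sha256(chunk_hash.encode()).digest()[:8], "big")
    rng = random.Random(seed)
    return [rng.choice(vocab) for _ in range(chunk_size)]


vocab = list(range(100, 200))  # stand-in for the tokenizer-derived safe vocab
a = chunk_hash_to_tokens("deadbeef", 4, vocab)
b = chunk_hash_to_tokens("deadbeef", 4, vocab)
print(a == b)  # True: same hash, same tokens, on every run
```

Determinism is the key property: two logs replayed days apart still produce byte-identical prompts for shared prefixes.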

Also wire "gen-dataset" as a new sub-action of
`lmcache tool cache-simulator` and update the README with Step 4.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: kumaneko <crclq2018@gmail.com>
The dataset generation step is not required for the core cache simulator
workflow. Keep the gen-dataset command available but remove it from the
main README flow (Table of Contents, Quick Start, and step sections).

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: kumaneko <crclq2018@gmail.com>
@yoo-kumaneko yoo-kumaneko force-pushed the feature/cache-simulator branch from c644b38 to 3e1a886 Compare April 14, 2026 10:53
yoo-kumaneko and others added 2 commits April 14, 2026 19:03
matplotlib is only needed when actually plotting (plot_statistics /
run_sweep).  Moving the import inside those functions lets the module
be imported — and all unit tests collected — without matplotlib
installed, fixing the CI "1 error" collection failure.
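This is the standard deferred-import pattern; a sketch (the function body is an assumption based on the description above):

```python
def plot_statistics(stats: dict, output_path: str) -> None:
    """Plotting dependency is loaded only when a plot is actually requested."""
    import matplotlib.pyplot as plt  # moved from module top to call site

    fig, ax = plt.subplots()
    ax.plot(stats["x"], stats["y"])
    fig.savefig(output_path)


# Importing the module (and collecting its tests) never touches matplotlib;
# only invoking plot_statistics() does.
print(callable(plot_statistics))  # True
```

The cost is that a missing dependency surfaces at plot time rather than import time, which is the right trade-off for an optional feature.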

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: kumaneko <crclq2018@gmail.com>

@cursor cursor Bot left a comment


Cursor Bugbot has reviewed your changes and found 2 potential issues.


Reviewed by Cursor Bugbot for commit 0d82a9f.

Comment thread lmcache/tools/cache_simulator/gen_bench_dataset.py
Comment thread lmcache/tools/cache_simulator/gen_bench_dataset.py
@ApostaC ApostaC enabled auto-merge (squash) April 14, 2026 20:17
@github-actions github-actions Bot added the full Run comprehensive tests on this PR label Apr 14, 2026
Contributor

@KuntaiDu KuntaiDu left a comment


Design doc is missing, but functionality-wise LGTM

@ApostaC ApostaC merged commit e64b6e3 into LMCache:dev Apr 15, 2026
36 of 38 checks passed
ftian1 pushed a commit to ftian1/LMCache that referenced this pull request Apr 20, 2026
…che#3021)

* Add LRU cache simulator for lookup-hash JSONL logs

* Adds lmcache/tools/cache_simulator/ with four modules:

Signed-off-by: crclq2018 <crclq2018@gmail.com>
Signed-off-by: rigginschen <rigginschen@tencent.com>
Signed-off-by: kumaneko <crclq2018@gmail.com>
Co-authored-by: rigginschen <rigginschen@tencent.com>
