feat(tools): add LRU cache simulator for lookup-hash JSONL logs (#3021)
ApostaC merged 16 commits into LMCache:dev
Conversation
Adds lmcache/tools/cache_simulator/ with four modules:
- lru_cache.py — LRUCacheFast (O(1)) and LRUCache (O(log n) with
position tracking) backed by OrderedDict / SortedList
- simulator.py — load_lookup_events(), simulate(), print_statistics(),
plot_statistics(), and a CLI
- plot_hit_rate.py — capacity sweep over log-spaced GiB range + matplotlib
plot
- README.md — user and developer documentation
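A minimal sketch of the O(1) LRU design described above, using an OrderedDict in recency order (class and method names here are illustrative, not the actual module API):

```python
from collections import OrderedDict


class TinyLRU:
    """Illustrative O(1) LRU: OrderedDict keeps chunk hashes in recency order."""

    def __init__(self, capacity_bytes: int, bytes_per_chunk: int):
        self.max_chunks = capacity_bytes // bytes_per_chunk
        self._store: "OrderedDict[str, None]" = OrderedDict()

    def access(self, chunk_hash: str) -> bool:
        """Touch a chunk; return True on hit, False on miss (evicting if full)."""
        if chunk_hash in self._store:
            self._store.move_to_end(chunk_hash)  # mark most-recently-used
            return True
        self._store[chunk_hash] = None
        if len(self._store) > self.max_chunks:
            self._store.popitem(last=False)  # evict least-recently-used
        return False
```

The O(log n) `LRUCache` variant additionally tracks each chunk's position (e.g. via a SortedList), which this sketch omits.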
The primary metric is *token* cache hit rate:
hit_tokens / total_tokens
where hit_tokens = hit_prefix_chunks × chunk_size and tail tokens
(seq_len mod chunk_size) are always counted as misses, matching the
LMCache server's semantics. Cache capacity is expressed in bytes; the
CLI accepts GiB and auto-computes bytes-per-chunk from shapes/dtypes in
the first event.
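The metric can be sketched as a toy computation (hypothetical helper, not the simulator's actual code):

```python
def token_hit_rate(requests, chunk_size=256):
    """requests: list of (hit_prefix_chunks, seq_len) pairs.

    Tail tokens (seq_len mod chunk_size) never form a full chunk, so they
    always count as misses, matching the semantics described above.
    """
    hit_tokens = sum(hits * chunk_size for hits, _ in requests)
    total_tokens = sum(seq_len for _, seq_len in requests)
    return hit_tokens / total_tokens if total_tokens else 0.0
```

For example, a request of 600 tokens with 2 hit prefix chunks contributes 512 hit tokens; the 88 tail tokens are misses even if the cache is warm.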
Running simulator.py prints a full text report and saves a 7-panel
statistics PNG (per-request hit rate, hit prefix length, chunk reuse
count, rolling hit rate, input length, global span, cache position).
Co-authored-by: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: crclq2018 <crclq2018@gmail.com>
Signed-off-by: rigginschen <rigginschen@tencent.com>
Code Review
This pull request introduces a cache simulator tool for LMCache, featuring LRU cache implementations, a simulation engine, and utilities for plotting token hit rates against cache capacity. The review feedback highlights several compliance and quality issues: the lack of unit or integration tests for the new feature, missing docstrings for public CLI entry points in violation of the project style guide, and a recommendation to use isinstance for better type safety in the simulation logic.
@@ -0,0 +1,124 @@
# SPDX-License-Identifier: Apache-2.0
Thanks for the contribution! This is very useful!
Usage-wise, can you put it under lmcache cli, so that we can run
lmcache tool cache_simulator
instead of using
python -m lmcache.tools.cache_simulator.simulator?
Done. Now we can use it like this:
# 1. Collect logs from a live server (see Step 1 below)
lmcache server --lookup-hash-log-dir /data/lmcache/lookup_hashes ...
# 2. Simulate at a fixed capacity — prints text report and saves a PNG chart
lmcache tool cache-simulator simulate \
-i /data/lmcache/lookup_hashes \
--cache-capacity-gib 64 \
-o stats.png
# 3. Sweep across capacities to find the right cache size
lmcache tool cache-simulator sweep \
-i /data/lmcache/lookup_hashes \
--min-capacity-gib 1 \
--max-capacity-gib 512 \
--points 30 \
  -o sweep.png
Integrates the cache simulator into the lmcache CLI so users can run:
lmcache tool cache-simulator simulate -i <logs> --cache-capacity-gib 64
lmcache tool cache-simulator sweep -i <logs> --min-capacity-gib 1 --max-capacity-gib 512
`simulate` replays lookup-hash JSONL logs at a fixed cache capacity,
prints a text report, and saves a 7-panel statistics PNG.
`sweep` scans a log-spaced range of capacities and saves a hit-rate
vs capacity PNG.
The python -m lmcache.tools.cache_simulator.{simulator,plot_hit_rate}
entry points continue to work unchanged.
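The log-spaced capacity sweep can be sketched as geometric interpolation between the min and max capacities (illustrative; the tool's actual point selection may differ):

```python
import math


def log_spaced_capacities_gib(min_gib: float, max_gib: float, points: int):
    """Return `points` capacities spaced evenly on a log scale."""
    if points == 1:
        return [min_gib]
    lo, hi = math.log(min_gib), math.log(max_gib)
    step = (hi - lo) / (points - 1)
    return [math.exp(lo + i * step) for i in range(points)]
```

Log spacing gives equal resolution at every order of magnitude, which suits hit-rate curves that typically saturate as capacity grows.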
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: crclq2018 <crclq2018@gmail.com>
Signed-off-by: kumaneko <crclq2018@gmail.com>
…cation

Adds four public functions that serve as the single source of truth for CLI flag definitions and execution logic:
- simulator.py: add_simulate_arguments(parser), run_simulate(args)
- plot_hit_rate.py: add_sweep_arguments(parser), run_sweep(args)

ToolCommand now calls these instead of duplicating the flags itself. Adding or removing a flag in the simulator modules automatically takes effect in both `python -m ...` and `lmcache tool cache-simulator ...`.
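The single-source-of-truth pattern looks roughly like this (a sketch; the flag set is trimmed and the real modules define more options):

```python
import argparse


def add_simulate_arguments(parser: argparse.ArgumentParser) -> None:
    """Single source of truth for the simulate flags."""
    parser.add_argument("-i", "--input-dir", required=True)
    parser.add_argument("--cache-capacity-gib", type=float, default=64.0)


def run_simulate(args: argparse.Namespace) -> str:
    return f"simulating {args.input_dir} at {args.cache_capacity_gib} GiB"


def main_module_style(argv):
    """The `python -m ...` entry point reuses the shared functions."""
    parser = argparse.ArgumentParser()
    add_simulate_arguments(parser)
    return run_simulate(parser.parse_args(argv))


def main_cli_style(subparsers):
    """The `lmcache tool ...` entry point reuses the same functions."""
    sub = subparsers.add_parser("simulate")
    add_simulate_arguments(sub)
    sub.set_defaults(func=run_simulate)
```

Because both entry points call the same `add_*_arguments`/`run_*` pair, a flag added in one place appears in both.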
ToolCommand in __init__.py is now a thin dispatcher (~60 lines). Cache-simulator wiring moves to tool/cache_simulator.py. Adding a future tool requires only:
1. Create tool/<new_tool>.py with a register() function
2. Import it in __init__.py and call new_tool.register(inner)
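The thin-dispatcher shape can be sketched as follows (simplified; not the actual ToolCommand code):

```python
import argparse


def register(subparsers) -> None:
    """Each tool module exposes one register() hook (step 1 above)."""
    sub = subparsers.add_parser("cache-simulator")
    sub.set_defaults(func=lambda args: "cache-simulator ran")


def build_tool_parser() -> argparse.ArgumentParser:
    """Dispatcher: imports each tool module and calls its register()."""
    parser = argparse.ArgumentParser(prog="lmcache tool")
    inner = parser.add_subparsers(dest="tool", required=True)
    register(inner)  # step 2 above: one line per tool
    return parser
```

The dispatcher never needs to know a tool's flags; it only forwards the subparser group to each module's register().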
Adds a "CLI integration" subsection under "For Developers" that:
- shows the tool/ package layout alongside the simulator package
- explains that add_*_arguments/run_* are the single source of truth
- tells developers where to edit when adding a flag vs a new action
Replaces all python3 -m ... invocations in Quick Start, Step 2, Step 3, and CLI Reference with lmcache tool cache-simulator simulate/sweep.
…ulator

* simulator.py: replace hasattr(cache, "position") + type: ignore with isinstance(cache, LRUCache) for proper type narrowing
* tests/tools/test_cache_simulator.py: 26 unit tests covering LRUCacheFast, LRUCache, compute_kv_bytes_per_chunk, load_lookup_events, and simulate (including prefix semantics, tail-token misses, eviction, and fast vs normal mode parity)
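The isinstance change can be illustrated with simplified stand-in classes (the subclass relationship and stub bodies here are assumptions for brevity, not the real implementations):

```python
class LRUCacheFast:
    """Stand-in for the O(1) variant: no position tracking."""


class LRUCache(LRUCacheFast):
    """Stand-in for the O(log n) variant that reports a chunk's position."""

    def position(self, chunk_hash: str) -> int:
        return 0  # placeholder for the real SortedList lookup


def record_position(cache: LRUCacheFast, chunk_hash: str):
    # isinstance() narrows the static type for checkers like mypy,
    # whereas hasattr(cache, "position") would need a `type: ignore`.
    if isinstance(cache, LRUCache):
        return cache.position(chunk_hash)
    return None
```

Type narrowing lets the checker verify the `.position()` call instead of silencing it.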
The cache simulator tool (lmcache tool cache-simulator) uses matplotlib for generating PNG charts. Declaring it in cli.txt ensures it is always available when the lmcache CLI is installed, avoiding ImportError on any lmcache invocation.
Without this file setuptools.find_packages() does not discover lmcache.tools or its sub-packages, causing ImportError on a standard (non-editable) pip install.
Adds simulate_example.png and sweep_example.png under docs/ and references them in Step 2 and Step 3 of the README, as requested by ApostaC in the PR review.
Screenshots added to README.md
…serve dataset

Add gen_bench_dataset.py which converts LMCache lookup-hash JSONL logs into a vllm bench serve custom dataset (JSONL with "prompt" and "output_tokens" fields). The conversion preserves prefix-sharing structure: requests that shared a chunk hash in the original logs will share the same token prefix in the synthetic prompts, so LMCache prefix caching sees the same hit/miss pattern during replay.

Algorithm: build a stable safe vocabulary from the tokenizer (tokens that round-trip through encode/decode cleanly), then deterministically map each chunk hash to chunk_size token IDs via SHA-256 seeded RNG.

Also wire "gen-dataset" as a new sub-action of `lmcache tool cache-simulator` and update the README with Step 4.
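The hash-to-tokens mapping can be sketched like this (toy vocabulary; the real tool builds a safe vocabulary from the tokenizer):

```python
import hashlib
import random


def chunk_hash_to_tokens(chunk_hash: str, vocab: list, chunk_size: int) -> list:
    """Deterministically map one chunk hash to chunk_size token IDs.

    Seeding an RNG with SHA-256(chunk_hash) means the same hash always
    yields the same tokens, so prefixes shared in the logs become shared
    prefixes in the synthetic prompts.
    """
    seed = int.from_bytes(hashlib.sha256(chunk_hash.encode()).digest()[:8], "big")
    rng = random.Random(seed)
    return [rng.choice(vocab) for _ in range(chunk_size)]
```

Different hashes seed different RNG streams, so distinct chunks almost surely get distinct token runs.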
The dataset generation step is not required for the core cache simulator workflow. Keep the gen-dataset command available but remove it from the main README flow (Table of Contents, Quick Start, and step sections).
Force-pushed: c644b38 to 3e1a886
matplotlib is only needed when actually plotting (plot_statistics / run_sweep). Moving the import inside those functions lets the module be imported (and all unit tests collected) without matplotlib installed, fixing the CI "1 error" collection failure.
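The deferred-import pattern looks roughly like this (a sketch; the real function draws seven panels, omitted here):

```python
def plot_statistics(stats, output_path: str) -> None:
    """Save a statistics chart, importing matplotlib only when called.

    Importing this module (e.g. during pytest collection) therefore
    no longer requires matplotlib to be installed.
    """
    import matplotlib  # deferred: only plotting code paths pay this cost

    matplotlib.use("Agg")  # headless backend, suitable for saving PNGs
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.plot(stats)
    fig.savefig(output_path)
```

Only callers that actually plot hit the import, so a missing matplotlib fails at plot time with a clear ImportError rather than at module import.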
Cursor Bugbot has reviewed your changes and found 2 potential issues.
Reviewed by Cursor Bugbot for commit 0d82a9f.
KuntaiDu left a comment:
Design doc is missing, but functionality-wise LGTM
…che#3021)

* Add LRU cache simulator for lookup-hash JSONL logs
* Adds lmcache/tools/cache_simulator/ with four modules

Example diagrams: you can use the following chunk-hashes data to plot the diagrams above:
https://drive.google.com/file/d/18jxUlI_J9sT_Mis3nft0nYEudwvJJ9UI/view?usp=drive_link
Note: Medium Risk
Introduces new CLI entrypoints and new optional runtime dependencies (e.g., matplotlib, sortedcontainers, transformers) that may affect packaging/CLI startup if dependency sets are misconfigured. Core server/runtime logic is otherwise untouched, so functional risk is mostly limited to the new tool surface area.
Overview
Adds a new `lmcache tool` command group, including `lmcache tool cache-simulator {simulate,sweep,gen-dataset}` for offline analysis of lookup-hash JSONL logs. Introduces a cache-simulator implementation under `lmcache/tools/cache_simulator/` with LRU cache models, log loading, token hit-rate simulation (including prefix-hit semantics and tail-token misses), reporting/plotting, and a dataset generator for `vllm bench serve`. Updates the CLI dependency set (`requirements/cli.txt`) to include `matplotlib`, and adds a focused test suite covering the LRU caches, event loading, and simulation correctness.
Reviewed by Cursor Bugbot for commit d7adca9.